diff options
Diffstat (limited to 'Documentation/powerpc')
20 files changed, 1189 insertions, 3786 deletions
diff --git a/Documentation/powerpc/00-INDEX b/Documentation/powerpc/00-INDEX index 3be84aa38df..6db73df0427 100644 --- a/Documentation/powerpc/00-INDEX +++ b/Documentation/powerpc/00-INDEX @@ -5,29 +5,28 @@ please mail me. 00-INDEX - this file -booting-without-of.txt - - Booting the Linux/ppc kernel without Open Firmware +bootwrapper.txt + - Information on how the powerpc kernel is wrapped for boot on various + different platforms. cpu_features.txt - info on how we support a variety of CPUs with minimal compile-time options. eeh-pci-error-recovery.txt - info on PCI Bus EEH Error Recovery +firmware-assisted-dump.txt + - Documentation on the firmware assisted dump mechanism "fadump". hvcs.txt - IBM "Hypervisor Virtual Console Server" Installation Guide +kvm_440.txt + - Various notes on the implementation of KVM for PowerPC 440. mpc52xx.txt - Linux 2.6.x on MPC52xx family -mpc52xx-device-tree-bindings.txt - - MPC5200 Device Tree Bindings -ppc_htab.txt - - info about the Linux/PPC /proc/ppc_htab entry -SBC8260_memory_mapping.txt - - EST SBC8260 board info -smp.txt - - use and state info about Linux/PPC on MP machines -sound.txt - - info on sound support under Linux/PPC -zImage_layout.txt - - info on the kernel images for Linux/PPC +pmu-ebb.txt + - Description of the API for using the PMU with Event Based Branches. qe_firmware.txt - describes the layout of firmware binaries for the Freescale QUICC Engine and the code that parses and uploads the microcode therein. +ptrace.txt + - Information on the ptrace interfaces for hardware debug registers. +transactional_memory.txt + - Overview of the Power8 transactional memory support. diff --git a/Documentation/powerpc/SBC8260_memory_mapping.txt b/Documentation/powerpc/SBC8260_memory_mapping.txt deleted file mode 100644 index e6e9ee0506c..00000000000 --- a/Documentation/powerpc/SBC8260_memory_mapping.txt +++ /dev/null @@ -1,197 +0,0 @@ -Please mail me (Jon Diekema, diekema_jon@si.com or diekema@cideas.com) -if you have questions, comments or corrections. - - * EST SBC8260 Linux memory mapping rules - - http://www.estc.com/ - http://www.estc.com/products/boards/SBC8260-8240_ds.html - - Initial conditions: - ------------------- - - Tasks that need to be perform by the boot ROM before control is - transferred to zImage (compressed Linux kernel): - - - Define the IMMR to 0xf0000000 - - - Initialize the memory controller so that RAM is available at - physical address 0x00000000. On the SBC8260 is this 16M (64M) - SDRAM. - - - The boot ROM should only clear the RAM that it is using. - - The reason for doing this is to enhances the chances of a - successful post mortem on a Linux panic. One of the first - items to examine is the 16k (LOG_BUF_LEN) circular console - buffer called log_buf which is defined in kernel/printk.c. - - - To enhance boot ROM performance, the I-cache can be enabled. - - Date: Mon, 22 May 2000 14:21:10 -0700 - From: Neil Russell <caret@c-side.com> - - LiMon (LInux MONitor) runs with and starts Linux with MMU - off, I-cache enabled, D-cache disabled. The I-cache doesn't - need hints from the MMU to work correctly as the D-cache - does. No D-cache means no special code to handle devices in - the presence of cache (no snooping, etc). The use of the - I-cache means that the monitor can run acceptably fast - directly from ROM, rather than having to copy it to RAM. - - - Build the board information structure (see - include/asm-ppc/est8260.h for its definition) - - - The compressed Linux kernel (zImage) contains a bootstrap loader - that is position independent; you can load it into any RAM, - ROM or FLASH memory address >= 0x00500000 (above 5 MB), or - at its link address of 0x00400000 (4 MB). - - Note: If zImage is loaded at its link address of 0x00400000 (4 MB), - then zImage will skip the step of moving itself to - its link address. - - - Load R3 with the address of the board information structure - - - Transfer control to zImage - - - The Linux console port is SMC1, and the baud rate is controlled - from the bi_baudrate field of the board information structure. - On thing to keep in mind when picking the baud rate, is that - there is no flow control on the SMC ports. I would stick - with something safe and standard like 19200. - - On the EST SBC8260, the SMC1 port is on the COM1 connector of - the board. - - - EST SBC8260 defaults: - --------------------- - - Chip - Memory Sel Bus Use - --------------------- --- --- ---------------------------------- - 0x00000000-0x03FFFFFF CS2 60x (16M or 64M)/64M SDRAM - 0x04000000-0x04FFFFFF CS4 local 4M/16M SDRAM (soldered to the board) - 0x21000000-0x21000000 CS7 60x 1B/64K Flash present detect (from the flash SIMM) - 0x21000001-0x21000001 CS7 60x 1B/64K Switches (read) and LEDs (write) - 0x22000000-0x2200FFFF CS5 60x 8K/64K EEPROM - 0xFC000000-0xFCFFFFFF CS6 60x 2M/16M flash (8 bits wide, soldered to the board) - 0xFE000000-0xFFFFFFFF CS0 60x 4M/16M flash (SIMM) - - Notes: - ------ - - - The chip selects can map 32K blocks and up (powers of 2) - - - The SDRAM machine can handled up to 128Mbytes per chip select - - - Linux uses the 60x bus memory (the SDRAM DIMM) for the - communications buffers. - - - BATs can map 128K-256Mbytes each. There are four data BATs and - four instruction BATs. Generally the data and instruction BATs - are mapped the same. - - - The IMMR must be set above the kernel virtual memory addresses, - which start at 0xC0000000. Otherwise, the kernel may crash as - soon as you start any threads or processes due to VM collisions - in the kernel or user process space. - - - Details from Dan Malek <dan_malek@mvista.com> on 10/29/1999: - - The user application virtual space consumes the first 2 Gbytes - (0x00000000 to 0x7FFFFFFF). The kernel virtual text starts at - 0xC0000000, with data following. There is a "protection hole" - between the end of kernel data and the start of the kernel - dynamically allocated space, but this space is still within - 0xCxxxxxxx. - - Obviously the kernel can't map any physical addresses 1:1 in - these ranges. - - - Details from Dan Malek <dan_malek@mvista.com> on 5/19/2000: - - During the early kernel initialization, the kernel virtual - memory allocator is not operational. Prior to this KVM - initialization, we choose to map virtual to physical addresses - 1:1. That is, the kernel virtual address exactly matches the - physical address on the bus. These mappings are typically done - in arch/ppc/kernel/head.S, or arch/ppc/mm/init.c. Only - absolutely necessary mappings should be done at this time, for - example board control registers or a serial uart. Normal device - driver initialization should map resources later when necessary. - - Although platform dependent, and certainly the case for embedded - 8xx, traditionally memory is mapped at physical address zero, - and I/O devices above physical address 0x80000000. The lowest - and highest (above 0xf0000000) I/O addresses are traditionally - used for devices or registers we need to map during kernel - initialization and prior to KVM operation. For this reason, - and since it followed prior PowerPC platform examples, I chose - to map the embedded 8xx kernel to the 0xc0000000 virtual address. - This way, we can enable the MMU to map the kernel for proper - operation, and still map a few windows before the KVM is operational. - - On some systems, you could possibly run the kernel at the - 0x80000000 or any other virtual address. It just depends upon - mapping that must be done prior to KVM operational. You can never - map devices or kernel spaces that overlap with the user virtual - space. This is why default IMMR mapping used by most BDM tools - won't work. They put the IMMR at something like 0x10000000 or - 0x02000000 for example. You simply can't map these addresses early - in the kernel, and continue proper system operation. - - The embedded 8xx/82xx kernel is mature enough that all you should - need to do is map the IMMR someplace at or above 0xf0000000 and it - should boot far enough to get serial console messages and KGDB - connected on any platform. There are lots of other subtle memory - management design features that you simply don't need to worry - about. If you are changing functions related to MMU initialization, - you are likely breaking things that are known to work and are - heading down a path of disaster and frustration. Your changes - should be to make the flexibility of the processor fit Linux, - not force arbitrary and non-workable memory mappings into Linux. - - - You don't want to change KERNELLOAD or KERNELBASE, otherwise the - virtual memory and MMU code will get confused. - - arch/ppc/Makefile:KERNELLOAD = 0xc0000000 - - include/asm-ppc/page.h:#define PAGE_OFFSET 0xc0000000 - include/asm-ppc/page.h:#define KERNELBASE PAGE_OFFSET - - - RAM is at physical address 0x00000000, and gets mapped to - virtual address 0xC0000000 for the kernel. - - - Physical addresses used by the Linux kernel: - -------------------------------------------- - - 0x00000000-0x3FFFFFFF 1GB reserved for RAM - 0xF0000000-0xF001FFFF 128K IMMR 64K used for dual port memory, - 64K for 8260 registers - - - Logical addresses used by the Linux kernel: - ------------------------------------------- - - 0xF0000000-0xFFFFFFFF 256M BAT0 (IMMR: dual port RAM, registers) - 0xE0000000-0xEFFFFFFF 256M BAT1 (I/O space for custom boards) - 0xC0000000-0xCFFFFFFF 256M BAT2 (RAM) - 0xD0000000-0xDFFFFFFF 256M BAT3 (if RAM > 256MByte) - - - EST SBC8260 Linux mapping: - -------------------------- - - DBAT0, IBAT0, cache inhibited: - - Chip - Memory Sel Use - --------------------- --- --------------------------------- - 0xF0000000-0xF001FFFF n/a IMMR: dual port RAM, registers - - DBAT1, IBAT1, cache inhibited: - diff --git a/Documentation/powerpc/booting-without-of.txt b/Documentation/powerpc/booting-without-of.txt deleted file mode 100644 index 7b4e8a70882..00000000000 --- a/Documentation/powerpc/booting-without-of.txt +++ /dev/null @@ -1,3024 +0,0 @@ - Booting the Linux/ppc kernel without Open Firmware - -------------------------------------------------- - -(c) 2005 Benjamin Herrenschmidt <benh at kernel.crashing.org>, - IBM Corp. -(c) 2005 Becky Bruce <becky.bruce at freescale.com>, - Freescale Semiconductor, FSL SOC and 32-bit additions -(c) 2006 MontaVista Software, Inc. - Flash chip node definition - -Table of Contents -================= - - I - Introduction - 1) Entry point for arch/powerpc - 2) Board support - - II - The DT block format - 1) Header - 2) Device tree generalities - 3) Device tree "structure" block - 4) Device tree "strings" block - - III - Required content of the device tree - 1) Note about cells and address representation - 2) Note about "compatible" properties - 3) Note about "name" properties - 4) Note about node and property names and character set - 5) Required nodes and properties - a) The root node - b) The /cpus node - c) The /cpus/* nodes - d) the /memory node(s) - e) The /chosen node - f) the /soc<SOCname> node - - IV - "dtc", the device tree compiler - - V - Recommendations for a bootloader - - VI - System-on-a-chip devices and nodes - 1) Defining child nodes of an SOC - 2) Representing devices without a current OF specification - a) MDIO IO device - b) Gianfar-compatible ethernet nodes - c) PHY nodes - d) Interrupt controllers - e) I2C - f) Freescale SOC USB controllers - g) Freescale SOC SEC Security Engines - h) Board Control and Status (BCSR) - i) Freescale QUICC Engine module (QE) - j) CFI or JEDEC memory-mapped NOR flash - k) Global Utilities Block - l) Freescale Communications Processor Module - m) Chipselect/Local Bus - n) 4xx/Axon EMAC ethernet nodes - o) Xilinx IP cores - p) Freescale Synchronous Serial Interface - q) USB EHCI controllers - - VII - Specifying interrupt information for devices - 1) interrupts property - 2) interrupt-parent property - 3) OpenPIC Interrupt Controllers - 4) ISA Interrupt Controllers - - Appendix A - Sample SOC node for MPC8540 - - -Revision Information -==================== - - May 18, 2005: Rev 0.1 - Initial draft, no chapter III yet. - - May 19, 2005: Rev 0.2 - Add chapter III and bits & pieces here or - clarifies the fact that a lot of things are - optional, the kernel only requires a very - small device tree, though it is encouraged - to provide an as complete one as possible. - - May 24, 2005: Rev 0.3 - Precise that DT block has to be in RAM - - Misc fixes - - Define version 3 and new format version 16 - for the DT block (version 16 needs kernel - patches, will be fwd separately). - String block now has a size, and full path - is replaced by unit name for more - compactness. - linux,phandle is made optional, only nodes - that are referenced by other nodes need it. - "name" property is now automatically - deduced from the unit name - - June 1, 2005: Rev 0.4 - Correct confusion between OF_DT_END and - OF_DT_END_NODE in structure definition. - - Change version 16 format to always align - property data to 4 bytes. Since tokens are - already aligned, that means no specific - required alignment between property size - and property data. The old style variable - alignment would make it impossible to do - "simple" insertion of properties using - memmove (thanks Milton for - noticing). Updated kernel patch as well - - Correct a few more alignment constraints - - Add a chapter about the device-tree - compiler and the textural representation of - the tree that can be "compiled" by dtc. - - November 21, 2005: Rev 0.5 - - Additions/generalizations for 32-bit - - Changed to reflect the new arch/powerpc - structure - - Added chapter VI - - - ToDo: - - Add some definitions of interrupt tree (simple/complex) - - Add some definitions for PCI host bridges - - Add some common address format examples - - Add definitions for standard properties and "compatible" - names for cells that are not already defined by the existing - OF spec. - - Compare FSL SOC use of PCI to standard and make sure no new - node definition required. - - Add more information about node definitions for SOC devices - that currently have no standard, like the FSL CPM. - - -I - Introduction -================ - -During the recent development of the Linux/ppc64 kernel, and more -specifically, the addition of new platform types outside of the old -IBM pSeries/iSeries pair, it was decided to enforce some strict rules -regarding the kernel entry and bootloader <-> kernel interfaces, in -order to avoid the degeneration that had become the ppc32 kernel entry -point and the way a new platform should be added to the kernel. The -legacy iSeries platform breaks those rules as it predates this scheme, -but no new board support will be accepted in the main tree that -doesn't follows them properly. In addition, since the advent of the -arch/powerpc merged architecture for ppc32 and ppc64, new 32-bit -platforms and 32-bit platforms which move into arch/powerpc will be -required to use these rules as well. - -The main requirement that will be defined in more detail below is -the presence of a device-tree whose format is defined after Open -Firmware specification. However, in order to make life easier -to embedded board vendors, the kernel doesn't require the device-tree -to represent every device in the system and only requires some nodes -and properties to be present. This will be described in detail in -section III, but, for example, the kernel does not require you to -create a node for every PCI device in the system. It is a requirement -to have a node for PCI host bridges in order to provide interrupt -routing informations and memory/IO ranges, among others. It is also -recommended to define nodes for on chip devices and other busses that -don't specifically fit in an existing OF specification. This creates a -great flexibility in the way the kernel can then probe those and match -drivers to device, without having to hard code all sorts of tables. It -also makes it more flexible for board vendors to do minor hardware -upgrades without significantly impacting the kernel code or cluttering -it with special cases. - - -1) Entry point for arch/powerpc -------------------------------- - - There is one and one single entry point to the kernel, at the start - of the kernel image. That entry point supports two calling - conventions: - - a) Boot from Open Firmware. If your firmware is compatible - with Open Firmware (IEEE 1275) or provides an OF compatible - client interface API (support for "interpret" callback of - forth words isn't required), you can enter the kernel with: - - r5 : OF callback pointer as defined by IEEE 1275 - bindings to powerpc. Only the 32-bit client interface - is currently supported - - r3, r4 : address & length of an initrd if any or 0 - - The MMU is either on or off; the kernel will run the - trampoline located in arch/powerpc/kernel/prom_init.c to - extract the device-tree and other information from open - firmware and build a flattened device-tree as described - in b). prom_init() will then re-enter the kernel using - the second method. This trampoline code runs in the - context of the firmware, which is supposed to handle all - exceptions during that time. - - b) Direct entry with a flattened device-tree block. This entry - point is called by a) after the OF trampoline and can also be - called directly by a bootloader that does not support the Open - Firmware client interface. It is also used by "kexec" to - implement "hot" booting of a new kernel from a previous - running one. This method is what I will describe in more - details in this document, as method a) is simply standard Open - Firmware, and thus should be implemented according to the - various standard documents defining it and its binding to the - PowerPC platform. The entry point definition then becomes: - - r3 : physical pointer to the device-tree block - (defined in chapter II) in RAM - - r4 : physical pointer to the kernel itself. This is - used by the assembly code to properly disable the MMU - in case you are entering the kernel with MMU enabled - and a non-1:1 mapping. - - r5 : NULL (as to differentiate with method a) - - Note about SMP entry: Either your firmware puts your other - CPUs in some sleep loop or spin loop in ROM where you can get - them out via a soft reset or some other means, in which case - you don't need to care, or you'll have to enter the kernel - with all CPUs. The way to do that with method b) will be - described in a later revision of this document. - - -2) Board support ----------------- - -64-bit kernels: - - Board supports (platforms) are not exclusive config options. An - arbitrary set of board supports can be built in a single kernel - image. The kernel will "know" what set of functions to use for a - given platform based on the content of the device-tree. Thus, you - should: - - a) add your platform support as a _boolean_ option in - arch/powerpc/Kconfig, following the example of PPC_PSERIES, - PPC_PMAC and PPC_MAPLE. The later is probably a good - example of a board support to start from. - - b) create your main platform file as - "arch/powerpc/platforms/myplatform/myboard_setup.c" and add it - to the Makefile under the condition of your CONFIG_ - option. This file will define a structure of type "ppc_md" - containing the various callbacks that the generic code will - use to get to your platform specific code - - c) Add a reference to your "ppc_md" structure in the - "machines" table in arch/powerpc/kernel/setup_64.c if you are - a 64-bit platform. - - d) request and get assigned a platform number (see PLATFORM_* - constants in include/asm-powerpc/processor.h - -32-bit embedded kernels: - - Currently, board support is essentially an exclusive config option. - The kernel is configured for a single platform. Part of the reason - for this is to keep kernels on embedded systems small and efficient; - part of this is due to the fact the code is already that way. In the - future, a kernel may support multiple platforms, but only if the - platforms feature the same core architecture. A single kernel build - cannot support both configurations with Book E and configurations - with classic Powerpc architectures. - - 32-bit embedded platforms that are moved into arch/powerpc using a - flattened device tree should adopt the merged tree practice of - setting ppc_md up dynamically, even though the kernel is currently - built with support for only a single platform at a time. This allows - unification of the setup code, and will make it easier to go to a - multiple-platform-support model in the future. - -NOTE: I believe the above will be true once Ben's done with the merge -of the boot sequences.... someone speak up if this is wrong! - - To add a 32-bit embedded platform support, follow the instructions - for 64-bit platforms above, with the exception that the Kconfig - option should be set up such that the kernel builds exclusively for - the platform selected. The processor type for the platform should - enable another config option to select the specific board - supported. - -NOTE: If Ben doesn't merge the setup files, may need to change this to -point to setup_32.c - - - I will describe later the boot process and various callbacks that - your platform should implement. - - -II - The DT block format -======================== - - -This chapter defines the actual format of the flattened device-tree -passed to the kernel. The actual content of it and kernel requirements -are described later. You can find example of code manipulating that -format in various places, including arch/powerpc/kernel/prom_init.c -which will generate a flattened device-tree from the Open Firmware -representation, or the fs2dt utility which is part of the kexec tools -which will generate one from a filesystem representation. It is -expected that a bootloader like uboot provides a bit more support, -that will be discussed later as well. - -Note: The block has to be in main memory. It has to be accessible in -both real mode and virtual mode with no mapping other than main -memory. If you are writing a simple flash bootloader, it should copy -the block to RAM before passing it to the kernel. - - -1) Header ---------- - - The kernel is entered with r3 pointing to an area of memory that is - roughly described in include/asm-powerpc/prom.h by the structure - boot_param_header: - -struct boot_param_header { - u32 magic; /* magic word OF_DT_HEADER */ - u32 totalsize; /* total size of DT block */ - u32 off_dt_struct; /* offset to structure */ - u32 off_dt_strings; /* offset to strings */ - u32 off_mem_rsvmap; /* offset to memory reserve map - */ - u32 version; /* format version */ - u32 last_comp_version; /* last compatible version */ - - /* version 2 fields below */ - u32 boot_cpuid_phys; /* Which physical CPU id we're - booting on */ - /* version 3 fields below */ - u32 size_dt_strings; /* size of the strings block */ - - /* version 17 fields below */ - u32 size_dt_struct; /* size of the DT structure block */ -}; - - Along with the constants: - -/* Definitions used by the flattened device tree */ -#define OF_DT_HEADER 0xd00dfeed /* 4: version, - 4: total size */ -#define OF_DT_BEGIN_NODE 0x1 /* Start node: full name - */ -#define OF_DT_END_NODE 0x2 /* End node */ -#define OF_DT_PROP 0x3 /* Property: name off, - size, content */ -#define OF_DT_END 0x9 - - All values in this header are in big endian format, the various - fields in this header are defined more precisely below. All - "offset" values are in bytes from the start of the header; that is - from the value of r3. - - - magic - - This is a magic value that "marks" the beginning of the - device-tree block header. It contains the value 0xd00dfeed and is - defined by the constant OF_DT_HEADER - - - totalsize - - This is the total size of the DT block including the header. The - "DT" block should enclose all data structures defined in this - chapter (who are pointed to by offsets in this header). That is, - the device-tree structure, strings, and the memory reserve map. - - - off_dt_struct - - This is an offset from the beginning of the header to the start - of the "structure" part the device tree. (see 2) device tree) - - - off_dt_strings - - This is an offset from the beginning of the header to the start - of the "strings" part of the device-tree - - - off_mem_rsvmap - - This is an offset from the beginning of the header to the start - of the reserved memory map. This map is a list of pairs of 64- - bit integers. Each pair is a physical address and a size. The - list is terminated by an entry of size 0. This map provides the - kernel with a list of physical memory areas that are "reserved" - and thus not to be used for memory allocations, especially during - early initialization. The kernel needs to allocate memory during - boot for things like un-flattening the device-tree, allocating an - MMU hash table, etc... Those allocations must be done in such a - way to avoid overriding critical things like, on Open Firmware - capable machines, the RTAS instance, or on some pSeries, the TCE - tables used for the iommu. Typically, the reserve map should - contain _at least_ this DT block itself (header,total_size). If - you are passing an initrd to the kernel, you should reserve it as - well. You do not need to reserve the kernel image itself. The map - should be 64-bit aligned. - - - version - - This is the version of this structure. Version 1 stops - here. Version 2 adds an additional field boot_cpuid_phys. - Version 3 adds the size of the strings block, allowing the kernel - to reallocate it easily at boot and free up the unused flattened - structure after expansion. Version 16 introduces a new more - "compact" format for the tree itself that is however not backward - compatible. Version 17 adds an additional field, size_dt_struct, - allowing it to be reallocated or moved more easily (this is - particularly useful for bootloaders which need to make - adjustments to a device tree based on probed information). You - should always generate a structure of the highest version defined - at the time of your implementation. Currently that is version 17, - unless you explicitly aim at being backward compatible. - - - last_comp_version - - Last compatible version. This indicates down to what version of - the DT block you are backward compatible. For example, version 2 - is backward compatible with version 1 (that is, a kernel build - for version 1 will be able to boot with a version 2 format). You - should put a 1 in this field if you generate a device tree of - version 1 to 3, or 16 if you generate a tree of version 16 or 17 - using the new unit name format. - - - boot_cpuid_phys - - This field only exist on version 2 headers. It indicate which - physical CPU ID is calling the kernel entry point. This is used, - among others, by kexec. If you are on an SMP system, this value - should match the content of the "reg" property of the CPU node in - the device-tree corresponding to the CPU calling the kernel entry - point (see further chapters for more informations on the required - device-tree contents) - - - size_dt_strings - - This field only exists on version 3 and later headers. It - gives the size of the "strings" section of the device tree (which - starts at the offset given by off_dt_strings). - - - size_dt_struct - - This field only exists on version 17 and later headers. It gives - the size of the "structure" section of the device tree (which - starts at the offset given by off_dt_struct). - - So the typical layout of a DT block (though the various parts don't - need to be in that order) looks like this (addresses go from top to - bottom): - - - ------------------------------ - r3 -> | struct boot_param_header | - ------------------------------ - | (alignment gap) (*) | - ------------------------------ - | memory reserve map | - ------------------------------ - | (alignment gap) | - ------------------------------ - | | - | device-tree structure | - | | - ------------------------------ - | (alignment gap) | - ------------------------------ - | | - | device-tree strings | - | | - -----> ------------------------------ - | - | - --- (r3 + totalsize) - - (*) The alignment gaps are not necessarily present; their presence - and size are dependent on the various alignment requirements of - the individual data blocks. - - -2) Device tree generalities ---------------------------- - -This device-tree itself is separated in two different blocks, a -structure block and a strings block. Both need to be aligned to a 4 -byte boundary. - -First, let's quickly describe the device-tree concept before detailing -the storage format. This chapter does _not_ describe the detail of the -required types of nodes & properties for the kernel, this is done -later in chapter III. - -The device-tree layout is strongly inherited from the definition of -the Open Firmware IEEE 1275 device-tree. It's basically a tree of -nodes, each node having two or more named properties. A property can -have a value or not. - -It is a tree, so each node has one and only one parent except for the -root node who has no parent. - -A node has 2 names. The actual node name is generally contained in a -property of type "name" in the node property list whose value is a -zero terminated string and is mandatory for version 1 to 3 of the -format definition (as it is in Open Firmware). Version 16 makes it -optional as it can generate it from the unit name defined below. - -There is also a "unit name" that is used to differentiate nodes with -the same name at the same level, it is usually made of the node -names, the "@" sign, and a "unit address", which definition is -specific to the bus type the node sits on. - -The unit name doesn't exist as a property per-se but is included in -the device-tree structure. It is typically used to represent "path" in -the device-tree. More details about the actual format of these will be -below. - -The kernel powerpc generic code does not make any formal use of the -unit address (though some board support code may do) so the only real -requirement here for the unit address is to ensure uniqueness of -the node unit name at a given level of the tree. Nodes with no notion -of address and no possible sibling of the same name (like /memory or -/cpus) may omit the unit address in the context of this specification, -or use the "@0" default unit address. The unit name is used to define -a node "full path", which is the concatenation of all parent node -unit names separated with "/". - -The root node doesn't have a defined name, and isn't required to have -a name property either if you are using version 3 or earlier of the -format. It also has no unit address (no @ symbol followed by a unit -address). The root node unit name is thus an empty string. The full -path to the root node is "/". - -Every node which actually represents an actual device (that is, a node -which isn't only a virtual "container" for more nodes, like "/cpus" -is) is also required to have a "device_type" property indicating the -type of node . - -Finally, every node that can be referenced from a property in another -node is required to have a "linux,phandle" property. Real open -firmware implementations provide a unique "phandle" value for every -node that the "prom_init()" trampoline code turns into -"linux,phandle" properties. However, this is made optional if the -flattened device tree is used directly. An example of a node -referencing another node via "phandle" is when laying out the -interrupt tree which will be described in a further version of this -document. - -This "linux, phandle" property is a 32-bit value that uniquely -identifies a node. You are free to use whatever values or system of -values, internal pointers, or whatever to generate these, the only -requirement is that every node for which you provide that property has -a unique value for it. - -Here is an example of a simple device-tree. In this example, an "o" -designates a node followed by the node unit name. Properties are -presented with their name followed by their content. "content" -represents an ASCII string (zero terminated) value, while <content> -represents a 32-bit hexadecimal value. The various nodes in this -example will be discussed in a later chapter. At this point, it is -only meant to give you a idea of what a device-tree looks like. I have -purposefully kept the "name" and "linux,phandle" properties which -aren't necessary in order to give you a better idea of what the tree -looks like in practice. - - / o device-tree - |- name = "device-tree" - |- model = "MyBoardName" - |- compatible = "MyBoardFamilyName" - |- #address-cells = <2> - |- #size-cells = <2> - |- linux,phandle = <0> - | - o cpus - | | - name = "cpus" - | | - linux,phandle = <1> - | | - #address-cells = <1> - | | - #size-cells = <0> - | | - | o PowerPC,970@0 - | |- name = "PowerPC,970" - | |- device_type = "cpu" - | |- reg = <0> - | |- clock-frequency = <5f5e1000> - | |- 64-bit - | |- linux,phandle = <2> - | - o memory@0 - | |- name = "memory" - | |- device_type = "memory" - | |- reg = <00000000 00000000 00000000 20000000> - | |- linux,phandle = <3> - | - o chosen - |- name = "chosen" - |- bootargs = "root=/dev/sda2" - |- linux,phandle = <4> - -This tree is almost a minimal tree. It pretty much contains the -minimal set of required nodes and properties to boot a linux kernel; -that is, some basic model informations at the root, the CPUs, and the -physical memory layout. It also includes misc information passed -through /chosen, like in this example, the platform type (mandatory) -and the kernel command line arguments (optional). - -The /cpus/PowerPC,970@0/64-bit property is an example of a -property without a value. All other properties have a value. The -significance of the #address-cells and #size-cells properties will be -explained in chapter IV which defines precisely the required nodes and -properties and their content. - - -3) Device tree "structure" block - -The structure of the device tree is a linearized tree structure. The -"OF_DT_BEGIN_NODE" token starts a new node, and the "OF_DT_END_NODE" -ends that node definition. Child nodes are simply defined before -"OF_DT_END_NODE" (that is nodes within the node). A 'token' is a 32 -bit value. The tree has to be "finished" with a OF_DT_END token - -Here's the basic structure of a single node: - - * token OF_DT_BEGIN_NODE (that is 0x00000001) - * for version 1 to 3, this is the node full path as a zero - terminated string, starting with "/". For version 16 and later, - this is the node unit name only (or an empty string for the - root node) - * [align gap to next 4 bytes boundary] - * for each property: - * token OF_DT_PROP (that is 0x00000003) - * 32-bit value of property value size in bytes (or 0 if no - value) - * 32-bit value of offset in string block of property name - * property value data if any - * [align gap to next 4 bytes boundary] - * [child nodes if any] - * token OF_DT_END_NODE (that is 0x00000002) - -So the node content can be summarized as a start token, a full path, -a list of properties, a list of child nodes, and an end token. Every -child node is a full node structure itself as defined above. - -NOTE: The above definition requires that all property definitions for -a particular node MUST precede any subnode definitions for that node. -Although the structure would not be ambiguous if properties and -subnodes were intermingled, the kernel parser requires that the -properties come first (up until at least 2.6.22). Any tools -manipulating a flattened tree must take care to preserve this -constraint. - -4) Device tree "strings" block - -In order to save space, property names, which are generally redundant, -are stored separately in the "strings" block. This block is simply the -whole bunch of zero terminated strings for all property names -concatenated together. The device-tree property definitions in the -structure block will contain offset values from the beginning of the -strings block. - - -III - Required content of the device tree -========================================= - -WARNING: All "linux,*" properties defined in this document apply only -to a flattened device-tree. If your platform uses a real -implementation of Open Firmware or an implementation compatible with -the Open Firmware client interface, those properties will be created -by the trampoline code in the kernel's prom_init() file. For example, -that's where you'll have to add code to detect your board model and -set the platform number. However, when using the flattened device-tree -entry point, there is no prom_init() pass, and thus you have to -provide those properties yourself. - - -1) Note about cells and address representation ----------------------------------------------- - -The general rule is documented in the various Open Firmware -documentations. If you choose to describe a bus with the device-tree -and there exist an OF bus binding, then you should follow the -specification. However, the kernel does not require every single -device or bus to be described by the device tree. - -In general, the format of an address for a device is defined by the -parent bus type, based on the #address-cells and #size-cells -properties. Note that the parent's parent definitions of #address-cells -and #size-cells are not inhereted so every node with children must specify -them. The kernel requires the root node to have those properties defining -addresses format for devices directly mapped on the processor bus. - -Those 2 properties define 'cells' for representing an address and a -size. A "cell" is a 32-bit number. For example, if both contain 2 -like the example tree given above, then an address and a size are both -composed of 2 cells, and each is a 64-bit number (cells are -concatenated and expected to be in big endian format). Another example -is the way Apple firmware defines them, with 2 cells for an address -and one cell for a size. Most 32-bit implementations should define -#address-cells and #size-cells to 1, which represents a 32-bit value. -Some 32-bit processors allow for physical addresses greater than 32 -bits; these processors should define #address-cells as 2. - -"reg" properties are always a tuple of the type "address size" where -the number of cells of address and size is specified by the bus -#address-cells and #size-cells. When a bus supports various address -spaces and other flags relative to a given address allocation (like -prefetchable, etc...) those flags are usually added to the top level -bits of the physical address. For example, a PCI physical address is -made of 3 cells, the bottom two containing the actual address itself -while the top cell contains address space indication, flags, and pci -bus & device numbers. - -For busses that support dynamic allocation, it's the accepted practice -to then not provide the address in "reg" (keep it 0) though while -providing a flag indicating the address is dynamically allocated, and -then, to provide a separate "assigned-addresses" property that -contains the fully allocated addresses. See the PCI OF bindings for -details. - -In general, a simple bus with no address space bits and no dynamic -allocation is preferred if it reflects your hardware, as the existing -kernel address parsing functions will work out of the box. If you -define a bus type with a more complex address format, including things -like address space bits, you'll have to add a bus translator to the -prom_parse.c file of the recent kernels for your bus type. - -The "reg" property only defines addresses and sizes (if #size-cells is -non-0) within a given bus. In order to translate addresses upward -(that is into parent bus addresses, and possibly into CPU physical -addresses), all busses must contain a "ranges" property. If the -"ranges" property is missing at a given level, it's assumed that -translation isn't possible, i.e., the registers are not visible on the -parent bus. The format of the "ranges" property for a bus is a list -of: - - bus address, parent bus address, size - -"bus address" is in the format of the bus this bus node is defining, -that is, for a PCI bridge, it would be a PCI address. Thus, (bus -address, size) defines a range of addresses for child devices. "parent -bus address" is in the format of the parent bus of this bus. For -example, for a PCI host controller, that would be a CPU address. For a -PCI<->ISA bridge, that would be a PCI address. It defines the base -address in the parent bus where the beginning of that range is mapped. - -For a new 64-bit powerpc board, I recommend either the 2/2 format or -Apple's 2/1 format which is slightly more compact since sizes usually -fit in a single 32-bit word. New 32-bit powerpc boards should use a -1/1 format, unless the processor supports physical addresses greater -than 32-bits, in which case a 2/1 format is recommended. - -Alternatively, the "ranges" property may be empty, indicating that the -registers are visible on the parent bus using an identity mapping -translation. In other words, the parent bus address space is the same -as the child bus address space. - -2) Note about "compatible" properties -------------------------------------- - -These properties are optional, but recommended in devices and the root -node. The format of a "compatible" property is a list of concatenated -zero terminated strings. They allow a device to express its -compatibility with a family of similar devices, in some cases, -allowing a single driver to match against several devices regardless -of their actual names. - -3) Note about "name" properties -------------------------------- - -While earlier users of Open Firmware like OldWorld macintoshes tended -to use the actual device name for the "name" property, it's nowadays -considered a good practice to use a name that is closer to the device -class (often equal to device_type). For example, nowadays, ethernet -controllers are named "ethernet", an additional "model" property -defining precisely the chip type/model, and "compatible" property -defining the family in case a single driver can driver more than one -of these chips. However, the kernel doesn't generally put any -restriction on the "name" property; it is simply considered good -practice to follow the standard and its evolutions as closely as -possible. - -Note also that the new format version 16 makes the "name" property -optional. If it's absent for a node, then the node's unit name is then -used to reconstruct the name. That is, the part of the unit name -before the "@" sign is used (or the entire unit name if no "@" sign -is present). - -4) Note about node and property names and character set -------------------------------------------------------- - -While open firmware provides more flexible usage of 8859-1, this -specification enforces more strict rules. Nodes and properties should -be comprised only of ASCII characters 'a' to 'z', '0' to -'9', ',', '.', '_', '+', '#', '?', and '-'. Node names additionally -allow uppercase characters 'A' to 'Z' (property names should be -lowercase. The fact that vendors like Apple don't respect this rule is -irrelevant here). Additionally, node and property names should always -begin with a character in the range 'a' to 'z' (or 'A' to 'Z' for node -names). - -The maximum number of characters for both nodes and property names -is 31. In the case of node names, this is only the leftmost part of -a unit name (the pure "name" property), it doesn't include the unit -address which can extend beyond that limit. - - -5) Required nodes and properties --------------------------------- - These are all that are currently required. However, it is strongly - recommended that you expose PCI host bridges as documented in the - PCI binding to open firmware, and your interrupt tree as documented - in OF interrupt tree specification. - - a) The root node - - The root node requires some properties to be present: - - - model : this is your board name/model - - #address-cells : address representation for "root" devices - - #size-cells: the size representation for "root" devices - - device_type : This property shouldn't be necessary. However, if - you decide to create a device_type for your root node, make sure it - is _not_ "chrp" unless your platform is a pSeries or PAPR compliant - one for 64-bit, or a CHRP-type machine for 32-bit as this will - matched by the kernel this way. - - Additionally, some recommended properties are: - - - compatible : the board "family" generally finds its way here, - for example, if you have 2 board models with a similar layout, - that typically get driven by the same platform code in the - kernel, you would use a different "model" property but put a - value in "compatible". The kernel doesn't directly use that - value but it is generally useful. - - The root node is also generally where you add additional properties - specific to your board like the serial number if any, that sort of - thing. It is recommended that if you add any "custom" property whose - name may clash with standard defined ones, you prefix them with your - vendor name and a comma. - - b) The /cpus node - - This node is the parent of all individual CPU nodes. It doesn't - have any specific requirements, though it's generally good practice - to have at least: - - #address-cells = <00000001> - #size-cells = <00000000> - - This defines that the "address" for a CPU is a single cell, and has - no meaningful size. This is not necessary but the kernel will assume - that format when reading the "reg" properties of a CPU node, see - below - - c) The /cpus/* nodes - - So under /cpus, you are supposed to create a node for every CPU on - the machine. There is no specific restriction on the name of the - CPU, though It's common practice to call it PowerPC,<name>. For - example, Apple uses PowerPC,G5 while IBM uses PowerPC,970FX. - - Required properties: - - - device_type : has to be "cpu" - - reg : This is the physical CPU number, it's a single 32-bit cell - and is also used as-is as the unit number for constructing the - unit name in the full path. For example, with 2 CPUs, you would - have the full path: - /cpus/PowerPC,970FX@0 - /cpus/PowerPC,970FX@1 - (unit addresses do not require leading zeroes) - - d-cache-block-size : one cell, L1 data cache block size in bytes (*) - - i-cache-block-size : one cell, L1 instruction cache block size in - bytes - - d-cache-size : one cell, size of L1 data cache in bytes - - i-cache-size : one cell, size of L1 instruction cache in bytes - -(*) The cache "block" size is the size on which the cache management -instructions operate. Historically, this document used the cache -"line" size here which is incorrect. The kernel will prefer the cache -block size and will fallback to cache line size for backward -compatibility. - - Recommended properties: - - - timebase-frequency : a cell indicating the frequency of the - timebase in Hz. This is not directly used by the generic code, - but you are welcome to copy/paste the pSeries code for setting - the kernel timebase/decrementer calibration based on this - value. - - clock-frequency : a cell indicating the CPU core clock frequency - in Hz. A new property will be defined for 64-bit values, but if - your frequency is < 4Ghz, one cell is enough. Here as well as - for the above, the common code doesn't use that property, but - you are welcome to re-use the pSeries or Maple one. A future - kernel version might provide a common function for this. - - d-cache-line-size : one cell, L1 data cache line size in bytes - if different from the block size - - i-cache-line-size : one cell, L1 instruction cache line size in - bytes if different from the block size - - You are welcome to add any property you find relevant to your board, - like some information about the mechanism used to soft-reset the - CPUs. For example, Apple puts the GPIO number for CPU soft reset - lines in there as a "soft-reset" property since they start secondary - CPUs by soft-resetting them. - - - d) the /memory node(s) - - To define the physical memory layout of your board, you should - create one or more memory node(s). You can either create a single - node with all memory ranges in its reg property, or you can create - several nodes, as you wish. The unit address (@ part) used for the - full path is the address of the first range of memory defined by a - given node. If you use a single memory node, this will typically be - @0. - - Required properties: - - - device_type : has to be "memory" - - reg : This property contains all the physical memory ranges of - your board. It's a list of addresses/sizes concatenated - together, with the number of cells of each defined by the - #address-cells and #size-cells of the root node. For example, - with both of these properties being 2 like in the example given - earlier, a 970 based machine with 6Gb of RAM could typically - have a "reg" property here that looks like: - - 00000000 00000000 00000000 80000000 - 00000001 00000000 00000001 00000000 - - That is a range starting at 0 of 0x80000000 bytes and a range - starting at 0x100000000 and of 0x100000000 bytes. You can see - that there is no memory covering the IO hole between 2Gb and - 4Gb. Some vendors prefer splitting those ranges into smaller - segments, but the kernel doesn't care. - - e) The /chosen node - - This node is a bit "special". Normally, that's where open firmware - puts some variable environment information, like the arguments, or - the default input/output devices. - - This specification makes a few of these mandatory, but also defines - some linux-specific properties that would be normally constructed by - the prom_init() trampoline when booting with an OF client interface, - but that you have to provide yourself when using the flattened format. - - Recommended properties: - - - bootargs : This zero-terminated string is passed as the kernel - command line - - linux,stdout-path : This is the full path to your standard - console device if any. Typically, if you have serial devices on - your board, you may want to put the full path to the one set as - the default console in the firmware here, for the kernel to pick - it up as its own default console. If you look at the function - set_preferred_console() in arch/ppc64/kernel/setup.c, you'll see - that the kernel tries to find out the default console and has - knowledge of various types like 8250 serial ports. You may want - to extend this function to add your own. - - Note that u-boot creates and fills in the chosen node for platforms - that use it. - - (Note: a practice that is now obsolete was to include a property - under /chosen called interrupt-controller which had a phandle value - that pointed to the main interrupt controller) - - f) the /soc<SOCname> node - - This node is used to represent a system-on-a-chip (SOC) and must be - present if the processor is a SOC. The top-level soc node contains - information that is global to all devices on the SOC. The node name - should contain a unit address for the SOC, which is the base address - of the memory-mapped register set for the SOC. The name of an soc - node should start with "soc", and the remainder of the name should - represent the part number for the soc. For example, the MPC8540's - soc node would be called "soc8540". - - Required properties: - - - device_type : Should be "soc" - - ranges : Should be defined as specified in 1) to describe the - translation of SOC addresses for memory mapped SOC registers. - - bus-frequency: Contains the bus frequency for the SOC node. - Typically, the value of this field is filled in by the boot - loader. - - - Recommended properties: - - - reg : This property defines the address and size of the - memory-mapped registers that are used for the SOC node itself. - It does not include the child device registers - these will be - defined inside each child node. The address specified in the - "reg" property should match the unit address of the SOC node. - - #address-cells : Address representation for "soc" devices. The - format of this field may vary depending on whether or not the - device registers are memory mapped. For memory mapped - registers, this field represents the number of cells needed to - represent the address of the registers. For SOCs that do not - use MMIO, a special address format should be defined that - contains enough cells to represent the required information. - See 1) above for more details on defining #address-cells. - - #size-cells : Size representation for "soc" devices - - #interrupt-cells : Defines the width of cells used to represent - interrupts. Typically this value is <2>, which includes a - 32-bit number that represents the interrupt number, and a - 32-bit number that represents the interrupt sense and level. - This field is only needed if the SOC contains an interrupt - controller. - - The SOC node may contain child nodes for each SOC device that the - platform uses. Nodes should not be created for devices which exist - on the SOC but are not used by a particular platform. See chapter VI - for more information on how to specify devices that are part of a SOC. - - Example SOC node for the MPC8540: - - soc8540@e0000000 { - #address-cells = <1>; - #size-cells = <1>; - #interrupt-cells = <2>; - device_type = "soc"; - ranges = <00000000 e0000000 00100000> - reg = <e0000000 00003000>; - bus-frequency = <0>; - } - - - -IV - "dtc", the device tree compiler -==================================== - - -dtc source code can be found at -<http://ozlabs.org/~dgibson/dtc/dtc.tar.gz> - -WARNING: This version is still in early development stage; the -resulting device-tree "blobs" have not yet been validated with the -kernel. The current generated bloc lacks a useful reserve map (it will -be fixed to generate an empty one, it's up to the bootloader to fill -it up) among others. The error handling needs work, bugs are lurking, -etc... - -dtc basically takes a device-tree in a given format and outputs a -device-tree in another format. The currently supported formats are: - - Input formats: - ------------- - - - "dtb": "blob" format, that is a flattened device-tree block - with - header all in a binary blob. - - "dts": "source" format. This is a text file containing a - "source" for a device-tree. The format is defined later in this - chapter. - - "fs" format. This is a representation equivalent to the - output of /proc/device-tree, that is nodes are directories and - properties are files - - Output formats: - --------------- - - - "dtb": "blob" format - - "dts": "source" format - - "asm": assembly language file. This is a file that can be - sourced by gas to generate a device-tree "blob". That file can - then simply be added to your Makefile. Additionally, the - assembly file exports some symbols that can be used. - - -The syntax of the dtc tool is - - dtc [-I <input-format>] [-O <output-format>] - [-o output-filename] [-V output_version] input_filename - - -The "output_version" defines what version of the "blob" format will be -generated. Supported versions are 1,2,3 and 16. The default is -currently version 3 but that may change in the future to version 16. - -Additionally, dtc performs various sanity checks on the tree, like the -uniqueness of linux, phandle properties, validity of strings, etc... - -The format of the .dts "source" file is "C" like, supports C and C++ -style comments. - -/ { -} - -The above is the "device-tree" definition. It's the only statement -supported currently at the toplevel. - -/ { - property1 = "string_value"; /* define a property containing a 0 - * terminated string - */ - - property2 = <1234abcd>; /* define a property containing a - * numerical 32-bit value (hexadecimal) - */ - - property3 = <12345678 12345678 deadbeef>; - /* define a property containing 3 - * numerical 32-bit values (cells) in - * hexadecimal - */ - property4 = [0a 0b 0c 0d de ea ad be ef]; - /* define a property whose content is - * an arbitrary array of bytes - */ - - childnode@addresss { /* define a child node named "childnode" - * whose unit name is "childnode at - * address" - */ - - childprop = "hello\n"; /* define a property "childprop" of - * childnode (in this case, a string) - */ - }; -}; - -Nodes can contain other nodes etc... thus defining the hierarchical -structure of the tree. - -Strings support common escape sequences from C: "\n", "\t", "\r", -"\(octal value)", "\x(hex value)". - -It is also suggested that you pipe your source file through cpp (gcc -preprocessor) so you can use #include's, #define for constants, etc... - -Finally, various options are planned but not yet implemented, like -automatic generation of phandles, labels (exported to the asm file so -you can point to a property content and change it easily from whatever -you link the device-tree with), label or path instead of numeric value -in some cells to "point" to a node (replaced by a phandle at compile -time), export of reserve map address to the asm file, ability to -specify reserve map content at compile time, etc... - -We may provide a .h include file with common definitions of that -proves useful for some properties (like building PCI properties or -interrupt maps) though it may be better to add a notion of struct -definitions to the compiler... - - -V - Recommendations for a bootloader -==================================== - - -Here are some various ideas/recommendations that have been proposed -while all this has been defined and implemented. - - - The bootloader may want to be able to use the device-tree itself - and may want to manipulate it (to add/edit some properties, - like physical memory size or kernel arguments). At this point, 2 - choices can be made. Either the bootloader works directly on the - flattened format, or the bootloader has its own internal tree - representation with pointers (similar to the kernel one) and - re-flattens the tree when booting the kernel. The former is a bit - more difficult to edit/modify, the later requires probably a bit - more code to handle the tree structure. Note that the structure - format has been designed so it's relatively easy to "insert" - properties or nodes or delete them by just memmoving things - around. It contains no internal offsets or pointers for this - purpose. - - - An example of code for iterating nodes & retrieving properties - directly from the flattened tree format can be found in the kernel - file arch/ppc64/kernel/prom.c, look at scan_flat_dt() function, - its usage in early_init_devtree(), and the corresponding various - early_init_dt_scan_*() callbacks. That code can be re-used in a - GPL bootloader, and as the author of that code, I would be happy - to discuss possible free licensing to any vendor who wishes to - integrate all or part of this code into a non-GPL bootloader. - - - -VI - System-on-a-chip devices and nodes -======================================= - -Many companies are now starting to develop system-on-a-chip -processors, where the processor core (CPU) and many peripheral devices -exist on a single piece of silicon. For these SOCs, an SOC node -should be used that defines child nodes for the devices that make -up the SOC. While platforms are not required to use this model in -order to boot the kernel, it is highly encouraged that all SOC -implementations define as complete a flat-device-tree as possible to -describe the devices on the SOC. This will allow for the -genericization of much of the kernel code. - - -1) Defining child nodes of an SOC ---------------------------------- - -Each device that is part of an SOC may have its own node entry inside -the SOC node. For each device that is included in the SOC, the unit -address property represents the address offset for this device's -memory-mapped registers in the parent's address space. The parent's -address space is defined by the "ranges" property in the top-level soc -node. The "reg" property for each node that exists directly under the -SOC node should contain the address mapping from the child address space -to the parent SOC address space and the size of the device's -memory-mapped register file. - -For many devices that may exist inside an SOC, there are predefined -specifications for the format of the device tree node. All SOC child -nodes should follow these specifications, except where noted in this -document. - -See appendix A for an example partial SOC node definition for the -MPC8540. - - -2) Representing devices without a current OF specification ----------------------------------------------------------- - -Currently, there are many devices on SOCs that do not have a standard -representation pre-defined as part of the open firmware -specifications, mainly because the boards that contain these SOCs are -not currently booted using open firmware. This section contains -descriptions for the SOC devices for which new nodes have been -defined; this list will expand as more and more SOC-containing -platforms are moved over to use the flattened-device-tree model. - - a) MDIO IO device - - The MDIO is a bus to which the PHY devices are connected. For each - device that exists on this bus, a child node should be created. See - the definition of the PHY node below for an example of how to define - a PHY. - - Required properties: - - reg : Offset and length of the register set for the device - - compatible : Should define the compatible device type for the - mdio. Currently, this is most likely to be "fsl,gianfar-mdio" - - Example: - - mdio@24520 { - reg = <24520 20>; - compatible = "fsl,gianfar-mdio"; - - ethernet-phy@0 { - ...... - }; - }; - - - b) Gianfar-compatible ethernet nodes - - Required properties: - - - device_type : Should be "network" - - model : Model of the device. Can be "TSEC", "eTSEC", or "FEC" - - compatible : Should be "gianfar" - - reg : Offset and length of the register set for the device - - mac-address : List of bytes representing the ethernet address of - this controller - - interrupts : <a b> where a is the interrupt number and b is a - field that represents an encoding of the sense and level - information for the interrupt. This should be encoded based on - the information in section 2) depending on the type of interrupt - controller you have. - - interrupt-parent : the phandle for the interrupt controller that - services interrupts for this device. - - phy-handle : The phandle for the PHY connected to this ethernet - controller. - - fixed-link : <a b c d e> where a is emulated phy id - choose any, - but unique to the all specified fixed-links, b is duplex - 0 half, - 1 full, c is link speed - d#10/d#100/d#1000, d is pause - 0 no - pause, 1 pause, e is asym_pause - 0 no asym_pause, 1 asym_pause. - - Recommended properties: - - - linux,network-index : This is the intended "index" of this - network device. This is used by the bootwrapper to interpret - MAC addresses passed by the firmware when no information other - than indices is available to associate an address with a device. - - phy-connection-type : a string naming the controller/PHY interface type, - i.e., "mii" (default), "rmii", "gmii", "rgmii", "rgmii-id", "sgmii", - "tbi", or "rtbi". This property is only really needed if the connection - is of type "rgmii-id", as all other connection types are detected by - hardware. - - - Example: - - ethernet@24000 { - #size-cells = <0>; - device_type = "network"; - model = "TSEC"; - compatible = "gianfar"; - reg = <24000 1000>; - mac-address = [ 00 E0 0C 00 73 00 ]; - interrupts = <d 3 e 3 12 3>; - interrupt-parent = <40000>; - phy-handle = <2452000> - }; - - - - c) PHY nodes - - Required properties: - - - device_type : Should be "ethernet-phy" - - interrupts : <a b> where a is the interrupt number and b is a - field that represents an encoding of the sense and level - information for the interrupt. This should be encoded based on - the information in section 2) depending on the type of interrupt - controller you have. - - interrupt-parent : the phandle for the interrupt controller that - services interrupts for this device. - - reg : The ID number for the phy, usually a small integer - - linux,phandle : phandle for this node; likely referenced by an - ethernet controller node. - - - Example: - - ethernet-phy@0 { - linux,phandle = <2452000> - interrupt-parent = <40000>; - interrupts = <35 1>; - reg = <0>; - device_type = "ethernet-phy"; - }; - - - d) Interrupt controllers - - Some SOC devices contain interrupt controllers that are different - from the standard Open PIC specification. The SOC device nodes for - these types of controllers should be specified just like a standard - OpenPIC controller. Sense and level information should be encoded - as specified in section 2) of this chapter for each device that - specifies an interrupt. - - Example : - - pic@40000 { - linux,phandle = <40000>; - clock-frequency = <0>; - interrupt-controller; - #address-cells = <0>; - reg = <40000 40000>; - built-in; - compatible = "chrp,open-pic"; - device_type = "open-pic"; - big-endian; - }; - - - e) I2C - - Required properties : - - - device_type : Should be "i2c" - - reg : Offset and length of the register set for the device - - Recommended properties : - - - compatible : Should be "fsl-i2c" for parts compatible with - Freescale I2C specifications. - - interrupts : <a b> where a is the interrupt number and b is a - field that represents an encoding of the sense and level - information for the interrupt. This should be encoded based on - the information in section 2) depending on the type of interrupt - controller you have. - - interrupt-parent : the phandle for the interrupt controller that - services interrupts for this device. - - dfsrr : boolean; if defined, indicates that this I2C device has - a digital filter sampling rate register - - fsl5200-clocking : boolean; if defined, indicated that this device - uses the FSL 5200 clocking mechanism. - - Example : - - i2c@3000 { - interrupt-parent = <40000>; - interrupts = <1b 3>; - reg = <3000 18>; - device_type = "i2c"; - compatible = "fsl-i2c"; - dfsrr; - }; - - - f) Freescale SOC USB controllers - - The device node for a USB controller that is part of a Freescale - SOC is as described in the document "Open Firmware Recommended - Practice : Universal Serial Bus" with the following modifications - and additions : - - Required properties : - - compatible : Should be "fsl-usb2-mph" for multi port host USB - controllers, or "fsl-usb2-dr" for dual role USB controllers - - phy_type : For multi port host USB controllers, should be one of - "ulpi", or "serial". For dual role USB controllers, should be - one of "ulpi", "utmi", "utmi_wide", or "serial". - - reg : Offset and length of the register set for the device - - port0 : boolean; if defined, indicates port0 is connected for - fsl-usb2-mph compatible controllers. Either this property or - "port1" (or both) must be defined for "fsl-usb2-mph" compatible - controllers. - - port1 : boolean; if defined, indicates port1 is connected for - fsl-usb2-mph compatible controllers. Either this property or - "port0" (or both) must be defined for "fsl-usb2-mph" compatible - controllers. - - dr_mode : indicates the working mode for "fsl-usb2-dr" compatible - controllers. Can be "host", "peripheral", or "otg". Default to - "host" if not defined for backward compatibility. - - Recommended properties : - - interrupts : <a b> where a is the interrupt number and b is a - field that represents an encoding of the sense and level - information for the interrupt. This should be encoded based on - the information in section 2) depending on the type of interrupt - controller you have. - - interrupt-parent : the phandle for the interrupt controller that - services interrupts for this device. - - Example multi port host USB controller device node : - usb@22000 { - compatible = "fsl-usb2-mph"; - reg = <22000 1000>; - #address-cells = <1>; - #size-cells = <0>; - interrupt-parent = <700>; - interrupts = <27 1>; - phy_type = "ulpi"; - port0; - port1; - }; - - Example dual role USB controller device node : - usb@23000 { - compatible = "fsl-usb2-dr"; - reg = <23000 1000>; - #address-cells = <1>; - #size-cells = <0>; - interrupt-parent = <700>; - interrupts = <26 1>; - dr_mode = "otg"; - phy = "ulpi"; - }; - - - g) Freescale SOC SEC Security Engines - - Required properties: - - - device_type : Should be "crypto" - - model : Model of the device. Should be "SEC1" or "SEC2" - - compatible : Should be "talitos" - - reg : Offset and length of the register set for the device - - interrupts : <a b> where a is the interrupt number and b is a - field that represents an encoding of the sense and level - information for the interrupt. This should be encoded based on - the information in section 2) depending on the type of interrupt - controller you have. - - interrupt-parent : the phandle for the interrupt controller that - services interrupts for this device. - - num-channels : An integer representing the number of channels - available. - - channel-fifo-len : An integer representing the number of - descriptor pointers each channel fetch fifo can hold. - - exec-units-mask : The bitmask representing what execution units - (EUs) are available. It's a single 32-bit cell. EU information - should be encoded following the SEC's Descriptor Header Dword - EU_SEL0 field documentation, i.e. as follows: - - bit 0 = reserved - should be 0 - bit 1 = set if SEC has the ARC4 EU (AFEU) - bit 2 = set if SEC has the DES/3DES EU (DEU) - bit 3 = set if SEC has the message digest EU (MDEU) - bit 4 = set if SEC has the random number generator EU (RNG) - bit 5 = set if SEC has the public key EU (PKEU) - bit 6 = set if SEC has the AES EU (AESU) - bit 7 = set if SEC has the Kasumi EU (KEU) - - bits 8 through 31 are reserved for future SEC EUs. - - - descriptor-types-mask : The bitmask representing what descriptors - are available. It's a single 32-bit cell. Descriptor type - information should be encoded following the SEC's Descriptor - Header Dword DESC_TYPE field documentation, i.e. as follows: - - bit 0 = set if SEC supports the aesu_ctr_nonsnoop desc. type - bit 1 = set if SEC supports the ipsec_esp descriptor type - bit 2 = set if SEC supports the common_nonsnoop desc. type - bit 3 = set if SEC supports the 802.11i AES ccmp desc. type - bit 4 = set if SEC supports the hmac_snoop_no_afeu desc. type - bit 5 = set if SEC supports the srtp descriptor type - bit 6 = set if SEC supports the non_hmac_snoop_no_afeu desc.type - bit 7 = set if SEC supports the pkeu_assemble descriptor type - bit 8 = set if SEC supports the aesu_key_expand_output desc.type - bit 9 = set if SEC supports the pkeu_ptmul descriptor type - bit 10 = set if SEC supports the common_nonsnoop_afeu desc. type - bit 11 = set if SEC supports the pkeu_ptadd_dbl descriptor type - - ..and so on and so forth. - - Example: - - /* MPC8548E */ - crypto@30000 { - device_type = "crypto"; - model = "SEC2"; - compatible = "talitos"; - reg = <30000 10000>; - interrupts = <1d 3>; - interrupt-parent = <40000>; - num-channels = <4>; - channel-fifo-len = <18>; - exec-units-mask = <000000fe>; - descriptor-types-mask = <012b0ebf>; - }; - - h) Board Control and Status (BCSR) - - Required properties: - - - device_type : Should be "board-control" - - reg : Offset and length of the register set for the device - - Example: - - bcsr@f8000000 { - device_type = "board-control"; - reg = <f8000000 8000>; - }; - - i) Freescale QUICC Engine module (QE) - This represents qe module that is installed on PowerQUICC II Pro. - - NOTE: This is an interim binding; it should be updated to fit - in with the CPM binding later in this document. - - Basically, it is a bus of devices, that could act more or less - as a complete entity (UCC, USB etc ). All of them should be siblings on - the "root" qe node, using the common properties from there. - The description below applies to the qe of MPC8360 and - more nodes and properties would be extended in the future. - - i) Root QE device - - Required properties: - - compatible : should be "fsl,qe"; - - model : precise model of the QE, Can be "QE", "CPM", or "CPM2" - - reg : offset and length of the device registers. - - bus-frequency : the clock frequency for QUICC Engine. - - Recommended properties - - brg-frequency : the internal clock source frequency for baud-rate - generators in Hz. - - Example: - qe@e0100000 { - #address-cells = <1>; - #size-cells = <1>; - #interrupt-cells = <2>; - compatible = "fsl,qe"; - ranges = <0 e0100000 00100000>; - reg = <e0100000 480>; - brg-frequency = <0>; - bus-frequency = <179A7B00>; - } - - - ii) SPI (Serial Peripheral Interface) - - Required properties: - - cell-index : SPI controller index. - - compatible : should be "fsl,spi". - - mode : the SPI operation mode, it can be "cpu" or "cpu-qe". - - reg : Offset and length of the register set for the device - - interrupts : <a b> where a is the interrupt number and b is a - field that represents an encoding of the sense and level - information for the interrupt. This should be encoded based on - the information in section 2) depending on the type of interrupt - controller you have. - - interrupt-parent : the phandle for the interrupt controller that - services interrupts for this device. - - Example: - spi@4c0 { - cell-index = <0>; - compatible = "fsl,spi"; - reg = <4c0 40>; - interrupts = <82 0>; - interrupt-parent = <700>; - mode = "cpu"; - }; - - - iii) USB (Universal Serial Bus Controller) - - Required properties: - - compatible : could be "qe_udc" or "fhci-hcd". - - mode : the could be "host" or "slave". - - reg : Offset and length of the register set for the device - - interrupts : <a b> where a is the interrupt number and b is a - field that represents an encoding of the sense and level - information for the interrupt. This should be encoded based on - the information in section 2) depending on the type of interrupt - controller you have. - - interrupt-parent : the phandle for the interrupt controller that - services interrupts for this device. - - Example(slave): - usb@6c0 { - compatible = "qe_udc"; - reg = <6c0 40>; - interrupts = <8b 0>; - interrupt-parent = <700>; - mode = "slave"; - }; - - - iv) UCC (Unified Communications Controllers) - - Required properties: - - device_type : should be "network", "hldc", "uart", "transparent" - "bisync", "atm", or "serial". - - compatible : could be "ucc_geth" or "fsl_atm" and so on. - - model : should be "UCC". - - device-id : the ucc number(1-8), corresponding to UCCx in UM. - - reg : Offset and length of the register set for the device - - interrupts : <a b> where a is the interrupt number and b is a - field that represents an encoding of the sense and level - information for the interrupt. This should be encoded based on - the information in section 2) depending on the type of interrupt - controller you have. - - interrupt-parent : the phandle for the interrupt controller that - services interrupts for this device. - - pio-handle : The phandle for the Parallel I/O port configuration. - - port-number : for UART drivers, the port number to use, between 0 and 3. - This usually corresponds to the /dev/ttyQE device, e.g. <0> = /dev/ttyQE0. - The port number is added to the minor number of the device. Unlike the - CPM UART driver, the port-number is required for the QE UART driver. - - soft-uart : for UART drivers, if specified this means the QE UART device - driver should use "Soft-UART" mode, which is needed on some SOCs that have - broken UART hardware. Soft-UART is provided via a microcode upload. - - rx-clock-name: the UCC receive clock source - "none": clock source is disabled - "brg1" through "brg16": clock source is BRG1-BRG16, respectively - "clk1" through "clk24": clock source is CLK1-CLK24, respectively - - tx-clock-name: the UCC transmit clock source - "none": clock source is disabled - "brg1" through "brg16": clock source is BRG1-BRG16, respectively - "clk1" through "clk24": clock source is CLK1-CLK24, respectively - The following two properties are deprecated. rx-clock has been replaced - with rx-clock-name, and tx-clock has been replaced with tx-clock-name. - Drivers that currently use the deprecated properties should continue to - do so, in order to support older device trees, but they should be updated - to check for the new properties first. - - rx-clock : represents the UCC receive clock source. - 0x00 : clock source is disabled; - 0x1~0x10 : clock source is BRG1~BRG16 respectively; - 0x11~0x28: clock source is QE_CLK1~QE_CLK24 respectively. - - tx-clock: represents the UCC transmit clock source; - 0x00 : clock source is disabled; - 0x1~0x10 : clock source is BRG1~BRG16 respectively; - 0x11~0x28: clock source is QE_CLK1~QE_CLK24 respectively. - - Required properties for network device_type: - - mac-address : list of bytes representing the ethernet address. - - phy-handle : The phandle for the PHY connected to this controller. - - Recommended properties: - - linux,network-index : This is the intended "index" of this - network device. This is used by the bootwrapper to interpret - MAC addresses passed by the firmware when no information other - than indices is available to associate an address with a device. - - phy-connection-type : a string naming the controller/PHY interface type, - i.e., "mii" (default), "rmii", "gmii", "rgmii", "rgmii-id" (Internal - Delay), "rgmii-txid" (delay on TX only), "rgmii-rxid" (delay on RX only), - "tbi", or "rtbi". - - Example: - ucc@2000 { - device_type = "network"; - compatible = "ucc_geth"; - model = "UCC"; - device-id = <1>; - reg = <2000 200>; - interrupts = <a0 0>; - interrupt-parent = <700>; - mac-address = [ 00 04 9f 00 23 23 ]; - rx-clock = "none"; - tx-clock = "clk9"; - phy-handle = <212000>; - phy-connection-type = "gmii"; - pio-handle = <140001>; - }; - - - v) Parallel I/O Ports - - This node configures Parallel I/O ports for CPUs with QE support. - The node should reside in the "soc" node of the tree. For each - device that using parallel I/O ports, a child node should be created. - See the definition of the Pin configuration nodes below for more - information. - - Required properties: - - device_type : should be "par_io". - - reg : offset to the register set and its length. - - num-ports : number of Parallel I/O ports - - Example: - par_io@1400 { - reg = <1400 100>; - #address-cells = <1>; - #size-cells = <0>; - device_type = "par_io"; - num-ports = <7>; - ucc_pin@01 { - ...... - }; - - - vi) Pin configuration nodes - - Required properties: - - linux,phandle : phandle of this node; likely referenced by a QE - device. - - pio-map : array of pin configurations. Each pin is defined by 6 - integers. The six numbers are respectively: port, pin, dir, - open_drain, assignment, has_irq. - - port : port number of the pin; 0-6 represent port A-G in UM. - - pin : pin number in the port. - - dir : direction of the pin, should encode as follows: - - 0 = The pin is disabled - 1 = The pin is an output - 2 = The pin is an input - 3 = The pin is I/O - - - open_drain : indicates the pin is normal or wired-OR: - - 0 = The pin is actively driven as an output - 1 = The pin is an open-drain driver. As an output, the pin is - driven active-low, otherwise it is three-stated. - - - assignment : function number of the pin according to the Pin Assignment - tables in User Manual. Each pin can have up to 4 possible functions in - QE and two options for CPM. - - has_irq : indicates if the pin is used as source of external - interrupts. - - Example: - ucc_pin@01 { - linux,phandle = <140001>; - pio-map = < - /* port pin dir open_drain assignment has_irq */ - 0 3 1 0 1 0 /* TxD0 */ - 0 4 1 0 1 0 /* TxD1 */ - 0 5 1 0 1 0 /* TxD2 */ - 0 6 1 0 1 0 /* TxD3 */ - 1 6 1 0 3 0 /* TxD4 */ - 1 7 1 0 1 0 /* TxD5 */ - 1 9 1 0 2 0 /* TxD6 */ - 1 a 1 0 2 0 /* TxD7 */ - 0 9 2 0 1 0 /* RxD0 */ - 0 a 2 0 1 0 /* RxD1 */ - 0 b 2 0 1 0 /* RxD2 */ - 0 c 2 0 1 0 /* RxD3 */ - 0 d 2 0 1 0 /* RxD4 */ - 1 1 2 0 2 0 /* RxD5 */ - 1 0 2 0 2 0 /* RxD6 */ - 1 4 2 0 2 0 /* RxD7 */ - 0 7 1 0 1 0 /* TX_EN */ - 0 8 1 0 1 0 /* TX_ER */ - 0 f 2 0 1 0 /* RX_DV */ - 0 10 2 0 1 0 /* RX_ER */ - 0 0 2 0 1 0 /* RX_CLK */ - 2 9 1 0 3 0 /* GTX_CLK - CLK10 */ - 2 8 2 0 1 0>; /* GTX125 - CLK9 */ - }; - - vii) Multi-User RAM (MURAM) - - Required properties: - - compatible : should be "fsl,qe-muram", "fsl,cpm-muram". - - mode : the could be "host" or "slave". - - ranges : Should be defined as specified in 1) to describe the - translation of MURAM addresses. - - data-only : sub-node which defines the address area under MURAM - bus that can be allocated as data/parameter - - Example: - - muram@10000 { - compatible = "fsl,qe-muram", "fsl,cpm-muram"; - ranges = <0 00010000 0000c000>; - - data-only@0{ - compatible = "fsl,qe-muram-data", - "fsl,cpm-muram-data"; - reg = <0 c000>; - }; - }; - - viii) Uploaded QE firmware - - If a new firwmare has been uploaded to the QE (usually by the - boot loader), then a 'firmware' child node should be added to the QE - node. This node provides information on the uploaded firmware that - device drivers may need. - - Required properties: - - id: The string name of the firmware. This is taken from the 'id' - member of the qe_firmware structure of the uploaded firmware. - Device drivers can search this string to determine if the - firmware they want is already present. - - extended-modes: The Extended Modes bitfield, taken from the - firmware binary. It is a 64-bit number represented - as an array of two 32-bit numbers. - - virtual-traps: The virtual traps, taken from the firmware binary. - It is an array of 8 32-bit numbers. - - Example: - - firmware { - id = "Soft-UART"; - extended-modes = <0 0>; - virtual-traps = <0 0 0 0 0 0 0 0>; - } - - j) CFI or JEDEC memory-mapped NOR flash - - Flash chips (Memory Technology Devices) are often used for solid state - file systems on embedded devices. - - - compatible : should contain the specific model of flash chip(s) - used, if known, followed by either "cfi-flash" or "jedec-flash" - - reg : Address range of the flash chip - - bank-width : Width (in bytes) of the flash bank. Equal to the - device width times the number of interleaved chips. - - device-width : (optional) Width of a single flash chip. If - omitted, assumed to be equal to 'bank-width'. - - #address-cells, #size-cells : Must be present if the flash has - sub-nodes representing partitions (see below). In this case - both #address-cells and #size-cells must be equal to 1. - - For JEDEC compatible devices, the following additional properties - are defined: - - - vendor-id : Contains the flash chip's vendor id (1 byte). - - device-id : Contains the flash chip's device id (1 byte). - - In addition to the information on the flash bank itself, the - device tree may optionally contain additional information - describing partitions of the flash address space. This can be - used on platforms which have strong conventions about which - portions of the flash are used for what purposes, but which don't - use an on-flash partition table such as RedBoot. - - Each partition is represented as a sub-node of the flash device. - Each node's name represents the name of the corresponding - partition of the flash device. - - Flash partitions - - reg : The partition's offset and size within the flash bank. - - label : (optional) The label / name for this flash partition. - If omitted, the label is taken from the node name (excluding - the unit address). - - read-only : (optional) This parameter, if present, is a hint to - Linux that this flash partition should only be mounted - read-only. This is usually used for flash partitions - containing early-boot firmware images or data which should not - be clobbered. - - Example: - - flash@ff000000 { - compatible = "amd,am29lv128ml", "cfi-flash"; - reg = <ff000000 01000000>; - bank-width = <4>; - device-width = <1>; - #address-cells = <1>; - #size-cells = <1>; - fs@0 { - label = "fs"; - reg = <0 f80000>; - }; - firmware@f80000 { - label ="firmware"; - reg = <f80000 80000>; - read-only; - }; - }; - - k) Global Utilities Block - - The global utilities block controls power management, I/O device - enabling, power-on-reset configuration monitoring, general-purpose - I/O signal configuration, alternate function selection for multiplexed - signals, and clock control. - - Required properties: - - - compatible : Should define the compatible device type for - global-utilities. - - reg : Offset and length of the register set for the device. - - Recommended properties: - - - fsl,has-rstcr : Indicates that the global utilities register set - contains a functioning "reset control register" (i.e. the board - is wired to reset upon setting the HRESET_REQ bit in this register). - - Example: - - global-utilities@e0000 { /* global utilities block */ - compatible = "fsl,mpc8548-guts"; - reg = <e0000 1000>; - fsl,has-rstcr; - }; - - l) Freescale Communications Processor Module - - NOTE: This is an interim binding, and will likely change slightly, - as more devices are supported. The QE bindings especially are - incomplete. - - i) Root CPM node - - Properties: - - compatible : "fsl,cpm1", "fsl,cpm2", or "fsl,qe". - - reg : A 48-byte region beginning with CPCR. - - Example: - cpm@119c0 { - #address-cells = <1>; - #size-cells = <1>; - #interrupt-cells = <2>; - compatible = "fsl,mpc8272-cpm", "fsl,cpm2"; - reg = <119c0 30>; - } - - ii) Properties common to mulitple CPM/QE devices - - - fsl,cpm-command : This value is ORed with the opcode and command flag - to specify the device on which a CPM command operates. - - - fsl,cpm-brg : Indicates which baud rate generator the device - is associated with. If absent, an unused BRG - should be dynamically allocated. If zero, the - device uses an external clock rather than a BRG. - - - reg : Unless otherwise specified, the first resource represents the - scc/fcc/ucc registers, and the second represents the device's - parameter RAM region (if it has one). - - iii) Serial - - Currently defined compatibles: - - fsl,cpm1-smc-uart - - fsl,cpm2-smc-uart - - fsl,cpm1-scc-uart - - fsl,cpm2-scc-uart - - fsl,qe-uart - - Example: - - serial@11a00 { - device_type = "serial"; - compatible = "fsl,mpc8272-scc-uart", - "fsl,cpm2-scc-uart"; - reg = <11a00 20 8000 100>; - interrupts = <28 8>; - interrupt-parent = <&PIC>; - fsl,cpm-brg = <1>; - fsl,cpm-command = <00800000>; - }; - - iii) Network - - Currently defined compatibles: - - fsl,cpm1-scc-enet - - fsl,cpm2-scc-enet - - fsl,cpm1-fec-enet - - fsl,cpm2-fcc-enet (third resource is GFEMR) - - fsl,qe-enet - - Example: - - ethernet@11300 { - device_type = "network"; - compatible = "fsl,mpc8272-fcc-enet", - "fsl,cpm2-fcc-enet"; - reg = <11300 20 8400 100 11390 1>; - local-mac-address = [ 00 00 00 00 00 00 ]; - interrupts = <20 8>; - interrupt-parent = <&PIC>; - phy-handle = <&PHY0>; - linux,network-index = <0>; - fsl,cpm-command = <12000300>; - }; - - iv) MDIO - - Currently defined compatibles: - fsl,pq1-fec-mdio (reg is same as first resource of FEC device) - fsl,cpm2-mdio-bitbang (reg is port C registers) - - Properties for fsl,cpm2-mdio-bitbang: - fsl,mdio-pin : pin of port C controlling mdio data - fsl,mdc-pin : pin of port C controlling mdio clock - - Example: - - mdio@10d40 { - device_type = "mdio"; - compatible = "fsl,mpc8272ads-mdio-bitbang", - "fsl,mpc8272-mdio-bitbang", - "fsl,cpm2-mdio-bitbang"; - reg = <10d40 14>; - #address-cells = <1>; - #size-cells = <0>; - fsl,mdio-pin = <12>; - fsl,mdc-pin = <13>; - }; - - v) Baud Rate Generators - - Currently defined compatibles: - fsl,cpm-brg - fsl,cpm1-brg - fsl,cpm2-brg - - Properties: - - reg : There may be an arbitrary number of reg resources; BRG - numbers are assigned to these in order. - - clock-frequency : Specifies the base frequency driving - the BRG. - - Example: - - brg@119f0 { - compatible = "fsl,mpc8272-brg", - "fsl,cpm2-brg", - "fsl,cpm-brg"; - reg = <119f0 10 115f0 10>; - clock-frequency = <d#25000000>; - }; - - vi) Interrupt Controllers - - Currently defined compatibles: - - fsl,cpm1-pic - - only one interrupt cell - - fsl,pq1-pic - - fsl,cpm2-pic - - second interrupt cell is level/sense: - - 2 is falling edge - - 8 is active low - - Example: - - interrupt-controller@10c00 { - #interrupt-cells = <2>; - interrupt-controller; - reg = <10c00 80>; - compatible = "mpc8272-pic", "fsl,cpm2-pic"; - }; - - vii) USB (Universal Serial Bus Controller) - - Properties: - - compatible : "fsl,cpm1-usb", "fsl,cpm2-usb", "fsl,qe-usb" - - Example: - usb@11bc0 { - #address-cells = <1>; - #size-cells = <0>; - compatible = "fsl,cpm2-usb"; - reg = <11b60 18 8b00 100>; - interrupts = <b 8>; - interrupt-parent = <&PIC>; - fsl,cpm-command = <2e600000>; - }; - - viii) Multi-User RAM (MURAM) - - The multi-user/dual-ported RAM is expressed as a bus under the CPM node. - - Ranges must be set up subject to the following restrictions: - - - Children's reg nodes must be offsets from the start of all muram, even - if the user-data area does not begin at zero. - - If multiple range entries are used, the difference between the parent - address and the child address must be the same in all, so that a single - mapping can cover them all while maintaining the ability to determine - CPM-side offsets with pointer subtraction. It is recommended that - multiple range entries not be used. - - A child address of zero must be translatable, even if no reg resources - contain it. - - A child "data" node must exist, compatible with "fsl,cpm-muram-data", to - indicate the portion of muram that is usable by the OS for arbitrary - purposes. The data node may have an arbitrary number of reg resources, - all of which contribute to the allocatable muram pool. - - Example, based on mpc8272: - - muram@0 { - #address-cells = <1>; - #size-cells = <1>; - ranges = <0 0 10000>; - - data@0 { - compatible = "fsl,cpm-muram-data"; - reg = <0 2000 9800 800>; - }; - }; - - m) Chipselect/Local Bus - - Properties: - - name : Should be localbus - - #address-cells : Should be either two or three. The first cell is the - chipselect number, and the remaining cells are the - offset into the chipselect. - - #size-cells : Either one or two, depending on how large each chipselect - can be. - - ranges : Each range corresponds to a single chipselect, and cover - the entire access window as configured. - - Example: - localbus@f0010100 { - compatible = "fsl,mpc8272-localbus", - "fsl,pq2-localbus"; - #address-cells = <2>; - #size-cells = <1>; - reg = <f0010100 40>; - - ranges = <0 0 fe000000 02000000 - 1 0 f4500000 00008000>; - - flash@0,0 { - compatible = "jedec-flash"; - reg = <0 0 2000000>; - bank-width = <4>; - device-width = <1>; - }; - - board-control@1,0 { - reg = <1 0 20>; - compatible = "fsl,mpc8272ads-bcsr"; - }; - }; - - - n) 4xx/Axon EMAC ethernet nodes - - The EMAC ethernet controller in IBM and AMCC 4xx chips, and also - the Axon bridge. To operate this needs to interact with a ths - special McMAL DMA controller, and sometimes an RGMII or ZMII - interface. In addition to the nodes and properties described - below, the node for the OPB bus on which the EMAC sits must have a - correct clock-frequency property. - - i) The EMAC node itself - - Required properties: - - device_type : "network" - - - compatible : compatible list, contains 2 entries, first is - "ibm,emac-CHIP" where CHIP is the host ASIC (440gx, - 405gp, Axon) and second is either "ibm,emac" or - "ibm,emac4". For Axon, thus, we have: "ibm,emac-axon", - "ibm,emac4" - - interrupts : <interrupt mapping for EMAC IRQ and WOL IRQ> - - interrupt-parent : optional, if needed for interrupt mapping - - reg : <registers mapping> - - local-mac-address : 6 bytes, MAC address - - mal-device : phandle of the associated McMAL node - - mal-tx-channel : 1 cell, index of the tx channel on McMAL associated - with this EMAC - - mal-rx-channel : 1 cell, index of the rx channel on McMAL associated - with this EMAC - - cell-index : 1 cell, hardware index of the EMAC cell on a given - ASIC (typically 0x0 and 0x1 for EMAC0 and EMAC1 on - each Axon chip) - - max-frame-size : 1 cell, maximum frame size supported in bytes - - rx-fifo-size : 1 cell, Rx fifo size in bytes for 10 and 100 Mb/sec - operations. - For Axon, 2048 - - tx-fifo-size : 1 cell, Tx fifo size in bytes for 10 and 100 Mb/sec - operations. - For Axon, 2048. - - fifo-entry-size : 1 cell, size of a fifo entry (used to calculate - thresholds). - For Axon, 0x00000010 - - mal-burst-size : 1 cell, MAL burst size (used to calculate thresholds) - in bytes. - For Axon, 0x00000100 (I think ...) - - phy-mode : string, mode of operations of the PHY interface. - Supported values are: "mii", "rmii", "smii", "rgmii", - "tbi", "gmii", rtbi", "sgmii". - For Axon on CAB, it is "rgmii" - - mdio-device : 1 cell, required iff using shared MDIO registers - (440EP). phandle of the EMAC to use to drive the - MDIO lines for the PHY used by this EMAC. - - zmii-device : 1 cell, required iff connected to a ZMII. phandle of - the ZMII device node - - zmii-channel : 1 cell, required iff connected to a ZMII. Which ZMII - channel or 0xffffffff if ZMII is only used for MDIO. - - rgmii-device : 1 cell, required iff connected to an RGMII. phandle - of the RGMII device node. - For Axon: phandle of plb5/plb4/opb/rgmii - - rgmii-channel : 1 cell, required iff connected to an RGMII. Which - RGMII channel is used by this EMAC. - Fox Axon: present, whatever value is appropriate for each - EMAC, that is the content of the current (bogus) "phy-port" - property. - - Recommended properties: - - linux,network-index : This is the intended "index" of this - network device. This is used by the bootwrapper to interpret - MAC addresses passed by the firmware when no information other - than indices is available to associate an address with a device. - - Optional properties: - - phy-address : 1 cell, optional, MDIO address of the PHY. If absent, - a search is performed. - - phy-map : 1 cell, optional, bitmap of addresses to probe the PHY - for, used if phy-address is absent. bit 0x00000001 is - MDIO address 0. - For Axon it can be absent, thouugh my current driver - doesn't handle phy-address yet so for now, keep - 0x00ffffff in it. - - rx-fifo-size-gige : 1 cell, Rx fifo size in bytes for 1000 Mb/sec - operations (if absent the value is the same as - rx-fifo-size). For Axon, either absent or 2048. - - tx-fifo-size-gige : 1 cell, Tx fifo size in bytes for 1000 Mb/sec - operations (if absent the value is the same as - tx-fifo-size). For Axon, either absent or 2048. - - tah-device : 1 cell, optional. If connected to a TAH engine for - offload, phandle of the TAH device node. - - tah-channel : 1 cell, optional. If appropriate, channel used on the - TAH engine. - - Example: - - EMAC0: ethernet@40000800 { - linux,network-index = <0>; - device_type = "network"; - compatible = "ibm,emac-440gp", "ibm,emac"; - interrupt-parent = <&UIC1>; - interrupts = <1c 4 1d 4>; - reg = <40000800 70>; - local-mac-address = [00 04 AC E3 1B 1E]; - mal-device = <&MAL0>; - mal-tx-channel = <0 1>; - mal-rx-channel = <0>; - cell-index = <0>; - max-frame-size = <5dc>; - rx-fifo-size = <1000>; - tx-fifo-size = <800>; - phy-mode = "rmii"; - phy-map = <00000001>; - zmii-device = <&ZMII0>; - zmii-channel = <0>; - }; - - ii) McMAL node - - Required properties: - - device_type : "dma-controller" - - compatible : compatible list, containing 2 entries, first is - "ibm,mcmal-CHIP" where CHIP is the host ASIC (like - emac) and the second is either "ibm,mcmal" or - "ibm,mcmal2". - For Axon, "ibm,mcmal-axon","ibm,mcmal2" - - interrupts : <interrupt mapping for the MAL interrupts sources: - 5 sources: tx_eob, rx_eob, serr, txde, rxde>. - For Axon: This is _different_ from the current - firmware. We use the "delayed" interrupts for txeob - and rxeob. Thus we end up with mapping those 5 MPIC - interrupts, all level positive sensitive: 10, 11, 32, - 33, 34 (in decimal) - - dcr-reg : < DCR registers range > - - dcr-parent : if needed for dcr-reg - - num-tx-chans : 1 cell, number of Tx channels - - num-rx-chans : 1 cell, number of Rx channels - - iii) ZMII node - - Required properties: - - compatible : compatible list, containing 2 entries, first is - "ibm,zmii-CHIP" where CHIP is the host ASIC (like - EMAC) and the second is "ibm,zmii". - For Axon, there is no ZMII node. - - reg : <registers mapping> - - iv) RGMII node - - Required properties: - - compatible : compatible list, containing 2 entries, first is - "ibm,rgmii-CHIP" where CHIP is the host ASIC (like - EMAC) and the second is "ibm,rgmii". - For Axon, "ibm,rgmii-axon","ibm,rgmii" - - reg : <registers mapping> - - revision : as provided by the RGMII new version register if - available. - For Axon: 0x0000012a - - o) Xilinx IP cores - - The Xilinx EDK toolchain ships with a set of IP cores (devices) for use - in Xilinx Spartan and Virtex FPGAs. The devices cover the whole range - of standard device types (network, serial, etc.) and miscellanious - devices (gpio, LCD, spi, etc). Also, since these devices are - implemented within the fpga fabric every instance of the device can be - synthesised with different options that change the behaviour. - - Each IP-core has a set of parameters which the FPGA designer can use to - control how the core is synthesized. Historically, the EDK tool would - extract the device parameters relevant to device drivers and copy them - into an 'xparameters.h' in the form of #define symbols. This tells the - device drivers how the IP cores are configured, but it requres the kernel - to be recompiled every time the FPGA bitstream is resynthesized. - - The new approach is to export the parameters into the device tree and - generate a new device tree each time the FPGA bitstream changes. The - parameters which used to be exported as #defines will now become - properties of the device node. In general, device nodes for IP-cores - will take the following form: - - (name): (generic-name)@(base-address) { - compatible = "xlnx,(ip-core-name)-(HW_VER)" - [, (list of compatible devices), ...]; - reg = <(baseaddr) (size)>; - interrupt-parent = <&interrupt-controller-phandle>; - interrupts = < ... >; - xlnx,(parameter1) = "(string-value)"; - xlnx,(parameter2) = <(int-value)>; - }; - - (generic-name): an open firmware-style name that describes the - generic class of device. Preferably, this is one word, such - as 'serial' or 'ethernet'. - (ip-core-name): the name of the ip block (given after the BEGIN - directive in system.mhs). Should be in lowercase - and all underscores '_' converted to dashes '-'. - (name): is derived from the "PARAMETER INSTANCE" value. - (parameter#): C_* parameters from system.mhs. The C_ prefix is - dropped from the parameter name, the name is converted - to lowercase and all underscore '_' characters are - converted to dashes '-'. - (baseaddr): the baseaddr parameter value (often named C_BASEADDR). - (HW_VER): from the HW_VER parameter. - (size): the address range size (often C_HIGHADDR - C_BASEADDR + 1). - - Typically, the compatible list will include the exact IP core version - followed by an older IP core version which implements the same - interface or any other device with the same interface. - - 'reg', 'interrupt-parent' and 'interrupts' are all optional properties. - - For example, the following block from system.mhs: - - BEGIN opb_uartlite - PARAMETER INSTANCE = opb_uartlite_0 - PARAMETER HW_VER = 1.00.b - PARAMETER C_BAUDRATE = 115200 - PARAMETER C_DATA_BITS = 8 - PARAMETER C_ODD_PARITY = 0 - PARAMETER C_USE_PARITY = 0 - PARAMETER C_CLK_FREQ = 50000000 - PARAMETER C_BASEADDR = 0xEC100000 - PARAMETER C_HIGHADDR = 0xEC10FFFF - BUS_INTERFACE SOPB = opb_7 - PORT OPB_Clk = CLK_50MHz - PORT Interrupt = opb_uartlite_0_Interrupt - PORT RX = opb_uartlite_0_RX - PORT TX = opb_uartlite_0_TX - PORT OPB_Rst = sys_bus_reset_0 - END - - becomes the following device tree node: - - opb_uartlite_0: serial@ec100000 { - device_type = "serial"; - compatible = "xlnx,opb-uartlite-1.00.b"; - reg = <ec100000 10000>; - interrupt-parent = <&opb_intc_0>; - interrupts = <1 0>; // got this from the opb_intc parameters - current-speed = <d#115200>; // standard serial device prop - clock-frequency = <d#50000000>; // standard serial device prop - xlnx,data-bits = <8>; - xlnx,odd-parity = <0>; - xlnx,use-parity = <0>; - }; - - Some IP cores actually implement 2 or more logical devices. In - this case, the device should still describe the whole IP core with - a single node and add a child node for each logical device. The - ranges property can be used to translate from parent IP-core to the - registers of each device. In addition, the parent node should be - compatible with the bus type 'xlnx,compound', and should contain - #address-cells and #size-cells, as with any other bus. (Note: this - makes the assumption that both logical devices have the same bus - binding. If this is not true, then separate nodes should be used - for each logical device). The 'cell-index' property can be used to - enumerate logical devices within an IP core. For example, the - following is the system.mhs entry for the dual ps2 controller found - on the ml403 reference design. - - BEGIN opb_ps2_dual_ref - PARAMETER INSTANCE = opb_ps2_dual_ref_0 - PARAMETER HW_VER = 1.00.a - PARAMETER C_BASEADDR = 0xA9000000 - PARAMETER C_HIGHADDR = 0xA9001FFF - BUS_INTERFACE SOPB = opb_v20_0 - PORT Sys_Intr1 = ps2_1_intr - PORT Sys_Intr2 = ps2_2_intr - PORT Clkin1 = ps2_clk_rx_1 - PORT Clkin2 = ps2_clk_rx_2 - PORT Clkpd1 = ps2_clk_tx_1 - PORT Clkpd2 = ps2_clk_tx_2 - PORT Rx1 = ps2_d_rx_1 - PORT Rx2 = ps2_d_rx_2 - PORT Txpd1 = ps2_d_tx_1 - PORT Txpd2 = ps2_d_tx_2 - END - - It would result in the following device tree nodes: - - opb_ps2_dual_ref_0: opb-ps2-dual-ref@a9000000 { - #address-cells = <1>; - #size-cells = <1>; - compatible = "xlnx,compound"; - ranges = <0 a9000000 2000>; - // If this device had extra parameters, then they would - // go here. - ps2@0 { - compatible = "xlnx,opb-ps2-dual-ref-1.00.a"; - reg = <0 40>; - interrupt-parent = <&opb_intc_0>; - interrupts = <3 0>; - cell-index = <0>; - }; - ps2@1000 { - compatible = "xlnx,opb-ps2-dual-ref-1.00.a"; - reg = <1000 40>; - interrupt-parent = <&opb_intc_0>; - interrupts = <3 0>; - cell-index = <0>; - }; - }; - - Also, the system.mhs file defines bus attachments from the processor - to the devices. The device tree structure should reflect the bus - attachments. Again an example; this system.mhs fragment: - - BEGIN ppc405_virtex4 - PARAMETER INSTANCE = ppc405_0 - PARAMETER HW_VER = 1.01.a - BUS_INTERFACE DPLB = plb_v34_0 - BUS_INTERFACE IPLB = plb_v34_0 - END - - BEGIN opb_intc - PARAMETER INSTANCE = opb_intc_0 - PARAMETER HW_VER = 1.00.c - PARAMETER C_BASEADDR = 0xD1000FC0 - PARAMETER C_HIGHADDR = 0xD1000FDF - BUS_INTERFACE SOPB = opb_v20_0 - END - - BEGIN opb_uart16550 - PARAMETER INSTANCE = opb_uart16550_0 - PARAMETER HW_VER = 1.00.d - PARAMETER C_BASEADDR = 0xa0000000 - PARAMETER C_HIGHADDR = 0xa0001FFF - BUS_INTERFACE SOPB = opb_v20_0 - END - - BEGIN plb_v34 - PARAMETER INSTANCE = plb_v34_0 - PARAMETER HW_VER = 1.02.a - END - - BEGIN plb_bram_if_cntlr - PARAMETER INSTANCE = plb_bram_if_cntlr_0 - PARAMETER HW_VER = 1.00.b - PARAMETER C_BASEADDR = 0xFFFF0000 - PARAMETER C_HIGHADDR = 0xFFFFFFFF - BUS_INTERFACE SPLB = plb_v34_0 - END - - BEGIN plb2opb_bridge - PARAMETER INSTANCE = plb2opb_bridge_0 - PARAMETER HW_VER = 1.01.a - PARAMETER C_RNG0_BASEADDR = 0x20000000 - PARAMETER C_RNG0_HIGHADDR = 0x3FFFFFFF - PARAMETER C_RNG1_BASEADDR = 0x60000000 - PARAMETER C_RNG1_HIGHADDR = 0x7FFFFFFF - PARAMETER C_RNG2_BASEADDR = 0x80000000 - PARAMETER C_RNG2_HIGHADDR = 0xBFFFFFFF - PARAMETER C_RNG3_BASEADDR = 0xC0000000 - PARAMETER C_RNG3_HIGHADDR = 0xDFFFFFFF - BUS_INTERFACE SPLB = plb_v34_0 - BUS_INTERFACE MOPB = opb_v20_0 - END - - Gives this device tree (some properties removed for clarity): - - plb@0 { - #address-cells = <1>; - #size-cells = <1>; - compatible = "xlnx,plb-v34-1.02.a"; - device_type = "ibm,plb"; - ranges; // 1:1 translation - - plb_bram_if_cntrl_0: bram@ffff0000 { - reg = <ffff0000 10000>; - } - - opb@20000000 { - #address-cells = <1>; - #size-cells = <1>; - ranges = <20000000 20000000 20000000 - 60000000 60000000 20000000 - 80000000 80000000 40000000 - c0000000 c0000000 20000000>; - - opb_uart16550_0: serial@a0000000 { - reg = <a00000000 2000>; - }; - - opb_intc_0: interrupt-controller@d1000fc0 { - reg = <d1000fc0 20>; - }; - }; - }; - - That covers the general approach to binding xilinx IP cores into the - device tree. The following are bindings for specific devices: - - i) Xilinx ML300 Framebuffer - - Simple framebuffer device from the ML300 reference design (also on the - ML403 reference design as well as others). - - Optional properties: - - resolution = <xres yres> : pixel resolution of framebuffer. Some - implementations use a different resolution. - Default is <d#640 d#480> - - virt-resolution = <xvirt yvirt> : Size of framebuffer in memory. - Default is <d#1024 d#480>. - - rotate-display (empty) : rotate display 180 degrees. - - ii) Xilinx SystemACE - - The Xilinx SystemACE device is used to program FPGAs from an FPGA - bitstream stored on a CF card. It can also be used as a generic CF - interface device. - - Optional properties: - - 8-bit (empty) : Set this property for SystemACE in 8 bit mode - - iii) Xilinx EMAC and Xilinx TEMAC - - Xilinx Ethernet devices. In addition to general xilinx properties - listed above, nodes for these devices should include a phy-handle - property, and may include other common network device properties - like local-mac-address. - - iv) Xilinx Uartlite - - Xilinx uartlite devices are simple fixed speed serial ports. - - Requred properties: - - current-speed : Baud rate of uartlite - - v) Xilinx hwicap - - Xilinx hwicap devices provide access to the configuration logic - of the FPGA through the Internal Configuration Access Port - (ICAP). The ICAP enables partial reconfiguration of the FPGA, - readback of the configuration information, and some control over - 'warm boots' of the FPGA fabric. - - Required properties: - - xlnx,family : The family of the FPGA, necessary since the - capabilities of the underlying ICAP hardware - differ between different families. May be - 'virtex2p', 'virtex4', or 'virtex5'. - - p) Freescale Synchronous Serial Interface - - The SSI is a serial device that communicates with audio codecs. It can - be programmed in AC97, I2S, left-justified, or right-justified modes. - - Required properties: - - compatible : compatible list, containing "fsl,ssi" - - cell-index : the SSI, <0> = SSI1, <1> = SSI2, and so on - - reg : offset and length of the register set for the device - - interrupts : <a b> where a is the interrupt number and b is a - field that represents an encoding of the sense and - level information for the interrupt. This should be - encoded based on the information in section 2) - depending on the type of interrupt controller you - have. - - interrupt-parent : the phandle for the interrupt controller that - services interrupts for this device. - - fsl,mode : the operating mode for the SSI interface - "i2s-slave" - I2S mode, SSI is clock slave - "i2s-master" - I2S mode, SSI is clock master - "lj-slave" - left-justified mode, SSI is clock slave - "lj-master" - l.j. mode, SSI is clock master - "rj-slave" - right-justified mode, SSI is clock slave - "rj-master" - r.j., SSI is clock master - "ac97-slave" - AC97 mode, SSI is clock slave - "ac97-master" - AC97 mode, SSI is clock master - - Optional properties: - - codec-handle : phandle to a 'codec' node that defines an audio - codec connected to this SSI. This node is typically - a child of an I2C or other control node. - - Child 'codec' node required properties: - - compatible : compatible list, contains the name of the codec - - Child 'codec' node optional properties: - - clock-frequency : The frequency of the input clock, which typically - comes from an on-board dedicated oscillator. - - * Freescale 83xx DMA Controller - - Freescale PowerPC 83xx have on chip general purpose DMA controllers. - - Required properties: - - - compatible : compatible list, contains 2 entries, first is - "fsl,CHIP-dma", where CHIP is the processor - (mpc8349, mpc8360, etc.) and the second is - "fsl,elo-dma" - - reg : <registers mapping for DMA general status reg> - - ranges : Should be defined as specified in 1) to describe the - DMA controller channels. - - cell-index : controller index. 0 for controller @ 0x8100 - - interrupts : <interrupt mapping for DMA IRQ> - - interrupt-parent : optional, if needed for interrupt mapping - - - - DMA channel nodes: - - compatible : compatible list, contains 2 entries, first is - "fsl,CHIP-dma-channel", where CHIP is the processor - (mpc8349, mpc8350, etc.) and the second is - "fsl,elo-dma-channel" - - reg : <registers mapping for channel> - - cell-index : dma channel index starts at 0. - - Optional properties: - - interrupts : <interrupt mapping for DMA channel IRQ> - (on 83xx this is expected to be identical to - the interrupts property of the parent node) - - interrupt-parent : optional, if needed for interrupt mapping - - Example: - dma@82a8 { - #address-cells = <1>; - #size-cells = <1>; - compatible = "fsl,mpc8349-dma", "fsl,elo-dma"; - reg = <82a8 4>; - ranges = <0 8100 1a4>; - interrupt-parent = <&ipic>; - interrupts = <47 8>; - cell-index = <0>; - dma-channel@0 { - compatible = "fsl,mpc8349-dma-channel", "fsl,elo-dma-channel"; - cell-index = <0>; - reg = <0 80>; - }; - dma-channel@80 { - compatible = "fsl,mpc8349-dma-channel", "fsl,elo-dma-channel"; - cell-index = <1>; - reg = <80 80>; - }; - dma-channel@100 { - compatible = "fsl,mpc8349-dma-channel", "fsl,elo-dma-channel"; - cell-index = <2>; - reg = <100 80>; - }; - dma-channel@180 { - compatible = "fsl,mpc8349-dma-channel", "fsl,elo-dma-channel"; - cell-index = <3>; - reg = <180 80>; - }; - }; - - * Freescale 85xx/86xx DMA Controller - - Freescale PowerPC 85xx/86xx have on chip general purpose DMA controllers. - - Required properties: - - - compatible : compatible list, contains 2 entries, first is - "fsl,CHIP-dma", where CHIP is the processor - (mpc8540, mpc8540, etc.) and the second is - "fsl,eloplus-dma" - - reg : <registers mapping for DMA general status reg> - - cell-index : controller index. 0 for controller @ 0x21000, - 1 for controller @ 0xc000 - - ranges : Should be defined as specified in 1) to describe the - DMA controller channels. - - - DMA channel nodes: - - compatible : compatible list, contains 2 entries, first is - "fsl,CHIP-dma-channel", where CHIP is the processor - (mpc8540, mpc8560, etc.) and the second is - "fsl,eloplus-dma-channel" - - cell-index : dma channel index starts at 0. - - reg : <registers mapping for channel> - - interrupts : <interrupt mapping for DMA channel IRQ> - - interrupt-parent : optional, if needed for interrupt mapping - - Example: - dma@21300 { - #address-cells = <1>; - #size-cells = <1>; - compatible = "fsl,mpc8540-dma", "fsl,eloplus-dma"; - reg = <21300 4>; - ranges = <0 21100 200>; - cell-index = <0>; - dma-channel@0 { - compatible = "fsl,mpc8540-dma-channel", "fsl,eloplus-dma-channel"; - reg = <0 80>; - cell-index = <0>; - interrupt-parent = <&mpic>; - interrupts = <14 2>; - }; - dma-channel@80 { - compatible = "fsl,mpc8540-dma-channel", "fsl,eloplus-dma-channel"; - reg = <80 80>; - cell-index = <1>; - interrupt-parent = <&mpic>; - interrupts = <15 2>; - }; - dma-channel@100 { - compatible = "fsl,mpc8540-dma-channel", "fsl,eloplus-dma-channel"; - reg = <100 80>; - cell-index = <2>; - interrupt-parent = <&mpic>; - interrupts = <16 2>; - }; - dma-channel@180 { - compatible = "fsl,mpc8540-dma-channel", "fsl,eloplus-dma-channel"; - reg = <180 80>; - cell-index = <3>; - interrupt-parent = <&mpic>; - interrupts = <17 2>; - }; - }; - - * Freescale 8xxx/3.0 Gb/s SATA nodes - - SATA nodes are defined to describe on-chip Serial ATA controllers. - Each SATA port should have its own node. - - Required properties: - - compatible : compatible list, contains 2 entries, first is - "fsl,CHIP-sata", where CHIP is the processor - (mpc8315, mpc8379, etc.) and the second is - "fsl,pq-sata" - - interrupts : <interrupt mapping for SATA IRQ> - - cell-index : controller index. - 1 for controller @ 0x18000 - 2 for controller @ 0x19000 - 3 for controller @ 0x1a000 - 4 for controller @ 0x1b000 - - Optional properties: - - interrupt-parent : optional, if needed for interrupt mapping - - reg : <registers mapping> - - Example: - - sata@18000 { - compatible = "fsl,mpc8379-sata", "fsl,pq-sata"; - reg = <0x18000 0x1000>; - cell-index = <1>; - interrupts = <2c 8>; - interrupt-parent = < &ipic >; - }; - - q) USB EHCI controllers - - Required properties: - - compatible : should be "usb-ehci". - - reg : should contain at least address and length of the standard EHCI - register set for the device. Optional platform-dependent registers - (debug-port or other) can be also specified here, but only after - definition of standard EHCI registers. - - interrupts : one EHCI interrupt should be described here. - If device registers are implemented in big endian mode, the device - node should have "big-endian-regs" property. - If controller implementation operates with big endian descriptors, - "big-endian-desc" property should be specified. - If both big endian registers and descriptors are used by the controller - implementation, "big-endian" property can be specified instead of having - both "big-endian-regs" and "big-endian-desc". - - Example (Sequoia 440EPx): - ehci@e0000300 { - compatible = "ibm,usb-ehci-440epx", "usb-ehci"; - interrupt-parent = <&UIC0>; - interrupts = <1a 4>; - reg = <0 e0000300 90 0 e0000390 70>; - big-endian; - }; - - - More devices will be defined as this spec matures. - -VII - Specifying interrupt information for devices -=================================================== - -The device tree represents the busses and devices of a hardware -system in a form similar to the physical bus topology of the -hardware. - -In addition, a logical 'interrupt tree' exists which represents the -hierarchy and routing of interrupts in the hardware. - -The interrupt tree model is fully described in the -document "Open Firmware Recommended Practice: Interrupt -Mapping Version 0.9". The document is available at: -<http://playground.sun.com/1275/practice>. - -1) interrupts property ----------------------- - -Devices that generate interrupts to a single interrupt controller -should use the conventional OF representation described in the -OF interrupt mapping documentation. - -Each device which generates interrupts must have an 'interrupt' -property. The interrupt property value is an arbitrary number of -of 'interrupt specifier' values which describe the interrupt or -interrupts for the device. - -The encoding of an interrupt specifier is determined by the -interrupt domain in which the device is located in the -interrupt tree. The root of an interrupt domain specifies in -its #interrupt-cells property the number of 32-bit cells -required to encode an interrupt specifier. See the OF interrupt -mapping documentation for a detailed description of domains. - -For example, the binding for the OpenPIC interrupt controller -specifies an #interrupt-cells value of 2 to encode the interrupt -number and level/sense information. All interrupt children in an -OpenPIC interrupt domain use 2 cells per interrupt in their interrupts -property. - -The PCI bus binding specifies a #interrupt-cell value of 1 to encode -which interrupt pin (INTA,INTB,INTC,INTD) is used. - -2) interrupt-parent property ----------------------------- - -The interrupt-parent property is specified to define an explicit -link between a device node and its interrupt parent in -the interrupt tree. The value of interrupt-parent is the -phandle of the parent node. - -If the interrupt-parent property is not defined for a node, it's -interrupt parent is assumed to be an ancestor in the node's -_device tree_ hierarchy. - -3) OpenPIC Interrupt Controllers --------------------------------- - -OpenPIC interrupt controllers require 2 cells to encode -interrupt information. The first cell defines the interrupt -number. The second cell defines the sense and level -information. - -Sense and level information should be encoded as follows: - - 0 = low to high edge sensitive type enabled - 1 = active low level sensitive type enabled - 2 = active high level sensitive type enabled - 3 = high to low edge sensitive type enabled - -4) ISA Interrupt Controllers ----------------------------- - -ISA PIC interrupt controllers require 2 cells to encode -interrupt information. The first cell defines the interrupt -number. The second cell defines the sense and level -information. - -ISA PIC interrupt controllers should adhere to the ISA PIC -encodings listed below: - - 0 = active low level sensitive type enabled - 1 = active high level sensitive type enabled - 2 = high to low edge sensitive type enabled - 3 = low to high edge sensitive type enabled - - -Appendix A - Sample SOC node for MPC8540 -======================================== - -Note that the #address-cells and #size-cells for the SoC node -in this example have been explicitly listed; these are likely -not necessary as they are usually the same as the root node. - - soc8540@e0000000 { - #address-cells = <1>; - #size-cells = <1>; - #interrupt-cells = <2>; - device_type = "soc"; - ranges = <00000000 e0000000 00100000> - reg = <e0000000 00003000>; - bus-frequency = <0>; - - mdio@24520 { - reg = <24520 20>; - device_type = "mdio"; - compatible = "gianfar"; - - ethernet-phy@0 { - linux,phandle = <2452000> - interrupt-parent = <40000>; - interrupts = <35 1>; - reg = <0>; - device_type = "ethernet-phy"; - }; - - ethernet-phy@1 { - linux,phandle = <2452001> - interrupt-parent = <40000>; - interrupts = <35 1>; - reg = <1>; - device_type = "ethernet-phy"; - }; - - ethernet-phy@3 { - linux,phandle = <2452002> - interrupt-parent = <40000>; - interrupts = <35 1>; - reg = <3>; - device_type = "ethernet-phy"; - }; - - }; - - ethernet@24000 { - #size-cells = <0>; - device_type = "network"; - model = "TSEC"; - compatible = "gianfar"; - reg = <24000 1000>; - mac-address = [ 00 E0 0C 00 73 00 ]; - interrupts = <d 3 e 3 12 3>; - interrupt-parent = <40000>; - phy-handle = <2452000>; - }; - - ethernet@25000 { - #address-cells = <1>; - #size-cells = <0>; - device_type = "network"; - model = "TSEC"; - compatible = "gianfar"; - reg = <25000 1000>; - mac-address = [ 00 E0 0C 00 73 01 ]; - interrupts = <13 3 14 3 18 3>; - interrupt-parent = <40000>; - phy-handle = <2452001>; - }; - - ethernet@26000 { - #address-cells = <1>; - #size-cells = <0>; - device_type = "network"; - model = "FEC"; - compatible = "gianfar"; - reg = <26000 1000>; - mac-address = [ 00 E0 0C 00 73 02 ]; - interrupts = <19 3>; - interrupt-parent = <40000>; - phy-handle = <2452002>; - }; - - serial@4500 { - device_type = "serial"; - compatible = "ns16550"; - reg = <4500 100>; - clock-frequency = <0>; - interrupts = <1a 3>; - interrupt-parent = <40000>; - }; - - pic@40000 { - linux,phandle = <40000>; - clock-frequency = <0>; - interrupt-controller; - #address-cells = <0>; - reg = <40000 40000>; - built-in; - compatible = "chrp,open-pic"; - device_type = "open-pic"; - big-endian; - }; - - i2c@3000 { - interrupt-parent = <40000>; - interrupts = <1b 3>; - reg = <3000 18>; - device_type = "i2c"; - compatible = "fsl-i2c"; - dfsrr; - }; - - }; diff --git a/Documentation/powerpc/bootwrapper.txt b/Documentation/powerpc/bootwrapper.txt new file mode 100644 index 00000000000..d60fced5e1c --- /dev/null +++ b/Documentation/powerpc/bootwrapper.txt @@ -0,0 +1,141 @@ +The PowerPC boot wrapper +------------------------ +Copyright (C) Secret Lab Technologies Ltd. + +PowerPC image targets compresses and wraps the kernel image (vmlinux) with +a boot wrapper to make it usable by the system firmware. There is no +standard PowerPC firmware interface, so the boot wrapper is designed to +be adaptable for each kind of image that needs to be built. + +The boot wrapper can be found in the arch/powerpc/boot/ directory. The +Makefile in that directory has targets for all the available image types. +The different image types are used to support all of the various firmware +interfaces found on PowerPC platforms. OpenFirmware is the most commonly +used firmware type on general purpose PowerPC systems from Apple, IBM and +others. U-Boot is typically found on embedded PowerPC hardware, but there +are a handful of other firmware implementations which are also popular. Each +firmware interface requires a different image format. + +The boot wrapper is built from the makefile in arch/powerpc/boot/Makefile and +it uses the wrapper script (arch/powerpc/boot/wrapper) to generate target +image. The details of the build system is discussed in the next section. +Currently, the following image format targets exist: + + cuImage.%: Backwards compatible uImage for older version of + U-Boot (for versions that don't understand the device + tree). This image embeds a device tree blob inside + the image. The boot wrapper, kernel and device tree + are all embedded inside the U-Boot uImage file format + with boot wrapper code that extracts data from the old + bd_info structure and loads the data into the device + tree before jumping into the kernel. + Because of the series of #ifdefs found in the + bd_info structure used in the old U-Boot interfaces, + cuImages are platform specific. Each specific + U-Boot platform has a different platform init file + which populates the embedded device tree with data + from the platform specific bd_info file. The platform + specific cuImage platform init code can be found in + arch/powerpc/boot/cuboot.*.c. Selection of the correct + cuImage init code for a specific board can be found in + the wrapper structure. + dtbImage.%: Similar to zImage, except device tree blob is embedded + inside the image instead of provided by firmware. The + output image file can be either an elf file or a flat + binary depending on the platform. + dtbImages are used on systems which do not have an + interface for passing a device tree directly. + dtbImages are similar to simpleImages except that + dtbImages have platform specific code for extracting + data from the board firmware, but simpleImages do not + talk to the firmware at all. + PlayStation 3 support uses dtbImage. So do Embedded + Planet boards using the PlanetCore firmware. Board + specific initialization code is typically found in a + file named arch/powerpc/boot/<platform>.c; but this + can be overridden by the wrapper script. + simpleImage.%: Firmware independent compressed image that does not + depend on any particular firmware interface and embeds + a device tree blob. This image is a flat binary that + can be loaded to any location in RAM and jumped to. + Firmware cannot pass any configuration data to the + kernel with this image type and it depends entirely on + the embedded device tree for all information. + The simpleImage is useful for booting systems with + an unknown firmware interface or for booting from + a debugger when no firmware is present (such as on + the Xilinx Virtex platform). The only assumption that + simpleImage makes is that RAM is correctly initialized + and that the MMU is either off or has RAM mapped to + base address 0. + simpleImage also supports inserting special platform + specific initialization code to the start of the bootup + sequence. The virtex405 platform uses this feature to + ensure that the cache is invalidated before caching + is enabled. Platform specific initialization code is + added as part of the wrapper script and is keyed on + the image target name. For example, all + simpleImage.virtex405-* targets will add the + virtex405-head.S initialization code (This also means + that the dts file for virtex405 targets should be + named (virtex405-<board>.dts). Search the wrapper + script for 'virtex405' and see the file + arch/powerpc/boot/virtex405-head.S for details. + treeImage.%; Image format for used with OpenBIOS firmware found + on some ppc4xx hardware. This image embeds a device + tree blob inside the image. + uImage: Native image format used by U-Boot. The uImage target + does not add any boot code. It just wraps a compressed + vmlinux in the uImage data structure. This image + requires a version of U-Boot that is able to pass + a device tree to the kernel at boot. If using an older + version of U-Boot, then you need to use a cuImage + instead. + zImage.%: Image format which does not embed a device tree. + Used by OpenFirmware and other firmware interfaces + which are able to supply a device tree. This image + expects firmware to provide the device tree at boot. + Typically, if you have general purpose PowerPC + hardware then you want this image format. + +Image types which embed a device tree blob (simpleImage, dtbImage, treeImage, +and cuImage) all generate the device tree blob from a file in the +arch/powerpc/boot/dts/ directory. The Makefile selects the correct device +tree source based on the name of the target. Therefore, if the kernel is +built with 'make treeImage.walnut simpleImage.virtex405-ml403', then the +build system will use arch/powerpc/boot/dts/walnut.dts to build +treeImage.walnut and arch/powerpc/boot/dts/virtex405-ml403.dts to build +the simpleImage.virtex405-ml403. + +Two special targets called 'zImage' and 'zImage.initrd' also exist. These +targets build all the default images as selected by the kernel configuration. +Default images are selected by the boot wrapper Makefile +(arch/powerpc/boot/Makefile) by adding targets to the $image-y variable. Look +at the Makefile to see which default image targets are available. + +How it is built +--------------- +arch/powerpc is designed to support multiplatform kernels, which means +that a single vmlinux image can be booted on many different target boards. +It also means that the boot wrapper must be able to wrap for many kinds of +images on a single build. The design decision was made to not use any +conditional compilation code (#ifdef, etc) in the boot wrapper source code. +All of the boot wrapper pieces are buildable at any time regardless of the +kernel configuration. Building all the wrapper bits on every kernel build +also ensures that obscure parts of the wrapper are at the very least compile +tested in a large variety of environments. + +The wrapper is adapted for different image types at link time by linking in +just the wrapper bits that are appropriate for the image type. The 'wrapper +script' (found in arch/powerpc/boot/wrapper) is called by the Makefile and +is responsible for selecting the correct wrapper bits for the image type. +The arguments are well documented in the script's comment block, so they +are not repeated here. However, it is worth mentioning that the script +uses the -p (platform) argument as the main method of deciding which wrapper +bits to compile in. Look for the large 'case "$platform" in' block in the +middle of the script. This is also the place where platform specific fixups +can be selected by changing the link order. + +In particular, care should be taken when working with cuImages. cuImage +wrapper bits are very board specific and care should be taken to make sure +the target you are trying to build is supported by the wrapper bits. diff --git a/Documentation/powerpc/cpu_families.txt b/Documentation/powerpc/cpu_families.txt new file mode 100644 index 00000000000..fc08e22feb1 --- /dev/null +++ b/Documentation/powerpc/cpu_families.txt @@ -0,0 +1,221 @@ +CPU Families +============ + +This document tries to summarise some of the different cpu families that exist +and are supported by arch/powerpc. + + +Book3S (aka sPAPR) +------------------ + + - Hash MMU + - Mix of 32 & 64 bit + + +--------------+ +----------------+ + | Old POWER | --------------> | RS64 (threads) | + +--------------+ +----------------+ + | + | + v + +--------------+ +----------------+ +------+ + | 601 | --------------> | 603 | ---> | e300 | + +--------------+ +----------------+ +------+ + | | + | | + v v + +--------------+ +----------------+ +-------+ + | 604 | | 750 (G3) | ---> | 750CX | + +--------------+ +----------------+ +-------+ + | | | + | | | + v v v + +--------------+ +----------------+ +-------+ + | 620 (64 bit) | | 7400 | | 750CL | + +--------------+ +----------------+ +-------+ + | | | + | | | + v v v + +--------------+ +----------------+ +-------+ + | POWER3/630 | | 7410 | | 750FX | + +--------------+ +----------------+ +-------+ + | | + | | + v v + +--------------+ +----------------+ + | POWER3+ | | 7450 | + +--------------+ +----------------+ + | | + | | + v v + +--------------+ +----------------+ + | POWER4 | | 7455 | + +--------------+ +----------------+ + | | + | | + v v + +--------------+ +-------+ +----------------+ + | POWER4+ | --> | 970 | | 7447 | + +--------------+ +-------+ +----------------+ + | | | + | | | + v v v + +--------------+ +-------+ +----------------+ + | POWER5 | | 970FX | | 7448 | + +--------------+ +-------+ +----------------+ + | | | + | | | + v v v + +--------------+ +-------+ +----------------+ + | POWER5+ | | 970MP | | e600 | + +--------------+ +-------+ +----------------+ + | + | + v + +--------------+ + | POWER5++ | + +--------------+ + | + | + v + +--------------+ +-------+ + | POWER6 | <-?-> | Cell | + +--------------+ +-------+ + | + | + v + +--------------+ + | POWER7 | + +--------------+ + | + | + v + +--------------+ + | POWER7+ | + +--------------+ + | + | + v + +--------------+ + | POWER8 | + +--------------+ + + + +---------------+ + | PA6T (64 bit) | + +---------------+ + + +IBM BookE +--------- + + - Software loaded TLB. + - All 32 bit + + +--------------+ + | 401 | + +--------------+ + | + | + v + +--------------+ + | 403 | + +--------------+ + | + | + v + +--------------+ + | 405 | + +--------------+ + | + | + v + +--------------+ + | 440 | + +--------------+ + | + | + v + +--------------+ +----------------+ + | 450 | --> | BG/P | + +--------------+ +----------------+ + | + | + v + +--------------+ + | 460 | + +--------------+ + | + | + v + +--------------+ + | 476 | + +--------------+ + + +Motorola/Freescale 8xx +---------------------- + + - Software loaded with hardware assist. + - All 32 bit + + +-------------+ + | MPC8xx Core | + +-------------+ + + +Freescale BookE +--------------- + + - Software loaded TLB. + - e6500 adds HW loaded indirect TLB entries. + - Mix of 32 & 64 bit + + +--------------+ + | e200 | + +--------------+ + + + +--------------------------------+ + | e500 | + +--------------------------------+ + | + | + v + +--------------------------------+ + | e500v2 | + +--------------------------------+ + | + | + v + +--------------------------------+ + | e500mc (Book3e) | + +--------------------------------+ + | + | + v + +--------------------------------+ + | e5500 (64 bit) | + +--------------------------------+ + | + | + v + +--------------------------------+ + | e6500 (HW TLB) (Multithreaded) | + +--------------------------------+ + + +IBM A2 core +----------- + + - Book3E, software loaded TLB + HW loaded indirect TLB entries. + - 64 bit + + +--------------+ +----------------+ + | A2 core | --> | WSP | + +--------------+ +----------------+ + | + | + v + +--------------+ + | BG/Q | + +--------------+ diff --git a/Documentation/powerpc/cpu_features.txt b/Documentation/powerpc/cpu_features.txt index 472739880e8..ae09df8722c 100644 --- a/Documentation/powerpc/cpu_features.txt +++ b/Documentation/powerpc/cpu_features.txt @@ -11,10 +11,10 @@ split instruction and data caches, and if the CPU supports the DOZE and NAP sleep modes. Detection of the feature set is simple. A list of processors can be found in -arch/ppc/kernel/cputable.c. The PVR register is masked and compared with each -value in the list. If a match is found, the cpu_features of cur_cpu_spec is -assigned to the feature bitmask for this processor and a __setup_cpu function -is called. +arch/powerpc/kernel/cputable.c. The PVR register is masked and compared with +each value in the list. If a match is found, the cpu_features of cur_cpu_spec +is assigned to the feature bitmask for this processor and a __setup_cpu +function is called. C code may test 'cur_cpu_spec[smp_processor_id()]->cpu_features' for a particular feature bit. This is done in quite a few places, for example @@ -31,7 +31,7 @@ anyways). After detecting the processor type, the kernel patches out sections of code that shouldn't be used by writing nop's over it. Using cpufeatures requires -just 2 macros (found in include/asm-ppc/cputable.h), as seen in head.S +just 2 macros (found in arch/powerpc/include/asm/cputable.h), as seen in head.S transfer_to_handler: #ifdef CONFIG_ALTIVEC @@ -51,6 +51,6 @@ should be used in the majority of cases. The END_FTR_SECTION macros are implemented by storing information about this code in the '__ftr_fixup' ELF section. When do_cpu_ftr_fixups -(arch/ppc/kernel/misc.S) is invoked, it will iterate over the records in +(arch/powerpc/kernel/misc.S) is invoked, it will iterate over the records in __ftr_fixup, and if the required feature is not present it will loop writing nop's from each BEGIN_FTR_SECTION to END_FTR_SECTION. diff --git a/Documentation/powerpc/eeh-pci-error-recovery.txt b/Documentation/powerpc/eeh-pci-error-recovery.txt index df7afe43d46..9d4e33df624 100644 --- a/Documentation/powerpc/eeh-pci-error-recovery.txt +++ b/Documentation/powerpc/eeh-pci-error-recovery.txt @@ -133,7 +133,7 @@ error. Given an arbitrary address, the routine pci_get_device_by_addr() will find the pci device associated with that address (if any). -The default include/asm-powerpc/io.h macros readb(), inb(), insb(), +The default arch/powerpc/include/asm/io.h macros readb(), inb(), insb(), etc. include a check to see if the i/o read returned all-0xff's. If so, these make a call to eeh_dn_check_failure(), which in turn asks the firmware if the all-ff's value is the sign of a true EEH diff --git a/Documentation/powerpc/firmware-assisted-dump.txt b/Documentation/powerpc/firmware-assisted-dump.txt new file mode 100644 index 00000000000..3007bc98af2 --- /dev/null +++ b/Documentation/powerpc/firmware-assisted-dump.txt @@ -0,0 +1,270 @@ + + Firmware-Assisted Dump + ------------------------ + July 2011 + +The goal of firmware-assisted dump is to enable the dump of +a crashed system, and to do so from a fully-reset system, and +to minimize the total elapsed time until the system is back +in production use. + +- Firmware assisted dump (fadump) infrastructure is intended to replace + the existing phyp assisted dump. +- Fadump uses the same firmware interfaces and memory reservation model + as phyp assisted dump. +- Unlike phyp dump, fadump exports the memory dump through /proc/vmcore + in the ELF format in the same way as kdump. This helps us reuse the + kdump infrastructure for dump capture and filtering. +- Unlike phyp dump, userspace tool does not need to refer any sysfs + interface while reading /proc/vmcore. +- Unlike phyp dump, fadump allows user to release all the memory reserved + for dump, with a single operation of echo 1 > /sys/kernel/fadump_release_mem. +- Once enabled through kernel boot parameter, fadump can be + started/stopped through /sys/kernel/fadump_registered interface (see + sysfs files section below) and can be easily integrated with kdump + service start/stop init scripts. + +Comparing with kdump or other strategies, firmware-assisted +dump offers several strong, practical advantages: + +-- Unlike kdump, the system has been reset, and loaded + with a fresh copy of the kernel. In particular, + PCI and I/O devices have been reinitialized and are + in a clean, consistent state. +-- Once the dump is copied out, the memory that held the dump + is immediately available to the running kernel. And therefore, + unlike kdump, fadump doesn't need a 2nd reboot to get back + the system to the production configuration. + +The above can only be accomplished by coordination with, +and assistance from the Power firmware. The procedure is +as follows: + +-- The first kernel registers the sections of memory with the + Power firmware for dump preservation during OS initialization. + These registered sections of memory are reserved by the first + kernel during early boot. + +-- When a system crashes, the Power firmware will save + the low memory (boot memory of size larger of 5% of system RAM + or 256MB) of RAM to the previous registered region. It will + also save system registers, and hardware PTE's. + + NOTE: The term 'boot memory' means size of the low memory chunk + that is required for a kernel to boot successfully when + booted with restricted memory. By default, the boot memory + size will be the larger of 5% of system RAM or 256MB. + Alternatively, user can also specify boot memory size + through boot parameter 'fadump_reserve_mem=' which will + override the default calculated size. Use this option + if default boot memory size is not sufficient for second + kernel to boot successfully. + +-- After the low memory (boot memory) area has been saved, the + firmware will reset PCI and other hardware state. It will + *not* clear the RAM. It will then launch the bootloader, as + normal. + +-- The freshly booted kernel will notice that there is a new + node (ibm,dump-kernel) in the device tree, indicating that + there is crash data available from a previous boot. During + the early boot OS will reserve rest of the memory above + boot memory size effectively booting with restricted memory + size. This will make sure that the second kernel will not + touch any of the dump memory area. + +-- User-space tools will read /proc/vmcore to obtain the contents + of memory, which holds the previous crashed kernel dump in ELF + format. The userspace tools may copy this info to disk, or + network, nas, san, iscsi, etc. as desired. + +-- Once the userspace tool is done saving dump, it will echo + '1' to /sys/kernel/fadump_release_mem to release the reserved + memory back to general use, except the memory required for + next firmware-assisted dump registration. + + e.g. + # echo 1 > /sys/kernel/fadump_release_mem + +Please note that the firmware-assisted dump feature +is only available on Power6 and above systems with recent +firmware versions. + +Implementation details: +---------------------- + +During boot, a check is made to see if firmware supports +this feature on that particular machine. If it does, then +we check to see if an active dump is waiting for us. If yes +then everything but boot memory size of RAM is reserved during +early boot (See Fig. 2). This area is released once we finish +collecting the dump from user land scripts (e.g. kdump scripts) +that are run. If there is dump data, then the +/sys/kernel/fadump_release_mem file is created, and the reserved +memory is held. + +If there is no waiting dump data, then only the memory required +to hold CPU state, HPTE region, boot memory dump and elfcore +header, is reserved at the top of memory (see Fig. 1). This area +is *not* released: this region will be kept permanently reserved, +so that it can act as a receptacle for a copy of the boot memory +content in addition to CPU state and HPTE region, in the case a +crash does occur. + + o Memory Reservation during first kernel + + Low memory Top of memory + 0 boot memory size | + | | |<--Reserved dump area -->| + V V | Permanent Reservation V + +-----------+----------/ /----------+---+----+-----------+----+ + | | |CPU|HPTE| DUMP |ELF | + +-----------+----------/ /----------+---+----+-----------+----+ + | ^ + | | + \ / + ------------------------------------------- + Boot memory content gets transferred to + reserved area by firmware at the time of + crash + Fig. 1 + + o Memory Reservation during second kernel after crash + + Low memory Top of memory + 0 boot memory size | + | |<------------- Reserved dump area ----------- -->| + V V V + +-----------+----------/ /----------+---+----+-----------+----+ + | | |CPU|HPTE| DUMP |ELF | + +-----------+----------/ /----------+---+----+-----------+----+ + | | + V V + Used by second /proc/vmcore + kernel to boot + Fig. 2 + +Currently the dump will be copied from /proc/vmcore to a +a new file upon user intervention. The dump data available through +/proc/vmcore will be in ELF format. Hence the existing kdump +infrastructure (kdump scripts) to save the dump works fine with +minor modifications. + +The tools to examine the dump will be same as the ones +used for kdump. + +How to enable firmware-assisted dump (fadump): +------------------------------------- + +1. Set config option CONFIG_FA_DUMP=y and build kernel. +2. Boot into linux kernel with 'fadump=on' kernel cmdline option. +3. Optionally, user can also set 'fadump_reserve_mem=' kernel cmdline + to specify size of the memory to reserve for boot memory dump + preservation. + +NOTE: If firmware-assisted dump fails to reserve memory then it will + fallback to existing kdump mechanism if 'crashkernel=' option + is set at kernel cmdline. + +Sysfs/debugfs files: +------------ + +Firmware-assisted dump feature uses sysfs file system to hold +the control files and debugfs file to display memory reserved region. + +Here is the list of files under kernel sysfs: + + /sys/kernel/fadump_enabled + + This is used to display the fadump status. + 0 = fadump is disabled + 1 = fadump is enabled + + This interface can be used by kdump init scripts to identify if + fadump is enabled in the kernel and act accordingly. + + /sys/kernel/fadump_registered + + This is used to display the fadump registration status as well + as to control (start/stop) the fadump registration. + 0 = fadump is not registered. + 1 = fadump is registered and ready to handle system crash. + + To register fadump echo 1 > /sys/kernel/fadump_registered and + echo 0 > /sys/kernel/fadump_registered for un-register and stop the + fadump. Once the fadump is un-registered, the system crash will not + be handled and vmcore will not be captured. This interface can be + easily integrated with kdump service start/stop. + + /sys/kernel/fadump_release_mem + + This file is available only when fadump is active during + second kernel. This is used to release the reserved memory + region that are held for saving crash dump. To release the + reserved memory echo 1 to it: + + echo 1 > /sys/kernel/fadump_release_mem + + After echo 1, the content of the /sys/kernel/debug/powerpc/fadump_region + file will change to reflect the new memory reservations. + + The existing userspace tools (kdump infrastructure) can be easily + enhanced to use this interface to release the memory reserved for + dump and continue without 2nd reboot. + +Here is the list of files under powerpc debugfs: +(Assuming debugfs is mounted on /sys/kernel/debug directory.) + + /sys/kernel/debug/powerpc/fadump_region + + This file shows the reserved memory regions if fadump is + enabled otherwise this file is empty. The output format + is: + <region>: [<start>-<end>] <reserved-size> bytes, Dumped: <dump-size> + + e.g. + Contents when fadump is registered during first kernel + + # cat /sys/kernel/debug/powerpc/fadump_region + CPU : [0x0000006ffb0000-0x0000006fff001f] 0x40020 bytes, Dumped: 0x0 + HPTE: [0x0000006fff0020-0x0000006fff101f] 0x1000 bytes, Dumped: 0x0 + DUMP: [0x0000006fff1020-0x0000007fff101f] 0x10000000 bytes, Dumped: 0x0 + + Contents when fadump is active during second kernel + + # cat /sys/kernel/debug/powerpc/fadump_region + CPU : [0x0000006ffb0000-0x0000006fff001f] 0x40020 bytes, Dumped: 0x40020 + HPTE: [0x0000006fff0020-0x0000006fff101f] 0x1000 bytes, Dumped: 0x1000 + DUMP: [0x0000006fff1020-0x0000007fff101f] 0x10000000 bytes, Dumped: 0x10000000 + : [0x00000010000000-0x0000006ffaffff] 0x5ffb0000 bytes, Dumped: 0x5ffb0000 + +NOTE: Please refer to Documentation/filesystems/debugfs.txt on + how to mount the debugfs filesystem. + + +TODO: +----- + o Need to come up with the better approach to find out more + accurate boot memory size that is required for a kernel to + boot successfully when booted with restricted memory. + o The fadump implementation introduces a fadump crash info structure + in the scratch area before the ELF core header. The idea of introducing + this structure is to pass some important crash info data to the second + kernel which will help second kernel to populate ELF core header with + correct data before it gets exported through /proc/vmcore. The current + design implementation does not address a possibility of introducing + additional fields (in future) to this structure without affecting + compatibility. Need to come up with the better approach to address this. + The possible approaches are: + 1. Introduce version field for version tracking, bump up the version + whenever a new field is added to the structure in future. The version + field can be used to find out what fields are valid for the current + version of the structure. + 2. Reserve the area of predefined size (say PAGE_SIZE) for this + structure and have unused area as reserved (initialized to zero) + for future field additions. + The advantage of approach 1 over 2 is we don't need to reserve extra space. +--- +Author: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> +This document is based on the original documentation written for phyp +assisted dump by Linas Vepstas and Manish Ahuja. diff --git a/Documentation/powerpc/hvcs.txt b/Documentation/powerpc/hvcs.txt index f93462c5db2..a730ca5a07f 100644 --- a/Documentation/powerpc/hvcs.txt +++ b/Documentation/powerpc/hvcs.txt @@ -528,7 +528,7 @@ this driver assignment of hotplug added vty-servers may be in a different order than how they would be exposed on module load. Rebooting or reloading the module after dynamic addition may result in the /dev/hvcs* and vty-server coupling changing if a vty-server adapter was added in a -slot inbetween two other vty-server adapters. Refer to the section above +slot between two other vty-server adapters. Refer to the section above on how to determine which vty-server goes with which /dev/hvcs* node. Hint; look at the sysfs "index" attribute for the vty-server. @@ -560,7 +560,7 @@ The proper channel for reporting bugs is either through the Linux OS distribution company that provided your OS or by posting issues to the PowerPC development mailing list at: -linuxppc-dev@ozlabs.org +linuxppc-dev@lists.ozlabs.org This request is to provide a documented and searchable public exchange of the problems and solutions surrounding this driver for the benefit of diff --git a/Documentation/powerpc/kvm_440.txt b/Documentation/powerpc/kvm_440.txt new file mode 100644 index 00000000000..c02a003fa03 --- /dev/null +++ b/Documentation/powerpc/kvm_440.txt @@ -0,0 +1,41 @@ +Hollis Blanchard <hollisb@us.ibm.com> +15 Apr 2008 + +Various notes on the implementation of KVM for PowerPC 440: + +To enforce isolation, host userspace, guest kernel, and guest userspace all +run at user privilege level. Only the host kernel runs in supervisor mode. +Executing privileged instructions in the guest traps into KVM (in the host +kernel), where we decode and emulate them. Through this technique, unmodified +440 Linux kernels can be run (slowly) as guests. Future performance work will +focus on reducing the overhead and frequency of these traps. + +The usual code flow is started from userspace invoking an "run" ioctl, which +causes KVM to switch into guest context. We use IVPR to hijack the host +interrupt vectors while running the guest, which allows us to direct all +interrupts to kvmppc_handle_interrupt(). At this point, we could either +- handle the interrupt completely (e.g. emulate "mtspr SPRG0"), or +- let the host interrupt handler run (e.g. when the decrementer fires), or +- return to host userspace (e.g. when the guest performs device MMIO) + +Address spaces: We take advantage of the fact that Linux doesn't use the AS=1 +address space (in host or guest), which gives us virtual address space to use +for guest mappings. While the guest is running, the host kernel remains mapped +in AS=0, but the guest can only use AS=1 mappings. + +TLB entries: The TLB entries covering the host linear mapping remain +present while running the guest. This reduces the overhead of lightweight +exits, which are handled by KVM running in the host kernel. We keep three +copies of the TLB: + - guest TLB: contents of the TLB as the guest sees it + - shadow TLB: the TLB that is actually in hardware while guest is running + - host TLB: to restore TLB state when context switching guest -> host +When a TLB miss occurs because a mapping was not present in the shadow TLB, +but was present in the guest TLB, KVM handles the fault without invoking the +guest. Large guest pages are backed by multiple 4KB shadow pages through this +mechanism. + +IO: MMIO and DCR accesses are emulated by userspace. We use virtio for network +and block IO, so those drivers must be enabled in the guest. It's possible +that some qemu device emulation (e.g. e1000 or rtl8139) may also work with +little effort. diff --git a/Documentation/powerpc/mpc52xx-device-tree-bindings.txt b/Documentation/powerpc/mpc52xx-device-tree-bindings.txt deleted file mode 100644 index 5e03610e186..00000000000 --- a/Documentation/powerpc/mpc52xx-device-tree-bindings.txt +++ /dev/null @@ -1,254 +0,0 @@ -MPC5200 Device Tree Bindings ----------------------------- - -(c) 2006-2007 Secret Lab Technologies Ltd -Grant Likely <grant.likely at secretlab.ca> - -********** DRAFT *********** -* WARNING: Do not depend on the stability of these bindings just yet. -* The MPC5200 device tree conventions are still in flux -* Keep an eye on the linuxppc-dev mailing list for more details -********** DRAFT *********** - -I - Introduction -================ -Boards supported by the arch/powerpc architecture require device tree be -passed by the boot loader to the kernel at boot time. The device tree -describes what devices are present on the board and how they are -connected. The device tree can either be passed as a binary blob (as -described in Documentation/powerpc/booting-without-of.txt), or passed -by Open Firmware (IEEE 1275) compatible firmware using an OF compatible -client interface API. - -This document specifies the requirements on the device-tree for mpc5200 -based boards. These requirements are above and beyond the details -specified in either the Open Firmware spec or booting-without-of.txt - -All new mpc5200-based boards are expected to match this document. In -cases where this document is not sufficient to support a new board port, -this document should be updated as part of adding the new board support. - -II - Philosophy -=============== -The core of this document is naming convention. The whole point of -defining this convention is to reduce or eliminate the number of -special cases required to support a 5200 board. If all 5200 boards -follow the same convention, then generic 5200 support code will work -rather than coding special cases for each new board. - -This section tries to capture the thought process behind why the naming -convention is what it is. - -1. names ---------- -There is strong convention/requirements already established for children -of the root node. 'cpus' describes the processor cores, 'memory' -describes memory, and 'chosen' provides boot configuration. Other nodes -are added to describe devices attached to the processor local bus. - -Following convention already established with other system-on-chip -processors, 5200 device trees should use the name 'soc5200' for the -parent node of on chip devices, and the root node should be its parent. - -Child nodes are typically named after the configured function. ie. -the FEC node is named 'ethernet', and a PSC in uart mode is named 'serial'. - -2. device_type property ------------------------ -similar to the node name convention above; the device_type reflects the -configured function of a device. ie. 'serial' for a uart and 'spi' for -an spi controller. However, while node names *should* reflect the -configured function, device_type *must* match the configured function -exactly. - -3. compatible property ----------------------- -Since device_type isn't enough to match devices to drivers, there also -needs to be a naming convention for the compatible property. Compatible -is an list of device descriptions sorted from specific to generic. For -the mpc5200, the required format for each compatible value is -<chip>-<device>[-<mode>]. The OS should be able to match a device driver -to the device based solely on the compatible value. If two drivers -match on the compatible list; the 'most compatible' driver should be -selected. - -The split between the MPC5200 and the MPC5200B leaves a bit of a -conundrum. How should the compatible property be set up to provide -maximum compatibility information; but still accurately describe the -chip? For the MPC5200; the answer is easy. Most of the SoC devices -originally appeared on the MPC5200. Since they didn't exist anywhere -else; the 5200 compatible properties will contain only one item; -"mpc5200-<device>". - -The 5200B is almost the same as the 5200, but not quite. It fixes -silicon bugs and it adds a small number of enhancements. Most of the -devices either provide exactly the same interface as on the 5200. A few -devices have extra functions but still have a backwards compatible mode. -To express this information as completely as possible, 5200B device trees -should have two items in the compatible list; -"mpc5200b-<device>\0mpc5200-<device>". It is *strongly* recommended -that 5200B device trees follow this convention (instead of only listing -the base mpc5200 item). - -If another chip appear on the market with one of the mpc5200 SoC -devices, then the compatible list should include mpc5200-<device>. - -ie. ethernet on mpc5200: compatible = "mpc5200-ethernet" - ethernet on mpc5200b: compatible = "mpc5200b-ethernet\0mpc5200-ethernet" - -Modal devices, like PSCs, also append the configured function to the -end of the compatible field. ie. A PSC in i2s mode would specify -"mpc5200-psc-i2s", not "mpc5200-i2s". This convention is chosen to -avoid naming conflicts with non-psc devices providing the same -function. For example, "mpc5200-spi" and "mpc5200-psc-spi" describe -the mpc5200 simple spi device and a PSC spi mode respectively. - -If the soc device is more generic and present on other SOCs, the -compatible property can specify the more generic device type also. - -ie. mscan: compatible = "mpc5200-mscan\0fsl,mscan"; - -At the time of writing, exact chip may be either 'mpc5200' or -'mpc5200b'. - -Device drivers should always try to match as generically as possible. - -III - Structure -=============== -The device tree for an mpc5200 board follows the structure defined in -booting-without-of.txt with the following additional notes: - -0) the root node ----------------- -Typical root description node; see booting-without-of - -1) The cpus node ----------------- -The cpus node follows the basic layout described in booting-without-of. -The bus-frequency property holds the XLB bus frequency -The clock-frequency property holds the core frequency - -2) The memory node ------------------- -Typical memory description node; see booting-without-of. - -3) The soc5200 node -------------------- -This node describes the on chip SOC peripherals. Every mpc5200 based -board will have this node, and as such there is a common naming -convention for SOC devices. - -Required properties: -name type description ----- ---- ----------- -device_type string must be "soc" -ranges int should be <0 baseaddr baseaddr+10000> -reg int must be <baseaddr 10000> -compatible string mpc5200: "mpc5200-soc" - mpc5200b: "mpc5200b-soc\0mpc5200-soc" -system-frequency int Fsystem frequency; source of all - other clocks. -bus-frequency int IPB bus frequency in HZ. Clock rate - used by most of the soc devices. -#interrupt-cells int must be <3>. - -Recommended properties: -name type description ----- ---- ----------- -model string Exact model of the chip; - ie: model="fsl,mpc5200" -revision string Silicon revision of chip - ie: revision="M08A" - -The 'model' and 'revision' properties are *strongly* recommended. Having -them presence acts as a bit of a safety net for working around as yet -undiscovered bugs on one version of silicon. For example, device drivers -can use the model and revision properties to decide if a bug fix should -be turned on. - -4) soc5200 child nodes ----------------------- -Any on chip SOC devices available to Linux must appear as soc5200 child nodes. - -Note: The tables below show the value for the mpc5200. A mpc5200b device -tree should use the "mpc5200b-<device>\0mpc5200-<device> form. - -Required soc5200 child nodes: -name device_type compatible Description ----- ----------- ---------- ----------- -cdm@<addr> cdm mpc5200-cmd Clock Distribution -pic@<addr> interrupt-controller mpc5200-pic need an interrupt - controller to boot -bestcomm@<addr> dma-controller mpc5200-bestcomm 5200 pic also requires - the bestcomm device - -Recommended soc5200 child nodes; populate as needed for your board -name device_type compatible Description ----- ----------- ---------- ----------- -gpt@<addr> gpt fsl,mpc5200-gpt General purpose timers -rtc@<addr> rtc mpc5200-rtc Real time clock -mscan@<addr> mscan mpc5200-mscan CAN bus controller -pci@<addr> pci mpc5200-pci PCI bridge -serial@<addr> serial mpc5200-psc-uart PSC in serial mode -i2s@<addr> sound mpc5200-psc-i2s PSC in i2s mode -ac97@<addr> sound mpc5200-psc-ac97 PSC in ac97 mode -spi@<addr> spi mpc5200-psc-spi PSC in spi mode -irda@<addr> irda mpc5200-psc-irda PSC in IrDA mode -spi@<addr> spi mpc5200-spi MPC5200 spi device -ethernet@<addr> network mpc5200-fec MPC5200 ethernet device -ata@<addr> ata mpc5200-ata IDE ATA interface -i2c@<addr> i2c mpc5200-i2c I2C controller -usb@<addr> usb-ohci-be mpc5200-ohci,ohci-be USB controller -xlb@<addr> xlb mpc5200-xlb XLB arbitrator - -Important child node properties -name type description ----- ---- ----------- -cell-index int When multiple devices are present, is the - index of the device in the hardware (ie. There - are 6 PSC on the 5200 numbered PSC1 to PSC6) - PSC1 has 'cell-index = <0>' - PSC4 has 'cell-index = <3>' - -5) General Purpose Timer nodes (child of soc5200 node) -On the mpc5200 and 5200b, GPT0 has a watchdog timer function. If the board -design supports the internal wdt, then the device node for GPT0 should -include the empty property 'fsl,has-wdt'. - -6) PSC nodes (child of soc5200 node) -PSC nodes can define the optional 'port-number' property to force assignment -order of serial ports. For example, PSC5 might be physically connected to -the port labeled 'COM1' and PSC1 wired to 'COM1'. In this case, PSC5 would -have a "port-number = <0>" property, and PSC1 would have "port-number = <1>". - -PSC in i2s mode: The mpc5200 and mpc5200b PSCs are not compatible when in -i2s mode. An 'mpc5200b-psc-i2s' node cannot include 'mpc5200-psc-i2s' in the -compatible field. - -IV - Extra Notes -================ - -1. Interrupt mapping --------------------- -The mpc5200 pic driver splits hardware IRQ numbers into two levels. The -split reflects the layout of the PIC hardware itself, which groups -interrupts into one of three groups; CRIT, MAIN or PERP. Also, the -Bestcomm dma engine has it's own set of interrupt sources which are -cascaded off of peripheral interrupt 0, which the driver interprets as a -fourth group, SDMA. - -The interrupts property for device nodes using the mpc5200 pic consists -of three cells; <L1 L2 level> - - L1 := [CRIT=0, MAIN=1, PERP=2, SDMA=3] - L2 := interrupt number; directly mapped from the value in the - "ICTL PerStat, MainStat, CritStat Encoded Register" - level := [LEVEL_HIGH=0, EDGE_RISING=1, EDGE_FALLING=2, LEVEL_LOW=3] - -2. Shared registers -------------------- -Some SoC devices share registers between them. ie. the i2c devices use -a single clock control register, and almost all device are affected by -the port_config register. Devices which need to manipulate shared regs -should look to the parent SoC node. The soc node is responsible -for arbitrating all shared register access. diff --git a/Documentation/powerpc/mpc52xx.txt b/Documentation/powerpc/mpc52xx.txt index 10dd4ab93b8..0d540a31ea1 100644 --- a/Documentation/powerpc/mpc52xx.txt +++ b/Documentation/powerpc/mpc52xx.txt @@ -2,7 +2,7 @@ Linux 2.6.x on MPC52xx family ----------------------------- For the latest info, go to http://www.246tNt.com/mpc52xx/ - + To compile/use : - U-Boot: @@ -10,23 +10,23 @@ To compile/use : if you wish to ). # make lite5200_defconfig # make uImage - + then, on U-boot: => tftpboot 200000 uImage => tftpboot 400000 pRamdisk => bootm 200000 400000 - + - DBug: # <edit Makefile to set ARCH=ppc & CROSS_COMPILE=... ( also EXTRAVERSION if you wish to ). # make lite5200_defconfig # cp your_initrd.gz arch/ppc/boot/images/ramdisk.image.gz - # make zImage.initrd - # make + # make zImage.initrd + # make then in DBug: DBug> dn -i zImage.initrd.lite5200 - + Some remarks : - The port is named mpc52xxx, and config options are PPC_MPC52xx. The MGT5100 diff --git a/Documentation/powerpc/pmu-ebb.txt b/Documentation/powerpc/pmu-ebb.txt new file mode 100644 index 00000000000..73cd163dbfb --- /dev/null +++ b/Documentation/powerpc/pmu-ebb.txt @@ -0,0 +1,137 @@ +PMU Event Based Branches +======================== + +Event Based Branches (EBBs) are a feature which allows the hardware to +branch directly to a specified user space address when certain events occur. + +The full specification is available in Power ISA v2.07: + + https://www.power.org/documentation/power-isa-version-2-07/ + +One type of event for which EBBs can be configured is PMU exceptions. This +document describes the API for configuring the Power PMU to generate EBBs, +using the Linux perf_events API. + + +Terminology +----------- + +Throughout this document we will refer to an "EBB event" or "EBB events". This +just refers to a struct perf_event which has set the "EBB" flag in its +attr.config. All events which can be configured on the hardware PMU are +possible "EBB events". + + +Background +---------- + +When a PMU EBB occurs it is delivered to the currently running process. As such +EBBs can only sensibly be used by programs for self-monitoring. + +It is a feature of the perf_events API that events can be created on other +processes, subject to standard permission checks. This is also true of EBB +events, however unless the target process enables EBBs (via mtspr(BESCR)) no +EBBs will ever be delivered. + +This makes it possible for a process to enable EBBs for itself, but not +actually configure any events. At a later time another process can come along +and attach an EBB event to the process, which will then cause EBBs to be +delivered to the first process. It's not clear if this is actually useful. + + +When the PMU is configured for EBBs, all PMU interrupts are delivered to the +user process. This means once an EBB event is scheduled on the PMU, no non-EBB +events can be configured. This means that EBB events can not be run +concurrently with regular 'perf' commands, or any other perf events. + +It is however safe to run 'perf' commands on a process which is using EBBs. The +kernel will in general schedule the EBB event, and perf will be notified that +its events could not run. + +The exclusion between EBB events and regular events is implemented using the +existing "pinned" and "exclusive" attributes of perf_events. This means EBB +events will be given priority over other events, unless they are also pinned. +If an EBB event and a regular event are both pinned, then whichever is enabled +first will be scheduled and the other will be put in error state. See the +section below titled "Enabling an EBB event" for more information. + + +Creating an EBB event +--------------------- + +To request that an event is counted using EBB, the event code should have bit +63 set. + +EBB events must be created with a particular, and restrictive, set of +attributes - this is so that they interoperate correctly with the rest of the +perf_events subsystem. + +An EBB event must be created with the "pinned" and "exclusive" attributes set. +Note that if you are creating a group of EBB events, only the leader can have +these attributes set. + +An EBB event must NOT set any of the "inherit", "sample_period", "freq" or +"enable_on_exec" attributes. + +An EBB event must be attached to a task. This is specified to perf_event_open() +by passing a pid value, typically 0 indicating the current task. + +All events in a group must agree on whether they want EBB. That is all events +must request EBB, or none may request EBB. + +EBB events must specify the PMC they are to be counted on. This ensures +userspace is able to reliably determine which PMC the event is scheduled on. + + +Enabling an EBB event +--------------------- + +Once an EBB event has been successfully opened, it must be enabled with the +perf_events API. This can be achieved either via the ioctl() interface, or the +prctl() interface. + +However, due to the design of the perf_events API, enabling an event does not +guarantee that it has been scheduled on the PMU. To ensure that the EBB event +has been scheduled on the PMU, you must perform a read() on the event. If the +read() returns EOF, then the event has not been scheduled and EBBs are not +enabled. + +This behaviour occurs because the EBB event is pinned and exclusive. When the +EBB event is enabled it will force all other non-pinned events off the PMU. In +this case the enable will be successful. However if there is already an event +pinned on the PMU then the enable will not be successful. + + +Reading an EBB event +-------------------- + +It is possible to read() from an EBB event. However the results are +meaningless. Because interrupts are being delivered to the user process the +kernel is not able to count the event, and so will return a junk value. + + +Closing an EBB event +-------------------- + +When an EBB event is finished with, you can close it using close() as for any +regular event. If this is the last EBB event the PMU will be deconfigured and +no further PMU EBBs will be delivered. + + +EBB Handler +----------- + +The EBB handler is just regular userspace code, however it must be written in +the style of an interrupt handler. When the handler is entered all registers +are live (possibly) and so must be saved somehow before the handler can invoke +other code. + +It's up to the program how to handle this. For C programs a relatively simple +option is to create an interrupt frame on the stack and save registers there. + +Fork +---- + +EBB events are not inherited across fork. If the child process wishes to use +EBBs it should open a new event for itself. Similarly the EBB state in +BESCR/EBBHR/EBBRR is cleared across fork(). diff --git a/Documentation/powerpc/ppc_htab.txt b/Documentation/powerpc/ppc_htab.txt deleted file mode 100644 index 8b8c7df29fa..00000000000 --- a/Documentation/powerpc/ppc_htab.txt +++ /dev/null @@ -1,118 +0,0 @@ - Information about /proc/ppc_htab -===================================================================== - -This document and the related code was written by me (Cort Dougan), please -email me (cort@fsmlabs.com) if you have questions, comments or corrections. - -Last Change: 2.16.98 - -This entry in the proc directory is readable by all users but only -writable by root. - -The ppc_htab interface is a user level way of accessing the -performance monitoring registers as well as providing information -about the PTE hash table. - -1. Reading - - Reading this file will give you information about the memory management - hash table that serves as an extended tlb for page translation on the - powerpc. It will also give you information about performance measurement - specific to the cpu that you are using. - - Explanation of the 604 Performance Monitoring Fields: - MMCR0 - the current value of the MMCR0 register - PMC1 - PMC2 - the value of the performance counters and a - description of what events they are counting - which are based on MMCR0 bit settings. - Explanation of the PTE Hash Table fields: - - Size - hash table size in Kb. - Buckets - number of buckets in the table. - Address - the virtual kernel address of the hash table base. - Entries - the number of ptes that can be stored in the hash table. - User/Kernel - how many pte's are in use by the kernel or user at that time. - Overflows - How many of the entries are in their secondary hash location. - Percent full - ratio of free pte entries to in use entries. - Reloads - Count of how many hash table misses have occurred - that were fixed with a reload from the linux tables. - Should always be 0 on 603 based machines. - Non-error Misses - Count of how many hash table misses have occurred - that were completed with the creation of a pte in the linux - tables with a call to do_page_fault(). - Error Misses - Number of misses due to errors such as bad address - and permission violations. This includes kernel access of - bad user addresses that are fixed up by the trap handler. - - Note that calculation of the data displayed from /proc/ppc_htab takes - a long time and spends a great deal of time in the kernel. It would - be quite hard on performance to read this file constantly. In time - there may be a counter in the kernel that allows successive reads from - this file only after a given amount of time has passed to reduce the - possibility of a user slowing the system by reading this file. - -2. Writing - - Writing to the ppc_htab allows you to change the characteristics of - the powerpc PTE hash table and setup performance monitoring. - - Resizing the PTE hash table is not enabled right now due to many - complications with moving the hash table, rehashing the entries - and many many SMP issues that would have to be dealt with. - - Write options to ppc_htab: - - - To set the size of the hash table to 64Kb: - - echo 'size 64' > /proc/ppc_htab - - The size must be a multiple of 64 and must be greater than or equal to - 64. - - - To turn off performance monitoring: - - echo 'off' > /proc/ppc_htab - - - To reset the counters without changing what they're counting: - - echo 'reset' > /proc/ppc_htab - - Note that counting will continue after the reset if it is enabled. - - - To count only events in user mode or only in kernel mode: - - echo 'user' > /proc/ppc_htab - ...or... - echo 'kernel' > /proc/ppc_htab - - Note that these two options are exclusive of one another and the - lack of either of these options counts user and kernel. - Using 'reset' and 'off' reset these flags. - - - The 604 has 2 performance counters which can each count events from - a specific set of events. These sets are disjoint so it is not - possible to count _any_ combination of 2 events. One event can - be counted by PMC1 and one by PMC2. - - To start counting a particular event use: - - echo 'event' > /proc/ppc_htab - - and choose from these events: - - PMC1 - ---- - 'ic miss' - instruction cache misses - 'dtlb' - data tlb misses (not hash table misses) - - PMC2 - ---- - 'dc miss' - data cache misses - 'itlb' - instruction tlb misses (not hash table misses) - 'load miss time' - cycles to complete a load miss - -3. Bugs - - The PMC1 and PMC2 counters can overflow and give no indication of that - in /proc/ppc_htab. diff --git a/Documentation/powerpc/ptrace.txt b/Documentation/powerpc/ptrace.txt new file mode 100644 index 00000000000..99c5ce88d0f --- /dev/null +++ b/Documentation/powerpc/ptrace.txt @@ -0,0 +1,151 @@ +GDB intends to support the following hardware debug features of BookE +processors: + +4 hardware breakpoints (IAC) +2 hardware watchpoints (read, write and read-write) (DAC) +2 value conditions for the hardware watchpoints (DVC) + +For that, we need to extend ptrace so that GDB can query and set these +resources. Since we're extending, we're trying to create an interface +that's extendable and that covers both BookE and server processors, so +that GDB doesn't need to special-case each of them. We added the +following 3 new ptrace requests. + +1. PTRACE_PPC_GETHWDEBUGINFO + +Query for GDB to discover the hardware debug features. The main info to +be returned here is the minimum alignment for the hardware watchpoints. +BookE processors don't have restrictions here, but server processors have +an 8-byte alignment restriction for hardware watchpoints. We'd like to avoid +adding special cases to GDB based on what it sees in AUXV. + +Since we're at it, we added other useful info that the kernel can return to +GDB: this query will return the number of hardware breakpoints, hardware +watchpoints and whether it supports a range of addresses and a condition. +The query will fill the following structure provided by the requesting process: + +struct ppc_debug_info { + unit32_t version; + unit32_t num_instruction_bps; + unit32_t num_data_bps; + unit32_t num_condition_regs; + unit32_t data_bp_alignment; + unit32_t sizeof_condition; /* size of the DVC register */ + uint64_t features; /* bitmask of the individual flags */ +}; + +features will have bits indicating whether there is support for: + +#define PPC_DEBUG_FEATURE_INSN_BP_RANGE 0x1 +#define PPC_DEBUG_FEATURE_INSN_BP_MASK 0x2 +#define PPC_DEBUG_FEATURE_DATA_BP_RANGE 0x4 +#define PPC_DEBUG_FEATURE_DATA_BP_MASK 0x8 +#define PPC_DEBUG_FEATURE_DATA_BP_DAWR 0x10 + +2. PTRACE_SETHWDEBUG + +Sets a hardware breakpoint or watchpoint, according to the provided structure: + +struct ppc_hw_breakpoint { + uint32_t version; +#define PPC_BREAKPOINT_TRIGGER_EXECUTE 0x1 +#define PPC_BREAKPOINT_TRIGGER_READ 0x2 +#define PPC_BREAKPOINT_TRIGGER_WRITE 0x4 + uint32_t trigger_type; /* only some combinations allowed */ +#define PPC_BREAKPOINT_MODE_EXACT 0x0 +#define PPC_BREAKPOINT_MODE_RANGE_INCLUSIVE 0x1 +#define PPC_BREAKPOINT_MODE_RANGE_EXCLUSIVE 0x2 +#define PPC_BREAKPOINT_MODE_MASK 0x3 + uint32_t addr_mode; /* address match mode */ + +#define PPC_BREAKPOINT_CONDITION_MODE 0x3 +#define PPC_BREAKPOINT_CONDITION_NONE 0x0 +#define PPC_BREAKPOINT_CONDITION_AND 0x1 +#define PPC_BREAKPOINT_CONDITION_EXACT 0x1 /* different name for the same thing as above */ +#define PPC_BREAKPOINT_CONDITION_OR 0x2 +#define PPC_BREAKPOINT_CONDITION_AND_OR 0x3 +#define PPC_BREAKPOINT_CONDITION_BE_ALL 0x00ff0000 /* byte enable bits */ +#define PPC_BREAKPOINT_CONDITION_BE(n) (1<<((n)+16)) + uint32_t condition_mode; /* break/watchpoint condition flags */ + + uint64_t addr; + uint64_t addr2; + uint64_t condition_value; +}; + +A request specifies one event, not necessarily just one register to be set. +For instance, if the request is for a watchpoint with a condition, both the +DAC and DVC registers will be set in the same request. + +With this GDB can ask for all kinds of hardware breakpoints and watchpoints +that the BookE supports. COMEFROM breakpoints available in server processors +are not contemplated, but that is out of the scope of this work. + +ptrace will return an integer (handle) uniquely identifying the breakpoint or +watchpoint just created. This integer will be used in the PTRACE_DELHWDEBUG +request to ask for its removal. Return -ENOSPC if the requested breakpoint +can't be allocated on the registers. + +Some examples of using the structure to: + +- set a breakpoint in the first breakpoint register + + p.version = PPC_DEBUG_CURRENT_VERSION; + p.trigger_type = PPC_BREAKPOINT_TRIGGER_EXECUTE; + p.addr_mode = PPC_BREAKPOINT_MODE_EXACT; + p.condition_mode = PPC_BREAKPOINT_CONDITION_NONE; + p.addr = (uint64_t) address; + p.addr2 = 0; + p.condition_value = 0; + +- set a watchpoint which triggers on reads in the second watchpoint register + + p.version = PPC_DEBUG_CURRENT_VERSION; + p.trigger_type = PPC_BREAKPOINT_TRIGGER_READ; + p.addr_mode = PPC_BREAKPOINT_MODE_EXACT; + p.condition_mode = PPC_BREAKPOINT_CONDITION_NONE; + p.addr = (uint64_t) address; + p.addr2 = 0; + p.condition_value = 0; + +- set a watchpoint which triggers only with a specific value + + p.version = PPC_DEBUG_CURRENT_VERSION; + p.trigger_type = PPC_BREAKPOINT_TRIGGER_READ; + p.addr_mode = PPC_BREAKPOINT_MODE_EXACT; + p.condition_mode = PPC_BREAKPOINT_CONDITION_AND | PPC_BREAKPOINT_CONDITION_BE_ALL; + p.addr = (uint64_t) address; + p.addr2 = 0; + p.condition_value = (uint64_t) condition; + +- set a ranged hardware breakpoint + + p.version = PPC_DEBUG_CURRENT_VERSION; + p.trigger_type = PPC_BREAKPOINT_TRIGGER_EXECUTE; + p.addr_mode = PPC_BREAKPOINT_MODE_RANGE_INCLUSIVE; + p.condition_mode = PPC_BREAKPOINT_CONDITION_NONE; + p.addr = (uint64_t) begin_range; + p.addr2 = (uint64_t) end_range; + p.condition_value = 0; + +- set a watchpoint in server processors (BookS) + + p.version = 1; + p.trigger_type = PPC_BREAKPOINT_TRIGGER_RW; + p.addr_mode = PPC_BREAKPOINT_MODE_RANGE_INCLUSIVE; + or + p.addr_mode = PPC_BREAKPOINT_MODE_EXACT; + + p.condition_mode = PPC_BREAKPOINT_CONDITION_NONE; + p.addr = (uint64_t) begin_range; + /* For PPC_BREAKPOINT_MODE_RANGE_INCLUSIVE addr2 needs to be specified, where + * addr2 - addr <= 8 Bytes. + */ + p.addr2 = (uint64_t) end_range; + p.condition_value = 0; + +3. PTRACE_DELHWDEBUG + +Takes an integer which identifies an existing breakpoint or watchpoint +(i.e., the value returned from PTRACE_SETHWDEBUG), and deletes the +corresponding breakpoint or watchpoint.. diff --git a/Documentation/powerpc/qe_firmware.txt b/Documentation/powerpc/qe_firmware.txt index 896266432d3..2031ddb33d0 100644 --- a/Documentation/powerpc/qe_firmware.txt +++ b/Documentation/powerpc/qe_firmware.txt @@ -217,7 +217,7 @@ Although it is not recommended, you can specify '0' in the soc.model field to skip matching SOCs altogether. The 'model' field is a 16-bit number that matches the actual SOC. The -'major' and 'minor' fields are the major and minor revision numbrs, +'major' and 'minor' fields are the major and minor revision numbers, respectively, of the SOC. For example, to match the 8323, revision 1.0: @@ -225,7 +225,7 @@ For example, to match the 8323, revision 1.0: soc.major = 1 soc.minor = 0 -'padding' is neccessary for structure alignment. This field ensures that the +'padding' is necessary for structure alignment. This field ensures that the 'extended_modes' field is aligned on a 64-bit boundary. 'extended_modes' is a bitfield that defines special functionality which has an diff --git a/Documentation/powerpc/smp.txt b/Documentation/powerpc/smp.txt deleted file mode 100644 index 5b581b849ff..00000000000 --- a/Documentation/powerpc/smp.txt +++ /dev/null @@ -1,34 +0,0 @@ - Information about Linux/PPC SMP mode -===================================================================== - -This document and the related code was written by me -(Cort Dougan, cort@fsmlabs.com) please email me if you have questions, -comments or corrections. - -Last Change: 3.31.99 - -If you want to help by writing code or testing different hardware please -email me! - -1. State of Supported Hardware - - PowerSurge Architecture - tested on UMAX s900, Apple 9600 - The second processor on this machine boots up just fine and - enters its idle loop. Hopefully a completely working SMP kernel - on this machine will be done shortly. - - The code makes the assumption of only two processors. The changes - necessary to work with any number would not be overly difficult but - I don't have any machines with >2 processors so it's not high on my - list of priorities. If anyone else would like do to the work email - me and I can point out the places that need changed. If you have >2 - processors and don't want to add support yourself let me know and I - can take a look into it. - - BeBox - BeBox support hasn't been added to the 2.1.X kernels from 2.0.X - but work is being done and SMP support for BeBox is in the works. - - CHRP - CHRP SMP works and is fairly solid. It's been tested on the IBM F50 - with 4 processors for quite some time now. diff --git a/Documentation/powerpc/sound.txt b/Documentation/powerpc/sound.txt deleted file mode 100644 index df23d95e03a..00000000000 --- a/Documentation/powerpc/sound.txt +++ /dev/null @@ -1,81 +0,0 @@ - Information about PowerPC Sound support -===================================================================== - -Please mail me (Cort Dougan, cort@fsmlabs.com) if you have questions, -comments or corrections. - -Last Change: 6.16.99 - -This just covers sound on the PReP and CHRP systems for now and later -will contain information on the PowerMac's. - -Sound on PReP has been tested and is working with the PowerStack and IBM -Power Series onboard sound systems which are based on the cs4231(2) chip. -The sound options when doing the make config are a bit different from -the default, though. - -The I/O base, irq and dma lines that you enter during the make config -are ignored and are set when booting according to the machine type. -This is so that one binary can be used for Motorola and IBM machines -which use different values and isn't allowed by the driver, so things -are hacked together in such a way as to allow this information to be -set automatically on boot. - -1. Motorola PowerStack PReP machines - - Enable support for "Crystal CS4232 based (PnP) cards" and for the - Microsoft Sound System. The MSS isn't used, but some of the routines - that the CS4232 driver uses are in it. - - Although the options you set are ignored and determined automatically - on boot these are included for information only: - - (830) CS4232 audio I/O base 530, 604, E80 or F40 - (10) CS4232 audio IRQ 5, 7, 9, 11, 12 or 15 - (6) CS4232 audio DMA 0, 1 or 3 - (7) CS4232 second (duplex) DMA 0, 1 or 3 - - This will allow simultaneous record and playback, as 2 different dma - channels are used. - - The sound will be all left channel and very low volume since the - auxiliary input isn't muted by default. I had the changes necessary - for this in the kernel but the sound driver maintainer didn't want - to include them since it wasn't common in other machines. To fix this - you need to mute it using a mixer utility of some sort (if you find one - please let me know) or by patching the driver yourself and recompiling. - - There is a problem on the PowerStack 2's (PowerStack Pro's) using a - different irq/drq than the kernel expects. Unfortunately, I don't know - which irq/drq it is so if anyone knows please email me. - - Midi is not supported since the cs4232 driver doesn't support midi yet. - -2. IBM PowerPersonal PReP machines - - I've only tested sound on the Power Personal Series of IBM workstations - so if you try it on others please let me know the result. I'm especially - interested in the 43p's sound system, which I know nothing about. - - Enable support for "Crystal CS4232 based (PnP) cards" and for the - Microsoft Sound System. The MSS isn't used, but some of the routines - that the CS4232 driver uses are in it. - - Although the options you set are ignored and determined automatically - on boot these are included for information only: - - (530) CS4232 audio I/O base 530, 604, E80 or F40 - (5) CS4232 audio IRQ 5, 7, 9, 11, 12 or 15 - (1) CS4232 audio DMA 0, 1 or 3 - (7) CS4232 second (duplex) DMA 0, 1 or 3 - (330) CS4232 MIDI I/O base 330, 370, 3B0 or 3F0 - (9) CS4232 MIDI IRQ 5, 7, 9, 11, 12 or 15 - - This setup does _NOT_ allow for recording yet. - - Midi is not supported since the cs4232 driver doesn't support midi yet. - -2. IBM CHRP - - I have only tested this on the 43P-150. Build the kernel with the cs4232 - set as a module and load the module with irq=9 dma=1 dma2=2 io=0x550 diff --git a/Documentation/powerpc/transactional_memory.txt b/Documentation/powerpc/transactional_memory.txt new file mode 100644 index 00000000000..9791e98ab49 --- /dev/null +++ b/Documentation/powerpc/transactional_memory.txt @@ -0,0 +1,198 @@ +Transactional Memory support +============================ + +POWER kernel support for this feature is currently limited to supporting +its use by user programs. It is not currently used by the kernel itself. + +This file aims to sum up how it is supported by Linux and what behaviour you +can expect from your user programs. + + +Basic overview +============== + +Hardware Transactional Memory is supported on POWER8 processors, and is a +feature that enables a different form of atomic memory access. Several new +instructions are presented to delimit transactions; transactions are +guaranteed to either complete atomically or roll back and undo any partial +changes. + +A simple transaction looks like this: + +begin_move_money: + tbegin + beq abort_handler + + ld r4, SAVINGS_ACCT(r3) + ld r5, CURRENT_ACCT(r3) + subi r5, r5, 1 + addi r4, r4, 1 + std r4, SAVINGS_ACCT(r3) + std r5, CURRENT_ACCT(r3) + + tend + + b continue + +abort_handler: + ... test for odd failures ... + + /* Retry the transaction if it failed because it conflicted with + * someone else: */ + b begin_move_money + + +The 'tbegin' instruction denotes the start point, and 'tend' the end point. +Between these points the processor is in 'Transactional' state; any memory +references will complete in one go if there are no conflicts with other +transactional or non-transactional accesses within the system. In this +example, the transaction completes as though it were normal straight-line code +IF no other processor has touched SAVINGS_ACCT(r3) or CURRENT_ACCT(r3); an +atomic move of money from the current account to the savings account has been +performed. Even though the normal ld/std instructions are used (note no +lwarx/stwcx), either *both* SAVINGS_ACCT(r3) and CURRENT_ACCT(r3) will be +updated, or neither will be updated. + +If, in the meantime, there is a conflict with the locations accessed by the +transaction, the transaction will be aborted by the CPU. Register and memory +state will roll back to that at the 'tbegin', and control will continue from +'tbegin+4'. The branch to abort_handler will be taken this second time; the +abort handler can check the cause of the failure, and retry. + +Checkpointed registers include all GPRs, FPRs, VRs/VSRs, LR, CCR/CR, CTR, FPCSR +and a few other status/flag regs; see the ISA for details. + +Causes of transaction aborts +============================ + +- Conflicts with cache lines used by other processors +- Signals +- Context switches +- See the ISA for full documentation of everything that will abort transactions. + + +Syscalls +======== + +Performing syscalls from within transaction is not recommended, and can lead +to unpredictable results. + +Syscalls do not by design abort transactions, but beware: The kernel code will +not be running in transactional state. The effect of syscalls will always +remain visible, but depending on the call they may abort your transaction as a +side-effect, read soon-to-be-aborted transactional data that should not remain +invisible, etc. If you constantly retry a transaction that constantly aborts +itself by calling a syscall, you'll have a livelock & make no progress. + +Simple syscalls (e.g. sigprocmask()) "could" be OK. Even things like write() +from, say, printf() should be OK as long as the kernel does not access any +memory that was accessed transactionally. + +Consider any syscalls that happen to work as debug-only -- not recommended for +production use. Best to queue them up till after the transaction is over. + + +Signals +======= + +Delivery of signals (both sync and async) during transactions provides a second +thread state (ucontext/mcontext) to represent the second transactional register +state. Signal delivery 'treclaim's to capture both register states, so signals +abort transactions. The usual ucontext_t passed to the signal handler +represents the checkpointed/original register state; the signal appears to have +arisen at 'tbegin+4'. + +If the sighandler ucontext has uc_link set, a second ucontext has been +delivered. For future compatibility the MSR.TS field should be checked to +determine the transactional state -- if so, the second ucontext in uc->uc_link +represents the active transactional registers at the point of the signal. + +For 64-bit processes, uc->uc_mcontext.regs->msr is a full 64-bit MSR and its TS +field shows the transactional mode. + +For 32-bit processes, the mcontext's MSR register is only 32 bits; the top 32 +bits are stored in the MSR of the second ucontext, i.e. in +uc->uc_link->uc_mcontext.regs->msr. The top word contains the transactional +state TS. + +However, basic signal handlers don't need to be aware of transactions +and simply returning from the handler will deal with things correctly: + +Transaction-aware signal handlers can read the transactional register state +from the second ucontext. This will be necessary for crash handlers to +determine, for example, the address of the instruction causing the SIGSEGV. + +Example signal handler: + + void crash_handler(int sig, siginfo_t *si, void *uc) + { + ucontext_t *ucp = uc; + ucontext_t *transactional_ucp = ucp->uc_link; + + if (ucp_link) { + u64 msr = ucp->uc_mcontext.regs->msr; + /* May have transactional ucontext! */ +#ifndef __powerpc64__ + msr |= ((u64)transactional_ucp->uc_mcontext.regs->msr) << 32; +#endif + if (MSR_TM_ACTIVE(msr)) { + /* Yes, we crashed during a transaction. Oops. */ + fprintf(stderr, "Transaction to be restarted at 0x%llx, but " + "crashy instruction was at 0x%llx\n", + ucp->uc_mcontext.regs->nip, + transactional_ucp->uc_mcontext.regs->nip); + } + } + + fix_the_problem(ucp->dar); + } + +When in an active transaction that takes a signal, we need to be careful with +the stack. It's possible that the stack has moved back up after the tbegin. +The obvious case here is when the tbegin is called inside a function that +returns before a tend. In this case, the stack is part of the checkpointed +transactional memory state. If we write over this non transactionally or in +suspend, we are in trouble because if we get a tm abort, the program counter and +stack pointer will be back at the tbegin but our in memory stack won't be valid +anymore. + +To avoid this, when taking a signal in an active transaction, we need to use +the stack pointer from the checkpointed state, rather than the speculated +state. This ensures that the signal context (written tm suspended) will be +written below the stack required for the rollback. The transaction is aborted +because of the treclaim, so any memory written between the tbegin and the +signal will be rolled back anyway. + +For signals taken in non-TM or suspended mode, we use the +normal/non-checkpointed stack pointer. + + +Failure cause codes used by kernel +================================== + +These are defined in <asm/reg.h>, and distinguish different reasons why the +kernel aborted a transaction: + + TM_CAUSE_RESCHED Thread was rescheduled. + TM_CAUSE_TLBI Software TLB invalide. + TM_CAUSE_FAC_UNAV FP/VEC/VSX unavailable trap. + TM_CAUSE_SYSCALL Currently unused; future syscalls that must abort + transactions for consistency will use this. + TM_CAUSE_SIGNAL Signal delivered. + TM_CAUSE_MISC Currently unused. + TM_CAUSE_ALIGNMENT Alignment fault. + TM_CAUSE_EMULATE Emulation that touched memory. + +These can be checked by the user program's abort handler as TEXASR[0:7]. If +bit 7 is set, it indicates that the error is consider persistent. For example +a TM_CAUSE_ALIGNMENT will be persistent while a TM_CAUSE_RESCHED will not.q + +GDB +=== + +GDB and ptrace are not currently TM-aware. If one stops during a transaction, +it looks like the transaction has just started (the checkpointed state is +presented). The transaction cannot then be continued and will take the failure +handler route. Furthermore, the transactional 2nd register state will be +inaccessible. GDB can currently be used on programs using TM, but not sensibly +in parts within transactions. diff --git a/Documentation/powerpc/zImage_layout.txt b/Documentation/powerpc/zImage_layout.txt deleted file mode 100644 index 048e0150f57..00000000000 --- a/Documentation/powerpc/zImage_layout.txt +++ /dev/null @@ -1,47 +0,0 @@ - Information about the Linux/PPC kernel images -===================================================================== - -Please mail me (Cort Dougan, cort@fsmlabs.com) if you have questions, -comments or corrections. - -This document is meant to answer several questions I've had about how -the PReP system boots and how Linux/PPC interacts with that mechanism. -It would be nice if we could have information on how other architectures -boot here as well. If you have anything to contribute, please -let me know. - - -1. PReP boot file - - This is the file necessary to boot PReP systems from floppy or - hard drive. The firmware reads the PReP partition table entry - and will load the image accordingly. - - To boot the zImage, copy it onto a floppy with dd if=zImage of=/dev/fd0h1440 - or onto a PReP hard drive partition with dd if=zImage of=/dev/sda4 - assuming you've created a PReP partition (type 0x41) with fdisk on - /dev/sda4. - - The layout of the image format is: - - 0x0 +------------+ - | | PReP partition table entry - | | - 0x400 +------------+ - | | Bootstrap program code + data - | | - | | - +------------+ - | | compressed kernel, elf header removed - +------------+ - | | initrd (if loaded) - +------------+ - | | Elf section table for bootstrap program - +------------+ - - -2. MBX boot file - - The MBX boards can load an elf image, and relocate it to the - proper location in memory - it copies the image to the location it was - linked at. |
