aboutsummaryrefslogtreecommitdiff
path: root/Documentation/powerpc
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/powerpc')
-rw-r--r--Documentation/powerpc/00-INDEX17
-rw-r--r--Documentation/powerpc/cpu_families.txt221
-rw-r--r--Documentation/powerpc/cpu_features.txt10
-rw-r--r--Documentation/powerpc/firmware-assisted-dump.txt270
-rw-r--r--Documentation/powerpc/mpc52xx.txt12
-rw-r--r--Documentation/powerpc/phyp-assisted-dump.txt127
-rw-r--r--Documentation/powerpc/pmu-ebb.txt137
-rw-r--r--Documentation/powerpc/ptrace.txt17
-rw-r--r--Documentation/powerpc/sound.txt81
-rw-r--r--Documentation/powerpc/transactional_memory.txt198
-rw-r--r--Documentation/powerpc/zImage_layout.txt47
11 files changed, 867 insertions, 270 deletions
diff --git a/Documentation/powerpc/00-INDEX b/Documentation/powerpc/00-INDEX
index 5620fb5ac42..6db73df0427 100644
--- a/Documentation/powerpc/00-INDEX
+++ b/Documentation/powerpc/00-INDEX
@@ -5,19 +5,28 @@ please mail me.
00-INDEX
- this file
+bootwrapper.txt
+ - Information on how the powerpc kernel is wrapped for boot on various
+ different platforms.
cpu_features.txt
- info on how we support a variety of CPUs with minimal compile-time
options.
eeh-pci-error-recovery.txt
- info on PCI Bus EEH Error Recovery
+firmware-assisted-dump.txt
+ - Documentation on the firmware assisted dump mechanism "fadump".
hvcs.txt
- IBM "Hypervisor Virtual Console Server" Installation Guide
+kvm_440.txt
+ - Various notes on the implementation of KVM for PowerPC 440.
mpc52xx.txt
- Linux 2.6.x on MPC52xx family
-sound.txt
- - info on sound support under Linux/PPC
-zImage_layout.txt
- - info on the kernel images for Linux/PPC
+pmu-ebb.txt
+ - Description of the API for using the PMU with Event Based Branches.
qe_firmware.txt
- describes the layout of firmware binaries for the Freescale QUICC
Engine and the code that parses and uploads the microcode therein.
+ptrace.txt
+ - Information on the ptrace interfaces for hardware debug registers.
+transactional_memory.txt
+ - Overview of the Power8 transactional memory support.
diff --git a/Documentation/powerpc/cpu_families.txt b/Documentation/powerpc/cpu_families.txt
new file mode 100644
index 00000000000..fc08e22feb1
--- /dev/null
+++ b/Documentation/powerpc/cpu_families.txt
@@ -0,0 +1,221 @@
+CPU Families
+============
+
+This document tries to summarise some of the different cpu families that exist
+and are supported by arch/powerpc.
+
+
+Book3S (aka sPAPR)
+------------------
+
+ - Hash MMU
+ - Mix of 32 & 64 bit
+
+ +--------------+ +----------------+
+ | Old POWER | --------------> | RS64 (threads) |
+ +--------------+ +----------------+
+ |
+ |
+ v
+ +--------------+ +----------------+ +------+
+ | 601 | --------------> | 603 | ---> | e300 |
+ +--------------+ +----------------+ +------+
+ | |
+ | |
+ v v
+ +--------------+ +----------------+ +-------+
+ | 604 | | 750 (G3) | ---> | 750CX |
+ +--------------+ +----------------+ +-------+
+ | | |
+ | | |
+ v v v
+ +--------------+ +----------------+ +-------+
+ | 620 (64 bit) | | 7400 | | 750CL |
+ +--------------+ +----------------+ +-------+
+ | | |
+ | | |
+ v v v
+ +--------------+ +----------------+ +-------+
+ | POWER3/630 | | 7410 | | 750FX |
+ +--------------+ +----------------+ +-------+
+ | |
+ | |
+ v v
+ +--------------+ +----------------+
+ | POWER3+ | | 7450 |
+ +--------------+ +----------------+
+ | |
+ | |
+ v v
+ +--------------+ +----------------+
+ | POWER4 | | 7455 |
+ +--------------+ +----------------+
+ | |
+ | |
+ v v
+ +--------------+ +-------+ +----------------+
+ | POWER4+ | --> | 970 | | 7447 |
+ +--------------+ +-------+ +----------------+
+ | | |
+ | | |
+ v v v
+ +--------------+ +-------+ +----------------+
+ | POWER5 | | 970FX | | 7448 |
+ +--------------+ +-------+ +----------------+
+ | | |
+ | | |
+ v v v
+ +--------------+ +-------+ +----------------+
+ | POWER5+ | | 970MP | | e600 |
+ +--------------+ +-------+ +----------------+
+ |
+ |
+ v
+ +--------------+
+ | POWER5++ |
+ +--------------+
+ |
+ |
+ v
+ +--------------+ +-------+
+ | POWER6 | <-?-> | Cell |
+ +--------------+ +-------+
+ |
+ |
+ v
+ +--------------+
+ | POWER7 |
+ +--------------+
+ |
+ |
+ v
+ +--------------+
+ | POWER7+ |
+ +--------------+
+ |
+ |
+ v
+ +--------------+
+ | POWER8 |
+ +--------------+
+
+
+ +---------------+
+ | PA6T (64 bit) |
+ +---------------+
+
+
+IBM BookE
+---------
+
+ - Software loaded TLB.
+ - All 32 bit
+
+ +--------------+
+ | 401 |
+ +--------------+
+ |
+ |
+ v
+ +--------------+
+ | 403 |
+ +--------------+
+ |
+ |
+ v
+ +--------------+
+ | 405 |
+ +--------------+
+ |
+ |
+ v
+ +--------------+
+ | 440 |
+ +--------------+
+ |
+ |
+ v
+ +--------------+ +----------------+
+ | 450 | --> | BG/P |
+ +--------------+ +----------------+
+ |
+ |
+ v
+ +--------------+
+ | 460 |
+ +--------------+
+ |
+ |
+ v
+ +--------------+
+ | 476 |
+ +--------------+
+
+
+Motorola/Freescale 8xx
+----------------------
+
+ - Software loaded with hardware assist.
+ - All 32 bit
+
+ +-------------+
+ | MPC8xx Core |
+ +-------------+
+
+
+Freescale BookE
+---------------
+
+ - Software loaded TLB.
+ - e6500 adds HW loaded indirect TLB entries.
+ - Mix of 32 & 64 bit
+
+ +--------------+
+ | e200 |
+ +--------------+
+
+
+ +--------------------------------+
+ | e500 |
+ +--------------------------------+
+ |
+ |
+ v
+ +--------------------------------+
+ | e500v2 |
+ +--------------------------------+
+ |
+ |
+ v
+ +--------------------------------+
+ | e500mc (Book3e) |
+ +--------------------------------+
+ |
+ |
+ v
+ +--------------------------------+
+ | e5500 (64 bit) |
+ +--------------------------------+
+ |
+ |
+ v
+ +--------------------------------+
+ | e6500 (HW TLB) (Multithreaded) |
+ +--------------------------------+
+
+
+IBM A2 core
+-----------
+
+ - Book3E, software loaded TLB + HW loaded indirect TLB entries.
+ - 64 bit
+
+ +--------------+ +----------------+
+ | A2 core | --> | WSP |
+ +--------------+ +----------------+
+ |
+ |
+ v
+ +--------------+
+ | BG/Q |
+ +--------------+
diff --git a/Documentation/powerpc/cpu_features.txt b/Documentation/powerpc/cpu_features.txt
index ffa4183fdb8..ae09df8722c 100644
--- a/Documentation/powerpc/cpu_features.txt
+++ b/Documentation/powerpc/cpu_features.txt
@@ -11,10 +11,10 @@ split instruction and data caches, and if the CPU supports the DOZE and NAP
sleep modes.
Detection of the feature set is simple. A list of processors can be found in
-arch/ppc/kernel/cputable.c. The PVR register is masked and compared with each
-value in the list. If a match is found, the cpu_features of cur_cpu_spec is
-assigned to the feature bitmask for this processor and a __setup_cpu function
-is called.
+arch/powerpc/kernel/cputable.c. The PVR register is masked and compared with
+each value in the list. If a match is found, the cpu_features of cur_cpu_spec
+is assigned to the feature bitmask for this processor and a __setup_cpu
+function is called.
C code may test 'cur_cpu_spec[smp_processor_id()]->cpu_features' for a
particular feature bit. This is done in quite a few places, for example
@@ -51,6 +51,6 @@ should be used in the majority of cases.
The END_FTR_SECTION macros are implemented by storing information about this
code in the '__ftr_fixup' ELF section. When do_cpu_ftr_fixups
-(arch/ppc/kernel/misc.S) is invoked, it will iterate over the records in
+(arch/powerpc/kernel/misc.S) is invoked, it will iterate over the records in
__ftr_fixup, and if the required feature is not present it will loop writing
nop's from each BEGIN_FTR_SECTION to END_FTR_SECTION.
diff --git a/Documentation/powerpc/firmware-assisted-dump.txt b/Documentation/powerpc/firmware-assisted-dump.txt
new file mode 100644
index 00000000000..3007bc98af2
--- /dev/null
+++ b/Documentation/powerpc/firmware-assisted-dump.txt
@@ -0,0 +1,270 @@
+
+ Firmware-Assisted Dump
+ ------------------------
+ July 2011
+
+The goal of firmware-assisted dump is to enable the dump of
+a crashed system, and to do so from a fully-reset system, and
+to minimize the total elapsed time until the system is back
+in production use.
+
+- Firmware assisted dump (fadump) infrastructure is intended to replace
+ the existing phyp assisted dump.
+- Fadump uses the same firmware interfaces and memory reservation model
+ as phyp assisted dump.
+- Unlike phyp dump, fadump exports the memory dump through /proc/vmcore
+ in the ELF format in the same way as kdump. This helps us reuse the
+ kdump infrastructure for dump capture and filtering.
+- Unlike phyp dump, userspace tool does not need to refer any sysfs
+ interface while reading /proc/vmcore.
+- Unlike phyp dump, fadump allows user to release all the memory reserved
+ for dump, with a single operation of echo 1 > /sys/kernel/fadump_release_mem.
+- Once enabled through kernel boot parameter, fadump can be
+ started/stopped through /sys/kernel/fadump_registered interface (see
+ sysfs files section below) and can be easily integrated with kdump
+ service start/stop init scripts.
+
+Comparing with kdump or other strategies, firmware-assisted
+dump offers several strong, practical advantages:
+
+-- Unlike kdump, the system has been reset, and loaded
+ with a fresh copy of the kernel. In particular,
+ PCI and I/O devices have been reinitialized and are
+ in a clean, consistent state.
+-- Once the dump is copied out, the memory that held the dump
+ is immediately available to the running kernel. And therefore,
+ unlike kdump, fadump doesn't need a 2nd reboot to get back
+ the system to the production configuration.
+
+The above can only be accomplished by coordination with,
+and assistance from the Power firmware. The procedure is
+as follows:
+
+-- The first kernel registers the sections of memory with the
+ Power firmware for dump preservation during OS initialization.
+ These registered sections of memory are reserved by the first
+ kernel during early boot.
+
+-- When a system crashes, the Power firmware will save
+ the low memory (boot memory of size larger of 5% of system RAM
+ or 256MB) of RAM to the previous registered region. It will
+ also save system registers, and hardware PTE's.
+
+ NOTE: The term 'boot memory' means size of the low memory chunk
+ that is required for a kernel to boot successfully when
+ booted with restricted memory. By default, the boot memory
+ size will be the larger of 5% of system RAM or 256MB.
+ Alternatively, user can also specify boot memory size
+ through boot parameter 'fadump_reserve_mem=' which will
+ override the default calculated size. Use this option
+ if default boot memory size is not sufficient for second
+ kernel to boot successfully.
+
+-- After the low memory (boot memory) area has been saved, the
+ firmware will reset PCI and other hardware state. It will
+ *not* clear the RAM. It will then launch the bootloader, as
+ normal.
+
+-- The freshly booted kernel will notice that there is a new
+ node (ibm,dump-kernel) in the device tree, indicating that
+ there is crash data available from a previous boot. During
+ the early boot OS will reserve rest of the memory above
+ boot memory size effectively booting with restricted memory
+ size. This will make sure that the second kernel will not
+ touch any of the dump memory area.
+
+-- User-space tools will read /proc/vmcore to obtain the contents
+ of memory, which holds the previous crashed kernel dump in ELF
+ format. The userspace tools may copy this info to disk, or
+ network, nas, san, iscsi, etc. as desired.
+
+-- Once the userspace tool is done saving dump, it will echo
+ '1' to /sys/kernel/fadump_release_mem to release the reserved
+ memory back to general use, except the memory required for
+ next firmware-assisted dump registration.
+
+ e.g.
+ # echo 1 > /sys/kernel/fadump_release_mem
+
+Please note that the firmware-assisted dump feature
+is only available on Power6 and above systems with recent
+firmware versions.
+
+Implementation details:
+----------------------
+
+During boot, a check is made to see if firmware supports
+this feature on that particular machine. If it does, then
+we check to see if an active dump is waiting for us. If yes
+then everything but boot memory size of RAM is reserved during
+early boot (See Fig. 2). This area is released once we finish
+collecting the dump from user land scripts (e.g. kdump scripts)
+that are run. If there is dump data, then the
+/sys/kernel/fadump_release_mem file is created, and the reserved
+memory is held.
+
+If there is no waiting dump data, then only the memory required
+to hold CPU state, HPTE region, boot memory dump and elfcore
+header, is reserved at the top of memory (see Fig. 1). This area
+is *not* released: this region will be kept permanently reserved,
+so that it can act as a receptacle for a copy of the boot memory
+content in addition to CPU state and HPTE region, in the case a
+crash does occur.
+
+ o Memory Reservation during first kernel
+
+ Low memory Top of memory
+ 0 boot memory size |
+ | | |<--Reserved dump area -->|
+ V V | Permanent Reservation V
+ +-----------+----------/ /----------+---+----+-----------+----+
+ | | |CPU|HPTE| DUMP |ELF |
+ +-----------+----------/ /----------+---+----+-----------+----+
+ | ^
+ | |
+ \ /
+ -------------------------------------------
+ Boot memory content gets transferred to
+ reserved area by firmware at the time of
+ crash
+ Fig. 1
+
+ o Memory Reservation during second kernel after crash
+
+ Low memory Top of memory
+ 0 boot memory size |
+ | |<------------- Reserved dump area ----------- -->|
+ V V V
+ +-----------+----------/ /----------+---+----+-----------+----+
+ | | |CPU|HPTE| DUMP |ELF |
+ +-----------+----------/ /----------+---+----+-----------+----+
+ | |
+ V V
+ Used by second /proc/vmcore
+ kernel to boot
+ Fig. 2
+
+Currently the dump will be copied from /proc/vmcore to a
+a new file upon user intervention. The dump data available through
+/proc/vmcore will be in ELF format. Hence the existing kdump
+infrastructure (kdump scripts) to save the dump works fine with
+minor modifications.
+
+The tools to examine the dump will be same as the ones
+used for kdump.
+
+How to enable firmware-assisted dump (fadump):
+-------------------------------------
+
+1. Set config option CONFIG_FA_DUMP=y and build kernel.
+2. Boot into linux kernel with 'fadump=on' kernel cmdline option.
+3. Optionally, user can also set 'fadump_reserve_mem=' kernel cmdline
+ to specify size of the memory to reserve for boot memory dump
+ preservation.
+
+NOTE: If firmware-assisted dump fails to reserve memory then it will
+ fallback to existing kdump mechanism if 'crashkernel=' option
+ is set at kernel cmdline.
+
+Sysfs/debugfs files:
+------------
+
+Firmware-assisted dump feature uses sysfs file system to hold
+the control files and debugfs file to display memory reserved region.
+
+Here is the list of files under kernel sysfs:
+
+ /sys/kernel/fadump_enabled
+
+ This is used to display the fadump status.
+ 0 = fadump is disabled
+ 1 = fadump is enabled
+
+ This interface can be used by kdump init scripts to identify if
+ fadump is enabled in the kernel and act accordingly.
+
+ /sys/kernel/fadump_registered
+
+ This is used to display the fadump registration status as well
+ as to control (start/stop) the fadump registration.
+ 0 = fadump is not registered.
+ 1 = fadump is registered and ready to handle system crash.
+
+ To register fadump echo 1 > /sys/kernel/fadump_registered and
+ echo 0 > /sys/kernel/fadump_registered for un-register and stop the
+ fadump. Once the fadump is un-registered, the system crash will not
+ be handled and vmcore will not be captured. This interface can be
+ easily integrated with kdump service start/stop.
+
+ /sys/kernel/fadump_release_mem
+
+ This file is available only when fadump is active during
+ second kernel. This is used to release the reserved memory
+ region that are held for saving crash dump. To release the
+ reserved memory echo 1 to it:
+
+ echo 1 > /sys/kernel/fadump_release_mem
+
+ After echo 1, the content of the /sys/kernel/debug/powerpc/fadump_region
+ file will change to reflect the new memory reservations.
+
+ The existing userspace tools (kdump infrastructure) can be easily
+ enhanced to use this interface to release the memory reserved for
+ dump and continue without 2nd reboot.
+
+Here is the list of files under powerpc debugfs:
+(Assuming debugfs is mounted on /sys/kernel/debug directory.)
+
+ /sys/kernel/debug/powerpc/fadump_region
+
+ This file shows the reserved memory regions if fadump is
+ enabled otherwise this file is empty. The output format
+ is:
+ <region>: [<start>-<end>] <reserved-size> bytes, Dumped: <dump-size>
+
+ e.g.
+ Contents when fadump is registered during first kernel
+
+ # cat /sys/kernel/debug/powerpc/fadump_region
+ CPU : [0x0000006ffb0000-0x0000006fff001f] 0x40020 bytes, Dumped: 0x0
+ HPTE: [0x0000006fff0020-0x0000006fff101f] 0x1000 bytes, Dumped: 0x0
+ DUMP: [0x0000006fff1020-0x0000007fff101f] 0x10000000 bytes, Dumped: 0x0
+
+ Contents when fadump is active during second kernel
+
+ # cat /sys/kernel/debug/powerpc/fadump_region
+ CPU : [0x0000006ffb0000-0x0000006fff001f] 0x40020 bytes, Dumped: 0x40020
+ HPTE: [0x0000006fff0020-0x0000006fff101f] 0x1000 bytes, Dumped: 0x1000
+ DUMP: [0x0000006fff1020-0x0000007fff101f] 0x10000000 bytes, Dumped: 0x10000000
+ : [0x00000010000000-0x0000006ffaffff] 0x5ffb0000 bytes, Dumped: 0x5ffb0000
+
+NOTE: Please refer to Documentation/filesystems/debugfs.txt on
+ how to mount the debugfs filesystem.
+
+
+TODO:
+-----
+ o Need to come up with the better approach to find out more
+ accurate boot memory size that is required for a kernel to
+ boot successfully when booted with restricted memory.
+ o The fadump implementation introduces a fadump crash info structure
+ in the scratch area before the ELF core header. The idea of introducing
+ this structure is to pass some important crash info data to the second
+ kernel which will help second kernel to populate ELF core header with
+ correct data before it gets exported through /proc/vmcore. The current
+ design implementation does not address a possibility of introducing
+ additional fields (in future) to this structure without affecting
+ compatibility. Need to come up with the better approach to address this.
+ The possible approaches are:
+ 1. Introduce version field for version tracking, bump up the version
+ whenever a new field is added to the structure in future. The version
+ field can be used to find out what fields are valid for the current
+ version of the structure.
+ 2. Reserve the area of predefined size (say PAGE_SIZE) for this
+ structure and have unused area as reserved (initialized to zero)
+ for future field additions.
+ The advantage of approach 1 over 2 is we don't need to reserve extra space.
+---
+Author: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
+This document is based on the original documentation written for phyp
+assisted dump by Linas Vepstas and Manish Ahuja.
diff --git a/Documentation/powerpc/mpc52xx.txt b/Documentation/powerpc/mpc52xx.txt
index 10dd4ab93b8..0d540a31ea1 100644
--- a/Documentation/powerpc/mpc52xx.txt
+++ b/Documentation/powerpc/mpc52xx.txt
@@ -2,7 +2,7 @@ Linux 2.6.x on MPC52xx family
-----------------------------
For the latest info, go to http://www.246tNt.com/mpc52xx/
-
+
To compile/use :
- U-Boot:
@@ -10,23 +10,23 @@ To compile/use :
if you wish to ).
# make lite5200_defconfig
# make uImage
-
+
then, on U-boot:
=> tftpboot 200000 uImage
=> tftpboot 400000 pRamdisk
=> bootm 200000 400000
-
+
- DBug:
# <edit Makefile to set ARCH=ppc & CROSS_COMPILE=... ( also EXTRAVERSION
if you wish to ).
# make lite5200_defconfig
# cp your_initrd.gz arch/ppc/boot/images/ramdisk.image.gz
- # make zImage.initrd
- # make
+ # make zImage.initrd
+ # make
then in DBug:
DBug> dn -i zImage.initrd.lite5200
-
+
Some remarks :
- The port is named mpc52xxx, and config options are PPC_MPC52xx. The MGT5100
diff --git a/Documentation/powerpc/phyp-assisted-dump.txt b/Documentation/powerpc/phyp-assisted-dump.txt
deleted file mode 100644
index ad340205d96..00000000000
--- a/Documentation/powerpc/phyp-assisted-dump.txt
+++ /dev/null
@@ -1,127 +0,0 @@
-
- Hypervisor-Assisted Dump
- ------------------------
- November 2007
-
-The goal of hypervisor-assisted dump is to enable the dump of
-a crashed system, and to do so from a fully-reset system, and
-to minimize the total elapsed time until the system is back
-in production use.
-
-As compared to kdump or other strategies, hypervisor-assisted
-dump offers several strong, practical advantages:
-
--- Unlike kdump, the system has been reset, and loaded
- with a fresh copy of the kernel. In particular,
- PCI and I/O devices have been reinitialized and are
- in a clean, consistent state.
--- As the dump is performed, the dumped memory becomes
- immediately available to the system for normal use.
--- After the dump is completed, no further reboots are
- required; the system will be fully usable, and running
- in its normal, production mode on its normal kernel.
-
-The above can only be accomplished by coordination with,
-and assistance from the hypervisor. The procedure is
-as follows:
-
--- When a system crashes, the hypervisor will save
- the low 256MB of RAM to a previously registered
- save region. It will also save system state, system
- registers, and hardware PTE's.
-
--- After the low 256MB area has been saved, the
- hypervisor will reset PCI and other hardware state.
- It will *not* clear RAM. It will then launch the
- bootloader, as normal.
-
--- The freshly booted kernel will notice that there
- is a new node (ibm,dump-kernel) in the device tree,
- indicating that there is crash data available from
- a previous boot. It will boot into only 256MB of RAM,
- reserving the rest of system memory.
-
--- Userspace tools will parse /sys/kernel/release_region
- and read /proc/vmcore to obtain the contents of memory,
- which holds the previous crashed kernel. The userspace
- tools may copy this info to disk, or network, nas, san,
- iscsi, etc. as desired.
-
- For Example: the values in /sys/kernel/release-region
- would look something like this (address-range pairs).
- CPU:0x177fee000-0x10000: HPTE:0x177ffe020-0x1000: /
- DUMP:0x177fff020-0x10000000, 0x10000000-0x16F1D370A
-
--- As the userspace tools complete saving a portion of
- dump, they echo an offset and size to
- /sys/kernel/release_region to release the reserved
- memory back to general use.
-
- An example of this is:
- "echo 0x40000000 0x10000000 > /sys/kernel/release_region"
- which will release 256MB at the 1GB boundary.
-
-Please note that the hypervisor-assisted dump feature
-is only available on Power6-based systems with recent
-firmware versions.
-
-Implementation details:
-----------------------
-
-During boot, a check is made to see if firmware supports
-this feature on this particular machine. If it does, then
-we check to see if a active dump is waiting for us. If yes
-then everything but 256 MB of RAM is reserved during early
-boot. This area is released once we collect a dump from user
-land scripts that are run. If there is dump data, then
-the /sys/kernel/release_region file is created, and
-the reserved memory is held.
-
-If there is no waiting dump data, then only the highest
-256MB of the ram is reserved as a scratch area. This area
-is *not* released: this region will be kept permanently
-reserved, so that it can act as a receptacle for a copy
-of the low 256MB in the case a crash does occur. See,
-however, "open issues" below, as to whether
-such a reserved region is really needed.
-
-Currently the dump will be copied from /proc/vmcore to a
-a new file upon user intervention. The starting address
-to be read and the range for each data point in provided
-in /sys/kernel/release_region.
-
-The tools to examine the dump will be same as the ones
-used for kdump.
-
-General notes:
---------------
-Security: please note that there are potential security issues
-with any sort of dump mechanism. In particular, plaintext
-(unencrypted) data, and possibly passwords, may be present in
-the dump data. Userspace tools must take adequate precautions to
-preserve security.
-
-Open issues/ToDo:
-------------
- o The various code paths that tell the hypervisor that a crash
- occurred, vs. it simply being a normal reboot, should be
- reviewed, and possibly clarified/fixed.
-
- o Instead of using /sys/kernel, should there be a /sys/dump
- instead? There is a dump_subsys being created by the s390 code,
- perhaps the pseries code should use a similar layout as well.
-
- o Is reserving a 256MB region really required? The goal of
- reserving a 256MB scratch area is to make sure that no
- important crash data is clobbered when the hypervisor
- save low mem to the scratch area. But, if one could assure
- that nothing important is located in some 256MB area, then
- it would not need to be reserved. Something that can be
- improved in subsequent versions.
-
- o Still working the kdump team to integrate this with kdump,
- some work remains but this would not affect the current
- patches.
-
- o Still need to write a shell script, to copy the dump away.
- Currently I am parsing it manually.
diff --git a/Documentation/powerpc/pmu-ebb.txt b/Documentation/powerpc/pmu-ebb.txt
new file mode 100644
index 00000000000..73cd163dbfb
--- /dev/null
+++ b/Documentation/powerpc/pmu-ebb.txt
@@ -0,0 +1,137 @@
+PMU Event Based Branches
+========================
+
+Event Based Branches (EBBs) are a feature which allows the hardware to
+branch directly to a specified user space address when certain events occur.
+
+The full specification is available in Power ISA v2.07:
+
+ https://www.power.org/documentation/power-isa-version-2-07/
+
+One type of event for which EBBs can be configured is PMU exceptions. This
+document describes the API for configuring the Power PMU to generate EBBs,
+using the Linux perf_events API.
+
+
+Terminology
+-----------
+
+Throughout this document we will refer to an "EBB event" or "EBB events". This
+just refers to a struct perf_event which has set the "EBB" flag in its
+attr.config. All events which can be configured on the hardware PMU are
+possible "EBB events".
+
+
+Background
+----------
+
+When a PMU EBB occurs it is delivered to the currently running process. As such
+EBBs can only sensibly be used by programs for self-monitoring.
+
+It is a feature of the perf_events API that events can be created on other
+processes, subject to standard permission checks. This is also true of EBB
+events, however unless the target process enables EBBs (via mtspr(BESCR)) no
+EBBs will ever be delivered.
+
+This makes it possible for a process to enable EBBs for itself, but not
+actually configure any events. At a later time another process can come along
+and attach an EBB event to the process, which will then cause EBBs to be
+delivered to the first process. It's not clear if this is actually useful.
+
+
+When the PMU is configured for EBBs, all PMU interrupts are delivered to the
+user process. This means once an EBB event is scheduled on the PMU, no non-EBB
+events can be configured. This means that EBB events can not be run
+concurrently with regular 'perf' commands, or any other perf events.
+
+It is however safe to run 'perf' commands on a process which is using EBBs. The
+kernel will in general schedule the EBB event, and perf will be notified that
+its events could not run.
+
+The exclusion between EBB events and regular events is implemented using the
+existing "pinned" and "exclusive" attributes of perf_events. This means EBB
+events will be given priority over other events, unless they are also pinned.
+If an EBB event and a regular event are both pinned, then whichever is enabled
+first will be scheduled and the other will be put in error state. See the
+section below titled "Enabling an EBB event" for more information.
+
+
+Creating an EBB event
+---------------------
+
+To request that an event is counted using EBB, the event code should have bit
+63 set.
+
+EBB events must be created with a particular, and restrictive, set of
+attributes - this is so that they interoperate correctly with the rest of the
+perf_events subsystem.
+
+An EBB event must be created with the "pinned" and "exclusive" attributes set.
+Note that if you are creating a group of EBB events, only the leader can have
+these attributes set.
+
+An EBB event must NOT set any of the "inherit", "sample_period", "freq" or
+"enable_on_exec" attributes.
+
+An EBB event must be attached to a task. This is specified to perf_event_open()
+by passing a pid value, typically 0 indicating the current task.
+
+All events in a group must agree on whether they want EBB. That is all events
+must request EBB, or none may request EBB.
+
+EBB events must specify the PMC they are to be counted on. This ensures
+userspace is able to reliably determine which PMC the event is scheduled on.
+
+
+Enabling an EBB event
+---------------------
+
+Once an EBB event has been successfully opened, it must be enabled with the
+perf_events API. This can be achieved either via the ioctl() interface, or the
+prctl() interface.
+
+However, due to the design of the perf_events API, enabling an event does not
+guarantee that it has been scheduled on the PMU. To ensure that the EBB event
+has been scheduled on the PMU, you must perform a read() on the event. If the
+read() returns EOF, then the event has not been scheduled and EBBs are not
+enabled.
+
+This behaviour occurs because the EBB event is pinned and exclusive. When the
+EBB event is enabled it will force all other non-pinned events off the PMU. In
+this case the enable will be successful. However if there is already an event
+pinned on the PMU then the enable will not be successful.
+
+
+Reading an EBB event
+--------------------
+
+It is possible to read() from an EBB event. However the results are
+meaningless. Because interrupts are being delivered to the user process the
+kernel is not able to count the event, and so will return a junk value.
+
+
+Closing an EBB event
+--------------------
+
+When an EBB event is finished with, you can close it using close() as for any
+regular event. If this is the last EBB event the PMU will be deconfigured and
+no further PMU EBBs will be delivered.
+
+
+EBB Handler
+-----------
+
+The EBB handler is just regular userspace code, however it must be written in
+the style of an interrupt handler. When the handler is entered all registers
+are live (possibly) and so must be saved somehow before the handler can invoke
+other code.
+
+It's up to the program how to handle this. For C programs a relatively simple
+option is to create an interrupt frame on the stack and save registers there.
+
+Fork
+----
+
+EBB events are not inherited across fork. If the child process wishes to use
+EBBs it should open a new event for itself. Similarly the EBB state in
+BESCR/EBBHR/EBBRR is cleared across fork().
diff --git a/Documentation/powerpc/ptrace.txt b/Documentation/powerpc/ptrace.txt
index f4a5499b7bc..99c5ce88d0f 100644
--- a/Documentation/powerpc/ptrace.txt
+++ b/Documentation/powerpc/ptrace.txt
@@ -40,6 +40,7 @@ features will have bits indicating whether there is support for:
#define PPC_DEBUG_FEATURE_INSN_BP_MASK 0x2
#define PPC_DEBUG_FEATURE_DATA_BP_RANGE 0x4
#define PPC_DEBUG_FEATURE_DATA_BP_MASK 0x8
+#define PPC_DEBUG_FEATURE_DATA_BP_DAWR 0x10
2. PTRACE_SETHWDEBUG
@@ -127,6 +128,22 @@ Some examples of using the structure to:
p.addr2 = (uint64_t) end_range;
p.condition_value = 0;
+- set a watchpoint in server processors (BookS)
+
+ p.version = 1;
+ p.trigger_type = PPC_BREAKPOINT_TRIGGER_RW;
+ p.addr_mode = PPC_BREAKPOINT_MODE_RANGE_INCLUSIVE;
+ or
+ p.addr_mode = PPC_BREAKPOINT_MODE_EXACT;
+
+ p.condition_mode = PPC_BREAKPOINT_CONDITION_NONE;
+ p.addr = (uint64_t) begin_range;
+ /* For PPC_BREAKPOINT_MODE_RANGE_INCLUSIVE addr2 needs to be specified, where
+ * addr2 - addr <= 8 Bytes.
+ */
+ p.addr2 = (uint64_t) end_range;
+ p.condition_value = 0;
+
3. PTRACE_DELHWDEBUG
Takes an integer which identifies an existing breakpoint or watchpoint
diff --git a/Documentation/powerpc/sound.txt b/Documentation/powerpc/sound.txt
deleted file mode 100644
index df23d95e03a..00000000000
--- a/Documentation/powerpc/sound.txt
+++ /dev/null
@@ -1,81 +0,0 @@
- Information about PowerPC Sound support
-=====================================================================
-
-Please mail me (Cort Dougan, cort@fsmlabs.com) if you have questions,
-comments or corrections.
-
-Last Change: 6.16.99
-
-This just covers sound on the PReP and CHRP systems for now and later
-will contain information on the PowerMac's.
-
-Sound on PReP has been tested and is working with the PowerStack and IBM
-Power Series onboard sound systems which are based on the cs4231(2) chip.
-The sound options when doing the make config are a bit different from
-the default, though.
-
-The I/O base, irq and dma lines that you enter during the make config
-are ignored and are set when booting according to the machine type.
-This is so that one binary can be used for Motorola and IBM machines
-which use different values and isn't allowed by the driver, so things
-are hacked together in such a way as to allow this information to be
-set automatically on boot.
-
-1. Motorola PowerStack PReP machines
-
- Enable support for "Crystal CS4232 based (PnP) cards" and for the
- Microsoft Sound System. The MSS isn't used, but some of the routines
- that the CS4232 driver uses are in it.
-
- Although the options you set are ignored and determined automatically
- on boot these are included for information only:
-
- (830) CS4232 audio I/O base 530, 604, E80 or F40
- (10) CS4232 audio IRQ 5, 7, 9, 11, 12 or 15
- (6) CS4232 audio DMA 0, 1 or 3
- (7) CS4232 second (duplex) DMA 0, 1 or 3
-
- This will allow simultaneous record and playback, as 2 different dma
- channels are used.
-
- The sound will be all left channel and very low volume since the
- auxiliary input isn't muted by default. I had the changes necessary
- for this in the kernel but the sound driver maintainer didn't want
- to include them since it wasn't common in other machines. To fix this
- you need to mute it using a mixer utility of some sort (if you find one
- please let me know) or by patching the driver yourself and recompiling.
-
- There is a problem on the PowerStack 2's (PowerStack Pro's) using a
- different irq/drq than the kernel expects. Unfortunately, I don't know
- which irq/drq it is so if anyone knows please email me.
-
- Midi is not supported since the cs4232 driver doesn't support midi yet.
-
-2. IBM PowerPersonal PReP machines
-
- I've only tested sound on the Power Personal Series of IBM workstations
- so if you try it on others please let me know the result. I'm especially
- interested in the 43p's sound system, which I know nothing about.
-
- Enable support for "Crystal CS4232 based (PnP) cards" and for the
- Microsoft Sound System. The MSS isn't used, but some of the routines
- that the CS4232 driver uses are in it.
-
- Although the options you set are ignored and determined automatically
- on boot these are included for information only:
-
- (530) CS4232 audio I/O base 530, 604, E80 or F40
- (5) CS4232 audio IRQ 5, 7, 9, 11, 12 or 15
- (1) CS4232 audio DMA 0, 1 or 3
- (7) CS4232 second (duplex) DMA 0, 1 or 3
- (330) CS4232 MIDI I/O base 330, 370, 3B0 or 3F0
- (9) CS4232 MIDI IRQ 5, 7, 9, 11, 12 or 15
-
- This setup does _NOT_ allow for recording yet.
-
- Midi is not supported since the cs4232 driver doesn't support midi yet.
-
-2. IBM CHRP
-
- I have only tested this on the 43P-150. Build the kernel with the cs4232
- set as a module and load the module with irq=9 dma=1 dma2=2 io=0x550
diff --git a/Documentation/powerpc/transactional_memory.txt b/Documentation/powerpc/transactional_memory.txt
new file mode 100644
index 00000000000..9791e98ab49
--- /dev/null
+++ b/Documentation/powerpc/transactional_memory.txt
@@ -0,0 +1,198 @@
+Transactional Memory support
+============================
+
+POWER kernel support for this feature is currently limited to supporting
+its use by user programs. It is not currently used by the kernel itself.
+
+This file aims to sum up how it is supported by Linux and what behaviour you
+can expect from your user programs.
+
+
+Basic overview
+==============
+
+Hardware Transactional Memory is supported on POWER8 processors, and is a
+feature that enables a different form of atomic memory access. Several new
+instructions are presented to delimit transactions; transactions are
+guaranteed to either complete atomically or roll back and undo any partial
+changes.
+
+A simple transaction looks like this:
+
+begin_move_money:
+ tbegin
+ beq abort_handler
+
+ ld r4, SAVINGS_ACCT(r3)
+ ld r5, CURRENT_ACCT(r3)
+ subi r5, r5, 1
+ addi r4, r4, 1
+ std r4, SAVINGS_ACCT(r3)
+ std r5, CURRENT_ACCT(r3)
+
+ tend
+
+ b continue
+
+abort_handler:
+ ... test for odd failures ...
+
+ /* Retry the transaction if it failed because it conflicted with
+ * someone else: */
+ b begin_move_money
+
+
+The 'tbegin' instruction denotes the start point, and 'tend' the end point.
+Between these points the processor is in 'Transactional' state; any memory
+references will complete in one go if there are no conflicts with other
+transactional or non-transactional accesses within the system. In this
+example, the transaction completes as though it were normal straight-line code
+IF no other processor has touched SAVINGS_ACCT(r3) or CURRENT_ACCT(r3); an
+atomic move of money from the current account to the savings account has been
+performed. Even though the normal ld/std instructions are used (note no
+lwarx/stwcx), either *both* SAVINGS_ACCT(r3) and CURRENT_ACCT(r3) will be
+updated, or neither will be updated.
+
+If, in the meantime, there is a conflict with the locations accessed by the
+transaction, the transaction will be aborted by the CPU. Register and memory
+state will roll back to that at the 'tbegin', and control will continue from
+'tbegin+4'. The branch to abort_handler will be taken this second time; the
+abort handler can check the cause of the failure, and retry.
+
+Checkpointed registers include all GPRs, FPRs, VRs/VSRs, LR, CCR/CR, CTR, FPCSR
+and a few other status/flag regs; see the ISA for details.
+
+Causes of transaction aborts
+============================
+
+- Conflicts with cache lines used by other processors
+- Signals
+- Context switches
+- See the ISA for full documentation of everything that will abort transactions.
+
+
+Syscalls
+========
+
+Performing syscalls from within transaction is not recommended, and can lead
+to unpredictable results.
+
+Syscalls do not by design abort transactions, but beware: The kernel code will
+not be running in transactional state. The effect of syscalls will always
+remain visible, but depending on the call they may abort your transaction as a
+side-effect, read soon-to-be-aborted transactional data that should not remain
+invisible, etc. If you constantly retry a transaction that constantly aborts
+itself by calling a syscall, you'll have a livelock & make no progress.
+
+Simple syscalls (e.g. sigprocmask()) "could" be OK. Even things like write()
+from, say, printf() should be OK as long as the kernel does not access any
+memory that was accessed transactionally.
+
+Consider any syscalls that happen to work as debug-only -- not recommended for
+production use. Best to queue them up till after the transaction is over.
+
+
+Signals
+=======
+
+Delivery of signals (both sync and async) during transactions provides a second
+thread state (ucontext/mcontext) to represent the second transactional register
+state. Signal delivery 'treclaim's to capture both register states, so signals
+abort transactions. The usual ucontext_t passed to the signal handler
+represents the checkpointed/original register state; the signal appears to have
+arisen at 'tbegin+4'.
+
+If the sighandler ucontext has uc_link set, a second ucontext has been
+delivered. For future compatibility the MSR.TS field should be checked to
+determine the transactional state -- if so, the second ucontext in uc->uc_link
+represents the active transactional registers at the point of the signal.
+
+For 64-bit processes, uc->uc_mcontext.regs->msr is a full 64-bit MSR and its TS
+field shows the transactional mode.
+
+For 32-bit processes, the mcontext's MSR register is only 32 bits; the top 32
+bits are stored in the MSR of the second ucontext, i.e. in
+uc->uc_link->uc_mcontext.regs->msr. The top word contains the transactional
+state TS.
+
+However, basic signal handlers don't need to be aware of transactions
+and simply returning from the handler will deal with things correctly:
+
+Transaction-aware signal handlers can read the transactional register state
+from the second ucontext. This will be necessary for crash handlers to
+determine, for example, the address of the instruction causing the SIGSEGV.
+
+Example signal handler:
+
+ void crash_handler(int sig, siginfo_t *si, void *uc)
+ {
+ ucontext_t *ucp = uc;
+ ucontext_t *transactional_ucp = ucp->uc_link;
+
+ if (ucp_link) {
+ u64 msr = ucp->uc_mcontext.regs->msr;
+ /* May have transactional ucontext! */
+#ifndef __powerpc64__
+ msr |= ((u64)transactional_ucp->uc_mcontext.regs->msr) << 32;
+#endif
+ if (MSR_TM_ACTIVE(msr)) {
+ /* Yes, we crashed during a transaction. Oops. */
+ fprintf(stderr, "Transaction to be restarted at 0x%llx, but "
+ "crashy instruction was at 0x%llx\n",
+ ucp->uc_mcontext.regs->nip,
+ transactional_ucp->uc_mcontext.regs->nip);
+ }
+ }
+
+ fix_the_problem(ucp->dar);
+ }
+
+When in an active transaction that takes a signal, we need to be careful with
+the stack. It's possible that the stack has moved back up after the tbegin.
+The obvious case here is when the tbegin is called inside a function that
+returns before a tend. In this case, the stack is part of the checkpointed
+transactional memory state. If we write over this non transactionally or in
+suspend, we are in trouble because if we get a tm abort, the program counter and
+stack pointer will be back at the tbegin but our in memory stack won't be valid
+anymore.
+
+To avoid this, when taking a signal in an active transaction, we need to use
+the stack pointer from the checkpointed state, rather than the speculated
+state. This ensures that the signal context (written tm suspended) will be
+written below the stack required for the rollback. The transaction is aborted
+because of the treclaim, so any memory written between the tbegin and the
+signal will be rolled back anyway.
+
+For signals taken in non-TM or suspended mode, we use the
+normal/non-checkpointed stack pointer.
+
+
+Failure cause codes used by kernel
+==================================
+
+These are defined in <asm/reg.h>, and distinguish different reasons why the
+kernel aborted a transaction:
+
+ TM_CAUSE_RESCHED Thread was rescheduled.
+ TM_CAUSE_TLBI Software TLB invalide.
+ TM_CAUSE_FAC_UNAV FP/VEC/VSX unavailable trap.
+ TM_CAUSE_SYSCALL Currently unused; future syscalls that must abort
+ transactions for consistency will use this.
+ TM_CAUSE_SIGNAL Signal delivered.
+ TM_CAUSE_MISC Currently unused.
+ TM_CAUSE_ALIGNMENT Alignment fault.
+ TM_CAUSE_EMULATE Emulation that touched memory.
+
+These can be checked by the user program's abort handler as TEXASR[0:7]. If
+bit 7 is set, it indicates that the error is consider persistent. For example
+a TM_CAUSE_ALIGNMENT will be persistent while a TM_CAUSE_RESCHED will not.q
+
+GDB
+===
+
+GDB and ptrace are not currently TM-aware. If one stops during a transaction,
+it looks like the transaction has just started (the checkpointed state is
+presented). The transaction cannot then be continued and will take the failure
+handler route. Furthermore, the transactional 2nd register state will be
+inaccessible. GDB can currently be used on programs using TM, but not sensibly
+in parts within transactions.
diff --git a/Documentation/powerpc/zImage_layout.txt b/Documentation/powerpc/zImage_layout.txt
deleted file mode 100644
index 048e0150f57..00000000000
--- a/Documentation/powerpc/zImage_layout.txt
+++ /dev/null
@@ -1,47 +0,0 @@
- Information about the Linux/PPC kernel images
-=====================================================================
-
-Please mail me (Cort Dougan, cort@fsmlabs.com) if you have questions,
-comments or corrections.
-
-This document is meant to answer several questions I've had about how
-the PReP system boots and how Linux/PPC interacts with that mechanism.
-It would be nice if we could have information on how other architectures
-boot here as well. If you have anything to contribute, please
-let me know.
-
-
-1. PReP boot file
-
- This is the file necessary to boot PReP systems from floppy or
- hard drive. The firmware reads the PReP partition table entry
- and will load the image accordingly.
-
- To boot the zImage, copy it onto a floppy with dd if=zImage of=/dev/fd0h1440
- or onto a PReP hard drive partition with dd if=zImage of=/dev/sda4
- assuming you've created a PReP partition (type 0x41) with fdisk on
- /dev/sda4.
-
- The layout of the image format is:
-
- 0x0 +------------+
- | | PReP partition table entry
- | |
- 0x400 +------------+
- | | Bootstrap program code + data
- | |
- | |
- +------------+
- | | compressed kernel, elf header removed
- +------------+
- | | initrd (if loaded)
- +------------+
- | | Elf section table for bootstrap program
- +------------+
-
-
-2. MBX boot file
-
- The MBX boards can load an elf image, and relocate it to the
- proper location in memory - it copies the image to the location it was
- linked at.