diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2010-05-21 18:58:52 -0700 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2010-05-21 18:58:52 -0700 |
commit | 6109e2ce2600e2db26cd0424bb9c6ed019723288 (patch) | |
tree | 54b5d347bf12e0a987edfb52f287399f748a9a38 | |
parent | 0961d6581c870850342ad6ea25263763433d666f (diff) | |
parent | ac81860ea073daed50246af54db706c6e491f240 (diff) |
Merge branch 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6
* 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (36 commits)
PCI: hotplug: pciehp: Removed check for hotplug of display devices
PCI: read memory ranges out of Broadcom CNB20LE host bridge
PCI: Allow manual resource allocation for PCI hotplug bridges
x86/PCI: make ACPI MCFG reserved error messages ACPI specific
PCI hotplug: Use kmemdup
PM/PCI: Update PCI power management documentation
PCI: output FW warning in pci_read/write_vpd
PCI: fix typos pci_device_dis/enable to pci_dis/enable_device in comments
PCI quirks: disable msi on AMD rs4xx internal gfx bridges
PCI: Disable MSI for MCP55 on P5N32-E SLI
x86/PCI: irq and pci_ids patch for additional Intel Cougar Point DeviceIDs
PCI: aerdrv: trivial cleanup for aerdrv_core.c
PCI: aerdrv: trivial cleanup for aerdrv.c
PCI: aerdrv: introduce default_downstream_reset_link
PCI: aerdrv: rework find_aer_service
PCI: aerdrv: remove is_downstream
PCI: aerdrv: remove magical ROOT_ERR_STATUS_MASKS
PCI: aerdrv: redefine PCI_ERR_ROOT_*_SRC
PCI: aerdrv: rework do_recovery
PCI: aerdrv: rework get_e_source()
...
33 files changed, 1726 insertions, 740 deletions
diff --git a/Documentation/ABI/testing/sysfs-bus-pci b/Documentation/ABI/testing/sysfs-bus-pci index 25be3250f7d..428676cfa61 100644 --- a/Documentation/ABI/testing/sysfs-bus-pci +++ b/Documentation/ABI/testing/sysfs-bus-pci @@ -133,6 +133,46 @@ Description: The symbolic link points to the PCI device sysfs entry of the Physical Function this device associates with. + +What: /sys/bus/pci/slots/... +Date: April 2005 (possibly older) +KernelVersion: 2.6.12 (possibly older) +Contact: linux-pci@vger.kernel.org +Description: + When the appropriate driver is loaded, it will create a + directory per claimed physical PCI slot in + /sys/bus/pci/slots/. The names of these directories are + specific to the driver, which in turn, are specific to the + platform, but in general, should match the label on the + machine's physical chassis. + + The drivers that can create slot directories include the + PCI hotplug drivers, and as of 2.6.27, the pci_slot driver. + + The slot directories contain, at a minimum, a file named + 'address' which contains the PCI bus:device:function tuple. + Other files may appear as well, but are specific to the + driver. + +What: /sys/bus/pci/slots/.../function[0-7] +Date: March 2010 +KernelVersion: 2.6.35 +Contact: linux-pci@vger.kernel.org +Description: + If PCI slot directories (as described above) are created, + and the physical slot is actually populated with a device, + symbolic links in the slot directory pointing to the + device's PCI functions are created as well. + +What: /sys/bus/pci/devices/.../slot +Date: March 2010 +KernelVersion: 2.6.35 +Contact: linux-pci@vger.kernel.org +Description: + If PCI slot directories (as described above) are created, + a symbolic link pointing to the slot directory will be + created as well. + What: /sys/bus/pci/slots/.../module Date: June 2009 Contact: linux-pci@vger.kernel.org diff --git a/Documentation/PCI/pcieaer-howto.txt b/Documentation/PCI/pcieaer-howto.txt index be21001ab14..26d3d945c3c 100644 --- a/Documentation/PCI/pcieaer-howto.txt +++ b/Documentation/PCI/pcieaer-howto.txt @@ -13,7 +13,7 @@ Reporting (AER) driver and provides information on how to use it, as well as how to enable the drivers of endpoint devices to conform with PCI Express AER driver. -1.2 Copyright © Intel Corporation 2006. +1.2 Copyright (C) Intel Corporation 2006. 1.3 What is the PCI Express AER Driver? @@ -71,15 +71,11 @@ console. If it's a correctable error, it is outputed as a warning. Otherwise, it is printed as an error. So users could choose different log level to filter out correctable error messages. -Below shows an example. -+------ PCI-Express Device Error -----+ -Error Severity : Uncorrected (Fatal) -PCIE Bus Error type : Transaction Layer -Unsupported Request : First -Requester ID : 0500 -VendorID=8086h, DeviceID=0329h, Bus=05h, Device=00h, Function=00h -TLB Header: -04000001 00200a03 05010000 00050100 +Below shows an example: +0000:50:00.0: PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, id=0500(Requester ID) +0000:50:00.0: device [8086:0329] error status/mask=00100000/00000000 +0000:50:00.0: [20] Unsupported Request (First) +0000:50:00.0: TLP Header: 04000001 00200a03 05010000 00050100 In the example, 'Requester ID' means the ID of the device who sends the error message to root port. Pls. refer to pci express specs for @@ -112,7 +108,7 @@ but the PCI Express link itself is fully functional. Fatal errors, on the other hand, cause the link to be unreliable. When AER is enabled, a PCI Express device will automatically send an -error message to the PCIE root port above it when the device captures +error message to the PCIe root port above it when the device captures an error. The Root Port, upon receiving an error reporting message, internally processes and logs the error message in its PCI Express capability structure. Error information being logged includes storing @@ -198,8 +194,9 @@ to reset link, AER port service driver is required to provide the function to reset link. Firstly, kernel looks for if the upstream component has an aer driver. If it has, kernel uses the reset_link callback of the aer driver. If the upstream component has no aer driver -and the port is downstream port, we will use the aer driver of the -root port who reports the AER error. As for upstream ports, +and the port is downstream port, we will perform a hot reset as the +default by setting the Secondary Bus Reset bit of the Bridge Control +register associated with the downstream port. As for upstream ports, they should provide their own aer service drivers with reset_link function. If error_detected returns PCI_ERS_RESULT_CAN_RECOVER and reset_link returns PCI_ERS_RESULT_RECOVERED, the error handling goes @@ -253,11 +250,11 @@ cleanup uncorrectable status register. Pls. refer to section 3.3. 4. Software error injection -Debugging PCIE AER error recovery code is quite difficult because it +Debugging PCIe AER error recovery code is quite difficult because it is hard to trigger real hardware errors. Software based error -injection can be used to fake various kinds of PCIE errors. +injection can be used to fake various kinds of PCIe errors. -First you should enable PCIE AER software error injection in kernel +First you should enable PCIe AER software error injection in kernel configuration, that is, following item should be in your .config. CONFIG_PCIEAER_INJECT=y or CONFIG_PCIEAER_INJECT=m diff --git a/Documentation/power/pci.txt b/Documentation/power/pci.txt index dd8fe43888d..62328d76b55 100644 --- a/Documentation/power/pci.txt +++ b/Documentation/power/pci.txt @@ -1,299 +1,1025 @@ - PCI Power Management -~~~~~~~~~~~~~~~~~~~~ -An overview of the concepts and the related functions in the Linux kernel +Copyright (c) 2010 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc. + +An overview of concepts and the Linux kernel's interfaces related to PCI power +management. Based on previous work by Patrick Mochel <mochel@transmeta.com> +(and others). -Patrick Mochel <mochel@transmeta.com> -(and others) +This document only covers the aspects of power management specific to PCI +devices. For general description of the kernel's interfaces related to device +power management refer to Documentation/power/devices.txt and +Documentation/power/runtime_pm.txt. --------------------------------------------------------------------------- -1. Overview -2. How the PCI Subsystem Does Power Management -3. PCI Utility Functions -4. PCI Device Drivers -5. Resources - -1. Overview -~~~~~~~~~~~ - -The PCI Power Management Specification was introduced between the PCI 2.1 and -PCI 2.2 Specifications. It a standard interface for controlling various -power management operations. - -Implementation of the PCI PM Spec is optional, as are several sub-components of -it. If a device supports the PCI PM Spec, the device will have an 8 byte -capability field in its PCI configuration space. This field is used to describe -and control the standard PCI power management features. - -The PCI PM spec defines 4 operating states for devices (D0 - D3) and for buses -(B0 - B3). The higher the number, the less power the device consumes. However, -the higher the number, the longer the latency is for the device to return to -an operational state (D0). - -There are actually two D3 states. When someone talks about D3, they usually -mean D3hot, which corresponds to an ACPI D2 state (power is reduced, the -device may lose some context). But they may also mean D3cold, which is an -ACPI D3 state (power is fully off, all state was discarded); or both. - -Bus power management is not covered in this version of this document. - -Note that all PCI devices support D0 and D3cold by default, regardless of -whether or not they implement any of the PCI PM spec. - -The possible state transitions that a device can undergo are: - -+---------------------------+ -| Current State | New State | -+---------------------------+ -| D0 | D1, D2, D3| -+---------------------------+ -| D1 | D2, D3 | -+---------------------------+ -| D2 | D3 | -+---------------------------+ -| D1, D2, D3 | D0 | -+---------------------------+ - -Note that when the system is entering a global suspend state, all devices will -be placed into D3 and when resuming, all devices will be placed into D0. -However, when the system is running, other state transitions are possible. - -2. How The PCI Subsystem Handles Power Management -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -The PCI suspend/resume functionality is accessed indirectly via the Power -Management subsystem. At boot, the PCI driver registers a power management -callback with that layer. Upon entering a suspend state, the PM layer iterates -through all of its registered callbacks. This currently takes place only during -APM state transitions. - -Upon going to sleep, the PCI subsystem walks its device tree twice. Both times, -it does a depth first walk of the device tree. The first walk saves each of the -device's state and checks for devices that will prevent the system from entering -a global power state. The next walk then places the devices in a low power +1. Hardware and Platform Support for PCI Power Management +2. PCI Subsystem and Device Power Management +3. PCI Device Drivers and Power Management +4. Resources + + +1. Hardware and Platform Support for PCI Power Management +========================================================= + +1.1. Native and Platform-Based Power Management +----------------------------------------------- +In general, power management is a feature allowing one to save energy by putting +devices into states in which they draw less power (low-power states) at the +price of reduced functionality or performance. + +Usually, a device is put into a low-power state when it is underutilized or +completely inactive. However, when it is necessary to use the device once +again, it has to be put back into the "fully functional" state (full-power +state). This may happen when there are some data for the device to handle or +as a result of an external event requiring the device to be active, which may +be signaled by the device itself. + +PCI devices may be put into low-power states in two ways, by using the device +capabilities introduced by the PCI Bus Power Management Interface Specification, +or with the help of platform firmware, such as an ACPI BIOS. In the first +approach, that is referred to as the native PCI power management (native PCI PM) +in what follows, the device power state is changed as a result of writing a +specific value into one of its standard configuration registers. The second +approach requires the platform firmware to provide special methods that may be +used by the kernel to change the device's power state. + +Devices supporting the native PCI PM usually can generate wakeup signals called +Power Management Events (PMEs) to let the kernel know about external events +requiring the device to be active. After receiving a PME the kernel is supposed +to put the device that sent it into the full-power state. However, the PCI Bus +Power Management Interface Specification doesn't define any standard method of +delivering the PME from the device to the CPU and the operating system kernel. +It is assumed that the platform firmware will perform this task and therefore, +even though a PCI device is set up to generate PMEs, it also may be necessary to +prepare the platform firmware for notifying the CPU of the PMEs coming from the +device (e.g. by generating interrupts). + +In turn, if the methods provided by the platform firmware are used for changing +the power state of a device, usually the platform also provides a method for +preparing the device to generate wakeup signals. In that case, however, it +often also is necessary to prepare the device for generating PMEs using the +native PCI PM mechanism, because the method provided by the platform depends on +that. + +Thus in many situations both the native and the platform-based power management +mechanisms have to be used simultaneously to obtain the desired result. + +1.2. Native PCI Power Management +-------------------------------- +The PCI Bus Power Management Interface Specification (PCI PM Spec) was +introduced between the PCI 2.1 and PCI 2.2 Specifications. It defined a +standard interface for performing various operations related to power +management. + +The implementation of the PCI PM Spec is optional for conventional PCI devices, +but it is mandatory for PCI Express devices. If a device supports the PCI PM +Spec, it has an 8 byte power management capability field in its PCI +configuration space. This field is used to describe and control the standard +features related to the native PCI power management. + +The PCI PM Spec defines 4 operating states for devices (D0-D3) and for buses +(B0-B3). The higher the number, the less power is drawn by the device or bus +in that state. However, the higher the number, the longer the latency for +the device or bus to return to the full-power state (D0 or B0, respectively). + +There are two variants of the D3 state defined by the specification. The first +one is D3hot, referred to as the software accessible D3, because devices can be +programmed to go into it. The second one, D3cold, is the state that PCI devices +are in when the supply voltage (Vcc) is removed from them. It is not possible +to program a PCI device to go into D3cold, although there may be a programmable +interface for putting the bus the device is on into a state in which Vcc is +removed from all devices on the bus. + +PCI bus power management, however, is not supported by the Linux kernel at the +time of this writing and therefore it is not covered by this document. + +Note that every PCI device can be in the full-power state (D0) or in D3cold, +regardless of whether or not it implements the PCI PM Spec. In addition to +that, if the PCI PM Spec is implemented by the device, it must support D3hot +as well as D0. The support for the D1 and D2 power states is optional. + +PCI devices supporting the PCI PM Spec can be programmed to go to any of the +supported low-power states (except for D3cold). While in D1-D3hot the +standard configuration registers of the device must be accessible to software +(i.e. the device is required to respond to PCI configuration accesses), although +its I/O and memory spaces are then disabled. This allows the device to be +programmatically put into D0. Thus the kernel can switch the device back and +forth between D0 and the supported low-power states (except for D3cold) and the +possible power state transitions the device can undergo are the following: + ++----------------------------+ +| Current State | New State | ++----------------------------+ +| D0 | D1, D2, D3 | ++----------------------------+ +| D1 | D2, D3 | ++----------------------------+ +| D2 | D3 | ++----------------------------+ +| D1, D2, D3 | D0 | ++----------------------------+ + +The transition from D3cold to D0 occurs when the supply voltage is provided to +the device (i.e. power is restored). In that case the device returns to D0 with +a full power-on reset sequence and the power-on defaults are restored to the +device by hardware just as at initial power up. + +PCI devices supporting the PCI PM Spec can be programmed to generate PMEs +while in a low-power state (D1-D3), but they are not required to be capable +of generating PMEs from all supported low-power states. In particular, the +capability of generating PMEs from D3cold is optional and depends on the +presence of additional voltage (3.3Vaux) allowing the device to remain +sufficiently active to generate a wakeup signal. + +1.3. ACPI Device Power Management +--------------------------------- +The platform firmware support for the power management of PCI devices is +system-specific. However, if the system in question is compliant with the +Advanced Configuration and Power Interface (ACPI) Specification, like the +majority of x86-based systems, it is supposed to implement device power +management interfaces defined by the ACPI standard. + +For this purpose the ACPI BIOS provides special functions called "control +methods" that may be executed by the kernel to perform specific tasks, such as +putting a device into a low-power state. These control methods are encoded +using special byte-code language called the ACPI Machine Language (AML) and +stored in the machine's BIOS. The kernel loads them from the BIOS and executes +them as needed using an AML interpreter that translates the AML byte code into +computations and memory or I/O space accesses. This way, in theory, a BIOS +writer can provide the kernel with a means to perform actions depending +on the system design in a system-specific fashion. + +ACPI control methods may be divided into global control methods, that are not +associated with any particular devices, and device control methods, that have +to be defined separately for each device supposed to be handled with the help of +the platform. This means, in particular, that ACPI device control methods can +only be used to handle devices that the BIOS writer knew about in advance. The +ACPI methods used for device power management fall into that category. + +The ACPI specification assumes that devices can be in one of four power states +labeled as D0, D1, D2, and D3 that roughly correspond to the native PCI PM +D0-D3 states (although the difference between D3hot and D3cold is not taken +into account by ACPI). Moreover, for each power state of a device there is a +set of power resources that have to be enabled for the device to be put into +that state. These power resources are controlled (i.e. enabled or disabled) +with the help of their own control methods, _ON and _OFF, that have to be +defined individually for each of them. + +To put a device into the ACPI power state Dx (where x is a number between 0 and +3 inclusive) the kernel is supposed to (1) enable the power resources required +by the device in this state using their _ON control methods and (2) execute the +_PSx control method defined for the device. In addition to that, if the device +is going to be put into a low-power state (D1-D3) and is supposed to generate +wakeup signals from that state, the _DSW (or _PSW, replaced with _DSW by ACPI +3.0) control method defined for it has to be executed before _PSx. Power +resources that are not required by the device in the target power state and are +not required any more by any other device should be disabled (by executing their +_OFF control methods). If the current power state of the device is D3, it can +only be put into D0 this way. + +However, quite often the power states of devices are changed during a +system-wide transition into a sleep state or back into the working state. ACPI +defines four system sleep states, S1, S2, S3, and S4, and denotes the system +working state as S0. In general, the target system sleep (or working) state +determines the highest power (lowest number) state the device can be put +into and the kernel is supposed to obtain this information by executing the +device's _SxD control method (where x is a number between 0 and 4 inclusive). +If the device is required to wake up the system from the target sleep state, the +lowest power (highest number) state it can be put into is also determined by the +target state of the system. The kernel is then supposed to use the device's +_SxW control method to obtain the number of that state. It also is supposed to +use the device's _PRW control method to learn which power resources need to be +enabled for the device to be able to generate wakeup signals. + +1.4. Wakeup Signaling +--------------------- +Wakeup signals generated by PCI devices, either as native PCI PMEs, or as +a result of the execution of the _DSW (or _PSW) ACPI control method before +putting the device into a low-power state, have to be caught and handled as +appropriate. If they are sent while the system is in the working state +(ACPI S0), they should be translated into interrupts so that the kernel can +put the devices generating them into the full-power state and take care of the +events that triggered them. In turn, if they are sent while the system is +sleeping, they should cause the system's core logic to trigger wakeup. + +On ACPI-based systems wakeup signals sent by conventional PCI devices are +converted into ACPI General-Purpose Events (GPEs) which are hardware signals +from the system core logic generated in response to various events that need to +be acted upon. Every GPE is associated with one or more sources of potentially +interesting events. In particular, a GPE may be associated with a PCI device +capable of signaling wakeup. The information on the connections between GPEs +and event sources is recorded in the system's ACPI BIOS from where it can be +read by the kernel. + +If a PCI device known to the system's ACPI BIOS signals wakeup, the GPE +associated with it (if there is one) is triggered. The GPEs associated with PCI +bridges may also be triggered in response to a wakeup signal from one of the +devices below the bridge (this also is the case for root bridges) and, for +example, native PCI PMEs from devices unknown to the system's ACPI BIOS may be +handled this way. + +A GPE may be triggered when the system is sleeping (i.e. when it is in one of +the ACPI S1-S4 states), in which case system wakeup is started by its core logic +(the device that was the source of the signal causing the system wakeup to occur +may be identified later). The GPEs used in such situations are referred to as +wakeup GPEs. + +Usually, however, GPEs are also triggered when the system is in the working +state (ACPI S0) and in that case the system's core logic generates a System +Control Interrupt (SCI) to notify the kernel of the event. Then, the SCI +handler identifies the GPE that caused the interrupt to be generated which, +in turn, allows the kernel to identify the source of the event (that may be +a PCI device signaling wakeup). The GPEs used for notifying the kernel of +events occurring while the system is in the working state are referred to as +runtime GPEs. + +Unfortunately, there is no standard way of handling wakeup signals sent by +conventional PCI devices on systems that are not ACPI-based, but there is one +for PCI Express devices. Namely, the PCI Express Base Specification introduced +a native mechanism for converting native PCI PMEs into interrupts generated by +root ports. For conventional PCI devices native PMEs are out-of-band, so they +are routed separately and they need not pass through bridges (in principle they +may be routed directly to the system's core logic), but for PCI Express devices +they are in-band messages that have to pass through the PCI Express hierarchy, +including the root port on the path from the device to the Root Complex. Thus +it was possible to introduce a mechanism by which a root port generates an +interrupt whenever it receives a PME message from one of the devices below it. +The PCI Express Requester ID of the device that sent the PME message is then +recorded in one of the root port's configuration registers from where it may be +read by the interrupt handler allowing the device to be identified. [PME +messages sent by PCI Express endpoints integrated with the Root Complex don't +pass through root ports, but instead they cause a Root Complex Event Collector +(if there is one) to generate interrupts.] + +In principle the native PCI Express PME signaling may also be used on ACPI-based +systems along with the GPEs, but to use it the kernel has to ask the system's +ACPI BIOS to release control of root port configuration registers. The ACPI +BIOS, however, is not required to allow the kernel to control these registers +and if it doesn't do that, the kernel must not modify their contents. Of course +the native PCI Express PME signaling cannot be used by the kernel in that case. + + +2. PCI Subsystem and Device Power Management +============================================ + +2.1. Device Power Management Callbacks +-------------------------------------- +The PCI Subsystem participates in the power management of PCI devices in a +number of ways. First of all, it provides an intermediate code layer between +the device power management core (PM core) and PCI device drivers. +Specifically, the pm field of the PCI subsystem's struct bus_type object, +pci_bus_type, points to a struct dev_pm_ops object, pci_dev_pm_ops, containing +pointers to several device power management callbacks: + +const struct dev_pm_ops pci_dev_pm_ops = { + .prepare = pci_pm_prepare, + .complete = pci_pm_complete, + .suspend = pci_pm_suspend, + .resume = pci_pm_resume, + .freeze = pci_pm_freeze, + .thaw = pci_pm_thaw, + .poweroff = pci_pm_poweroff, + .restore = pci_pm_restore, + .suspend_noirq = pci_pm_suspend_noirq, + .resume_noirq = pci_pm_resume_noirq, + .freeze_noirq = pci_pm_freeze_noirq, + .thaw_noirq = pci_pm_thaw_noirq, + .poweroff_noirq = pci_pm_poweroff_noirq, + .restore_noirq = pci_pm_restore_noirq, + .runtime_suspend = pci_pm_runtime_suspend, + .runtime_resume = pci_pm_runtime_resume, + .runtime_idle = pci_pm_runtime_idle, +}; + +These callbacks are executed by the PM core in various situations related to +device power management and they, in turn, execute power management callbacks +provided by PCI device drivers. They also perform power management operations +involving some standard configuration registers of PCI devices that device +drivers need not know or care about. + +The structure representing a PCI device, struct pci_dev, contains several fields +that these callbacks operate on: + +struct pci_dev { + ... + pci_power_t current_state; /* Current operating state. */ + int pm_cap; /* PM capability offset in the + configuration space */ + unsigned int pme_support:5; /* Bitmask of states from which PME# + can be generated */ + unsigned int pme_interrupt:1;/* Is native PCIe PME signaling used? */ + unsigned int d1_support:1; /* Low power state D1 is supported */ + unsigned int d2_support:1; /* Low power state D2 is supported */ + unsigned int no_d1d2:1; /* D1 and D2 are forbidden */ + unsigned int wakeup_prepared:1; /* Device prepared for wake up */ + unsigned int d3_delay; /* D3->D0 transition time in ms */ + ... +}; + +They also indirectly use some fields of the struct device that is embedded in +struct pci_dev. + +2.2. Device Initialization +-------------------------- +The PCI subsystem's first task related to device power management is to +prepare the device for power management and initialize the fields of struct +pci_dev used for this purpose. This happens in two functions defined in +drivers/pci/pci.c, pci_pm_init() and platform_pci_wakeup_init(). + +The first of these functions checks if the device supports native PCI PM +and if that's the case the offset of its power management capability structure +in the configuration space is stored in the pm_cap field of the device's struct +pci_dev object. Next, the function checks which PCI low-power states are +supported by the device and from which low-power states the device can generate +native PCI PMEs. The power management fields of the device's struct pci_dev and +the struct device embedded in it are updated accordingly and the generation of +PMEs by the device is disabled. + +The second function checks if the device can be prepared to signal wakeup with +the help of the platform firmware, such as the ACPI BIOS. If that is the case, +the function updates the wakeup fields in struct device embedded in the +device's struct pci_dev and uses the firmware-provided method to prevent the +device from signaling wakeup. + +At this point the device is ready for power management. For driverless devices, +however, this functionality is limited to a few basic operations carried out +during system-wide transitions to a sleep state and back to the working state. + +2.3. Runtime Device Power Management +------------------------------------ +The PCI subsystem plays a vital role in the runtime power management of PCI +devices. For this purpose it uses the general runtime power management +(runtime PM) framework described in Documentation/power/runtime_pm.txt. +Namely, it provides subsystem-level callbacks: + + pci_pm_runtime_suspend() + pci_pm_runtime_resume() + pci_pm_runtime_idle() + +that are executed by the core runtime PM routines. It also implements the +entire mechanics necessary for handling runtime wakeup signals from PCI devices +in low-power states, which at the time of this writing works for both the native +PCI Express PME signaling and the ACPI GPE-based wakeup signaling described in +Section 1. + +First, a PCI device is put into a low-power state, or suspended, with the help +of pm_schedule_suspend() or pm_runtime_suspend() which for PCI devices call +pci_pm_runtime_suspend() to do the actual job. For this to work, the device's +driver has to provide a pm->runtime_suspend() callback (see below), which is +run by pci_pm_runtime_suspend() as the first action. If the driver's callback +returns successfully, the device's standard configuration registers are saved, +the device is prepared to generate wakeup signals and, finally, it is put into +the target low-power state. + +The low-power state to put the device into is the lowest-power (highest number) +state from which it can signal wakeup. The exact method of signaling wakeup is +system-dependent and is determined by the PCI subsystem on the basis of the +reported capabilities of the device and the platform firmware. To prepare the +device for signaling wakeup and put it into the selected low-power state, the +PCI subsystem can use the platform firmware as well as the device's native PCI +PM capabilities, if supported. + +It is expected that the device driver's pm->runtime_suspend() callback will +not attempt to prepare the device for signaling wakeup or to put it into a +low-power state. The driver ought to leave these tasks to the PCI subsystem +that has all of the information necessary to perform them. + +A suspended device is brought back into the "active" state, or resumed, +with the help of pm_request_resume() or pm_runtime_resume() which both call +pci_pm_runtime_resume() for PCI devices. Again, this only works if the device's +driver provides a pm->runtime_resume() callback (see below). However, before +the driver's callback is executed, pci_pm_runtime_resume() brings the device +back into the full-power state, prevents it from signaling wakeup while in that +state and restores its standard configuration registers. Thus the driver's +callback need not worry about the PCI-specific aspects of the device resume. + +Note that generally pci_pm_runtime_resume() may be called in two different |