diff options
Diffstat (limited to 'Documentation/cpu-freq')
| -rw-r--r-- | Documentation/cpu-freq/boost.txt | 93 | ||||
| -rw-r--r-- | Documentation/cpu-freq/core.txt | 31 | ||||
| -rw-r--r-- | Documentation/cpu-freq/cpu-drivers.txt | 116 | ||||
| -rw-r--r-- | Documentation/cpu-freq/governors.txt | 70 | ||||
| -rw-r--r-- | Documentation/cpu-freq/index.txt | 4 | ||||
| -rw-r--r-- | Documentation/cpu-freq/intel-pstate.txt | 43 | ||||
| -rw-r--r-- | Documentation/cpu-freq/pcc-cpufreq.txt | 207 | ||||
| -rw-r--r-- | Documentation/cpu-freq/user-guide.txt | 29 |
8 files changed, 536 insertions, 57 deletions
diff --git a/Documentation/cpu-freq/boost.txt b/Documentation/cpu-freq/boost.txt new file mode 100644 index 00000000000..dd62e1334f0 --- /dev/null +++ b/Documentation/cpu-freq/boost.txt @@ -0,0 +1,93 @@ +Processor boosting control + + - information for users - + +Quick guide for the impatient: +-------------------- +/sys/devices/system/cpu/cpufreq/boost +controls the boost setting for the whole system. You can read and write +that file with either "0" (boosting disabled) or "1" (boosting allowed). +Reading or writing 1 does not mean that the system is boosting at this +very moment, but only that the CPU _may_ raise the frequency at it's +discretion. +-------------------- + +Introduction +------------- +Some CPUs support a functionality to raise the operating frequency of +some cores in a multi-core package if certain conditions apply, mostly +if the whole chip is not fully utilized and below it's intended thermal +budget. The decision about boost disable/enable is made either at hardware +(e.g. x86) or software (e.g ARM). +On Intel CPUs this is called "Turbo Boost", AMD calls it "Turbo-Core", +in technical documentation "Core performance boost". In Linux we use +the term "boost" for convenience. + +Rationale for disable switch +---------------------------- + +Though the idea is to just give better performance without any user +intervention, sometimes the need arises to disable this functionality. +Most systems offer a switch in the (BIOS) firmware to disable the +functionality at all, but a more fine-grained and dynamic control would +be desirable: +1. While running benchmarks, reproducible results are important. Since + the boosting functionality depends on the load of the whole package, + single thread performance can vary. By explicitly disabling the boost + functionality at least for the benchmark's run-time the system will run + at a fixed frequency and results are reproducible again. +2. To examine the impact of the boosting functionality it is helpful + to do tests with and without boosting. +3. Boosting means overclocking the processor, though under controlled + conditions. By raising the frequency and the voltage the processor + will consume more power than without the boosting, which may be + undesirable for instance for mobile users. Disabling boosting may + save power here, though this depends on the workload. + + +User controlled switch +---------------------- + +To allow the user to toggle the boosting functionality, the cpufreq core +driver exports a sysfs knob to enable or disable it. There is a file: +/sys/devices/system/cpu/cpufreq/boost +which can either read "0" (boosting disabled) or "1" (boosting enabled). +The file is exported only when cpufreq driver supports boosting. +Explicitly changing the permissions and writing to that file anyway will +return EINVAL. + +On supported CPUs one can write either a "0" or a "1" into this file. +This will either disable the boost functionality on all cores in the +whole system (0) or will allow the software or hardware to boost at will +(1). + +Writing a "1" does not explicitly boost the system, but just allows the +CPU to boost at their discretion. Some implementations take external +factors like the chip's temperature into account, so boosting once does +not necessarily mean that it will occur every time even using the exact +same software setup. + + +AMD legacy cpb switch +--------------------- +The AMD powernow-k8 driver used to support a very similar switch to +disable or enable the "Core Performance Boost" feature of some AMD CPUs. +This switch was instantiated in each CPU's cpufreq directory +(/sys/devices/system/cpu[0-9]*/cpufreq) and was called "cpb". +Though the per CPU existence hints at a more fine grained control, the +actual implementation only supported a system-global switch semantics, +which was simply reflected into each CPU's file. Writing a 0 or 1 into it +would pull the other CPUs to the same state. +For compatibility reasons this file and its behavior is still supported +on AMD CPUs, though it is now protected by a config switch +(X86_ACPI_CPUFREQ_CPB). On Intel CPUs this file will never be created, +even with the config option set. +This functionality is considered legacy and will be removed in some future +kernel version. + +More fine grained boosting control +---------------------------------- + +Technically it is possible to switch the boosting functionality at least +on a per package basis, for some CPUs even per core. Currently the driver +does not support it, but this may be implemented in the future. diff --git a/Documentation/cpu-freq/core.txt b/Documentation/cpu-freq/core.txt index ce0666e5103..70933eadc30 100644 --- a/Documentation/cpu-freq/core.txt +++ b/Documentation/cpu-freq/core.txt @@ -20,6 +20,7 @@ Contents: --------- 1. CPUFreq core and interfaces 2. CPUFreq notifiers +3. CPUFreq Table Generation with Operating Performance Point (OPP) 1. General Information ======================= @@ -93,6 +94,30 @@ cpu - number of the affected CPU old - old frequency new - new frequency -If the cpufreq core detects the frequency has changed while the system -was suspended, these notifiers are called with CPUFREQ_RESUMECHANGE as -second argument. +3. CPUFreq Table Generation with Operating Performance Point (OPP) +================================================================== +For details about OPP, see Documentation/power/opp.txt + +dev_pm_opp_init_cpufreq_table - cpufreq framework typically is initialized with + cpufreq_frequency_table_cpuinfo which is provided with the list of + frequencies that are available for operation. This function provides + a ready to use conversion routine to translate the OPP layer's internal + information about the available frequencies into a format readily + providable to cpufreq. + + WARNING: Do not use this function in interrupt context. + + Example: + soc_pm_init() + { + /* Do things */ + r = dev_pm_opp_init_cpufreq_table(dev, &freq_table); + if (!r) + cpufreq_frequency_table_cpuinfo(policy, freq_table); + /* Do other things */ + } + + NOTE: This function is available only if CONFIG_CPU_FREQ is enabled in + addition to CONFIG_PM_OPP. + +dev_pm_opp_free_cpufreq_table - Free up the table allocated by dev_pm_opp_init_cpufreq_table diff --git a/Documentation/cpu-freq/cpu-drivers.txt b/Documentation/cpu-freq/cpu-drivers.txt index 43c743903dd..14f4e6336d8 100644 --- a/Documentation/cpu-freq/cpu-drivers.txt +++ b/Documentation/cpu-freq/cpu-drivers.txt @@ -23,9 +23,10 @@ Contents: 1.1 Initialization 1.2 Per-CPU Initialization 1.3 verify -1.4 target or setpolicy? -1.5 target +1.4 target/target_index or setpolicy? +1.5 target/target_index 1.6 setpolicy +1.7 get_intermediate and target_intermediate 2. Frequency Table Helpers @@ -50,30 +51,39 @@ What shall this struct cpufreq_driver contain? cpufreq_driver.name - The name of this driver. -cpufreq_driver.owner - THIS_MODULE; - cpufreq_driver.init - A pointer to the per-CPU initialization function. cpufreq_driver.verify - A pointer to a "verification" function. cpufreq_driver.setpolicy _or_ -cpufreq_driver.target - See below on the differences. +cpufreq_driver.target/ +target_index - See below on the differences. And optionally -cpufreq_driver.exit - A pointer to a per-CPU cleanup function. +cpufreq_driver.exit - A pointer to a per-CPU cleanup + function called during CPU_POST_DEAD + phase of cpu hotplug process. + +cpufreq_driver.stop_cpu - A pointer to a per-CPU stop function + called during CPU_DOWN_PREPARE phase of + cpu hotplug process. cpufreq_driver.resume - A pointer to a per-CPU resume function which is called with interrupts disabled and _before_ the pre-suspend frequency and/or policy is restored by a call to - ->target or ->setpolicy. + ->target/target_index or ->setpolicy. cpufreq_driver.attr - A pointer to a NULL-terminated list of "struct freq_attr" which allow to export values to sysfs. +cpufreq_driver.get_intermediate +and target_intermediate Used to switch to stable frequency while + changing CPU frequency. + 1.2 Per-CPU Initialization -------------------------- @@ -92,9 +102,9 @@ policy->cpuinfo.max_freq - the minimum and maximum frequency (in kHz) which is supported by this CPU policy->cpuinfo.transition_latency the time it takes on this CPU to - switch between two frequencies (if - appropriate, else specify - CPUFREQ_ETERNAL) + switch between two frequencies in + nanoseconds (if appropriate, else + specify CPUFREQ_ETERNAL) policy->cur The current operating frequency of this CPU (if appropriate) @@ -105,11 +115,18 @@ policy->governor must contain the "default policy" for this CPU. A few moments later, cpufreq_driver.verify and either cpufreq_driver.setpolicy or - cpufreq_driver.target is called with - these values. + cpufreq_driver.target/target_index is called + with these values. -For setting some of these values, the frequency table helpers might be -helpful. See the section 2 for more information on them. +For setting some of these values (cpuinfo.min[max]_freq, policy->min[max]), the +frequency table helpers might be helpful. See the section 2 for more information +on them. + +SMP systems normally have same clock source for a group of cpus. For these the +.init() would be called only once for the first online cpu. Here the .init() +routine must initialize policy->cpus with mask of all possible cpus (Online + +Offline) that share the clock. Then the core would copy this mask onto +policy->related_cpus and will reset policy->cpus to carry only online cpus. 1.3 verify @@ -128,20 +145,31 @@ range) is within policy->min and policy->max. If necessary, increase policy->max first, and only if this is no solution, decrease policy->min. -1.4 target or setpolicy? +1.4 target/target_index or setpolicy? ---------------------------- Most cpufreq drivers or even most cpu frequency scaling algorithms only allow the CPU to be set to one frequency. For these, you use the -->target call. +->target/target_index call. Some cpufreq-capable processors switch the frequency between certain limits on their own. These shall use the ->setpolicy call -1.4. target +1.5. target/target_index ------------- +The target_index call has two arguments: struct cpufreq_policy *policy, +and unsigned int index (into the exposed frequency table). + +The CPUfreq driver must set the new frequency when called here. The +actual frequency must be determined by freq_table[index].frequency. + +It should always restore to earlier frequency (i.e. policy->restore_freq) in +case of errors, even if we switched to intermediate frequency earlier. + +Deprecated: +---------- The target call has three arguments: struct cpufreq_policy *policy, unsigned int target_frequency, unsigned int relation. @@ -155,11 +183,11 @@ actual frequency must be determined using the following rules: - if relation==CPUFREQ_REL_H, try to select a new_freq lower than or equal target_freq. ("H for highest, but no higher than") -Here again the frequency table helper might assist you - see section 3 +Here again the frequency table helper might assist you - see section 2 for details. -1.5 setpolicy +1.6 setpolicy --------------- The setpolicy call only takes a struct cpufreq_policy *policy as @@ -168,8 +196,25 @@ in-chipset dynamic frequency switching to policy->min, the upper limit to policy->max, and -if supported- select a performance-oriented setting when policy->policy is CPUFREQ_POLICY_PERFORMANCE, and a powersaving-oriented setting when CPUFREQ_POLICY_POWERSAVE. Also check -the reference implementation in arch/i386/kernel/cpu/cpufreq/longrun.c +the reference implementation in drivers/cpufreq/longrun.c + +1.7 get_intermediate and target_intermediate +-------------------------------------------- +Only for drivers with target_index() and CPUFREQ_ASYNC_NOTIFICATION unset. + +get_intermediate should return a stable intermediate frequency platform wants to +switch to, and target_intermediate() should set CPU to to that frequency, before +jumping to the frequency corresponding to 'index'. Core will take care of +sending notifications and driver doesn't have to handle them in +target_intermediate() or target_index(). + +Drivers can return '0' from get_intermediate() in case they don't wish to switch +to intermediate frequency for some target frequency. In that case core will +directly call ->target_index(). + +NOTE: ->target_index() should restore to policy->restore_freq in case of +failures as core would send notifications for that. 2. Frequency Table Helpers @@ -178,10 +223,10 @@ the reference implementation in arch/i386/kernel/cpu/cpufreq/longrun.c As most cpufreq processors only allow for being set to a few specific frequencies, a "frequency table" with some functions might assist in some work of the processor driver. Such a "frequency table" consists -of an array of struct cpufreq_freq_table entries, with any value in -"index" you want to use, and the corresponding frequency in +of an array of struct cpufreq_frequency_table entries, with any value in +"driver_data" you want to use, and the corresponding frequency in "frequency". At the end of the table, you need to add a -cpufreq_freq_table entry with frequency set to CPUFREQ_TABLE_END. And +cpufreq_frequency_table entry with frequency set to CPUFREQ_TABLE_END. And if you want to skip one entry in the table, set the frequency to CPUFREQ_ENTRY_INVALID. The entries don't need to be in ascending order. @@ -207,10 +252,23 @@ int cpufreq_frequency_table_target(struct cpufreq_policy *policy, is the corresponding frequency table helper for the ->target stage. Just pass the values to this function, and the unsigned int index returns the number of the frequency table entry which contains -the frequency the CPU shall be set to. PLEASE NOTE: This is not the -"index" which is in this cpufreq_table_entry.index, but instead -cpufreq_table[index]. So, the new frequency is -cpufreq_table[index].frequency, and the value you stored into the -frequency table "index" field is -cpufreq_table[index].index. +the frequency the CPU shall be set to. + +The following macros can be used as iterators over cpufreq_frequency_table: + +cpufreq_for_each_entry(pos, table) - iterates over all entries of frequency +table. + +cpufreq-for_each_valid_entry(pos, table) - iterates over all entries, +excluding CPUFREQ_ENTRY_INVALID frequencies. +Use arguments "pos" - a cpufreq_frequency_table * as a loop cursor and +"table" - the cpufreq_frequency_table * you want to iterate over. + +For example: + + struct cpufreq_frequency_table *pos, *driver_freq_table; + cpufreq_for_each_entry(pos, driver_freq_table) { + /* Do something with pos */ + pos->frequency = ... + } diff --git a/Documentation/cpu-freq/governors.txt b/Documentation/cpu-freq/governors.txt index ce73f3eb5dd..77ec21574fb 100644 --- a/Documentation/cpu-freq/governors.txt +++ b/Documentation/cpu-freq/governors.txt @@ -40,7 +40,7 @@ Most cpufreq drivers (in fact, all except one, longrun) or even most cpu frequency scaling algorithms only offer the CPU to be set to one frequency. In order to offer dynamic frequency scaling, the cpufreq core must be able to tell these drivers of a "target frequency". So -these specific drivers will be transformed to offer a "->target" +these specific drivers will be transformed to offer a "->target/target_index" call instead of the existing "->setpolicy" call. For "longrun", all stays the same, though. @@ -71,7 +71,7 @@ CPU can be set to switch independently | CPU can only be set / the limits of policy->{min,max} / \ / \ - Using the ->setpolicy call, Using the ->target call, + Using the ->setpolicy call, Using the ->target/target_index call, the limits and the the frequency closest "policy" is set. to target_freq is set. It is assured that it @@ -119,10 +119,6 @@ want the kernel to look at the CPU usage and to make decisions on what to do about the frequency. Typically this is set to values of around '10000' or more. It's default value is (cmp. with users-guide.txt): transition_latency * 1000 -The lowest value you can set is: -transition_latency * 100 or it may get restricted to a value where it -makes not sense for the kernel anymore to poll that often which depends -on your HZ config variable (HZ=1000: max=20000us, HZ=250: max=5000). Be aware that transition latency is in ns and sampling_rate is in us, so you get the same sysfs value by default. Sampling rate should always get adjusted considering the transition latency @@ -131,20 +127,24 @@ in the bash (as said, 1000 is default), do: echo `$(($(cat cpuinfo_transition_latency) * 750 / 1000)) \ >ondemand/sampling_rate -show_sampling_rate_(min|max): THIS INTERFACE IS DEPRECATED, DON'T USE IT. -You can use wider ranges now and the general -cpuinfo_transition_latency variable (cmp. with user-guide.txt) can be -used to obtain exactly the same info: -show_sampling_rate_min = transtition_latency * 500 / 1000 -show_sampling_rate_max = transtition_latency * 500000 / 1000 -(divided by 1000 is to illustrate that sampling rate is in us and -transition latency is exported ns). +sampling_rate_min: +The sampling rate is limited by the HW transition latency: +transition_latency * 100 +Or by kernel restrictions: +If CONFIG_NO_HZ_COMMON is set, the limit is 10ms fixed. +If CONFIG_NO_HZ_COMMON is not set or nohz=off boot parameter is used, the +limits depend on the CONFIG_HZ option: +HZ=1000: min=20000us (20ms) +HZ=250: min=80000us (80ms) +HZ=100: min=200000us (200ms) +The highest value of kernel and HW latency restrictions is shown and +used as the minimum sampling rate. up_threshold: defines what the average CPU usage between the samplings of 'sampling_rate' needs to be for the kernel to make a decision on whether it should increase the frequency. For example when it is set -to its default value of '80' it means that between the checking -intervals the CPU needs to be on average more than 80% in use to then +to its default value of '95' it means that between the checking +intervals the CPU needs to be on average more than 95% in use to then decide that the CPU frequency needs to be increased. ignore_nice_load: this parameter takes a value of '0' or '1'. When @@ -156,6 +156,38 @@ intensive calculation on your laptop that you do not care how long it takes to complete as you can 'nice' it and prevent it from taking part in the deciding process of whether to increase your CPU frequency. +sampling_down_factor: this parameter controls the rate at which the +kernel makes a decision on when to decrease the frequency while running +at top speed. When set to 1 (the default) decisions to reevaluate load +are made at the same interval regardless of current clock speed. But +when set to greater than 1 (e.g. 100) it acts as a multiplier for the +scheduling interval for reevaluating load when the CPU is at its top +speed due to high load. This improves performance by reducing the overhead +of load evaluation and helping the CPU stay at its top speed when truly +busy, rather than shifting back and forth in speed. This tunable has no +effect on behavior at lower speeds/lower CPU loads. + +powersave_bias: this parameter takes a value between 0 to 1000. It +defines the percentage (times 10) value of the target frequency that +will be shaved off of the target. For example, when set to 100 -- 10%, +when ondemand governor would have targeted 1000 MHz, it will target +1000 MHz - (10% of 1000 MHz) = 900 MHz instead. This is set to 0 +(disabled) by default. +When AMD frequency sensitivity powersave bias driver -- +drivers/cpufreq/amd_freq_sensitivity.c is loaded, this parameter +defines the workload frequency sensitivity threshold in which a lower +frequency is chosen instead of ondemand governor's original target. +The frequency sensitivity is a hardware reported (on AMD Family 16h +Processors and above) value between 0 to 100% that tells software how +the performance of the workload running on a CPU will change when +frequency changes. A workload with sensitivity of 0% (memory/IO-bound) +will not perform any better on higher core frequency, whereas a +workload with sensitivity of 100% (CPU-bound) will perform better +higher the frequency. When the driver is loaded, this is set to 400 +by default -- for CPUs running workloads with sensitivity value below +40%, a lower frequency is chosen. Unloading the driver or writing 0 +will disable this feature. + 2.5 Conservative ---------------- @@ -180,6 +212,12 @@ governor but for the opposite direction. For example when set to its default value of '20' it means that if the CPU usage needs to be below 20% between samples to have the frequency decreased. +sampling_down_factor: similar functionality as in "ondemand" governor. +But in "conservative", it controls the rate at which the kernel makes +a decision on when to decrease the frequency while running in any +speed. Load for frequency increase is still evaluated every +sampling rate. + 3. The Governor Interface in the CPUfreq Core ============================================= diff --git a/Documentation/cpu-freq/index.txt b/Documentation/cpu-freq/index.txt index 3d0b915035b..dc024ab4054 100644 --- a/Documentation/cpu-freq/index.txt +++ b/Documentation/cpu-freq/index.txt @@ -35,8 +35,8 @@ Mailing List ------------ There is a CPU frequency changing CVS commit and general list where you can report bugs, problems or submit patches. To post a message, -send an email to cpufreq@vger.kernel.org, to subscribe go to -http://vger.kernel.org/vger-lists.html#cpufreq and follow the +send an email to linux-pm@vger.kernel.org, to subscribe go to +http://vger.kernel.org/vger-lists.html#linux-pm and follow the instructions there. Links diff --git a/Documentation/cpu-freq/intel-pstate.txt b/Documentation/cpu-freq/intel-pstate.txt new file mode 100644 index 00000000000..a69ffe1d54d --- /dev/null +++ b/Documentation/cpu-freq/intel-pstate.txt @@ -0,0 +1,43 @@ +Intel P-state driver +-------------------- + +This driver implements a scaling driver with an internal governor for +Intel Core processors. The driver follows the same model as the +Transmeta scaling driver (longrun.c) and implements the setpolicy() +instead of target(). Scaling drivers that implement setpolicy() are +assumed to implement internal governors by the cpufreq core. All the +logic for selecting the current P state is contained within the +driver; no external governor is used by the cpufreq core. + +Intel SandyBridge+ processors are supported. + +New sysfs files for controlling P state selection have been added to +/sys/devices/system/cpu/intel_pstate/ + + max_perf_pct: limits the maximum P state that will be requested by + the driver stated as a percentage of the available performance. The + available (P states) performance may be reduced by the no_turbo + setting described below. + + min_perf_pct: limits the minimum P state that will be requested by + the driver stated as a percentage of the max (non-turbo) + performance level. + + no_turbo: limits the driver to selecting P states below the turbo + frequency range. + +For contemporary Intel processors, the frequency is controlled by the +processor itself and the P-states exposed to software are related to +performance levels. The idea that frequency can be set to a single +frequency is fiction for Intel Core processors. Even if the scaling +driver selects a single P state the actual frequency the processor +will run at is selected by the processor itself. + +New debugfs files have also been added to /sys/kernel/debug/pstate_snb/ + + deadband + d_gain_pct + i_gain_pct + p_gain_pct + sample_rate_ms + setpoint diff --git a/Documentation/cpu-freq/pcc-cpufreq.txt b/Documentation/cpu-freq/pcc-cpufreq.txt new file mode 100644 index 00000000000..9e3c3b33514 --- /dev/null +++ b/Documentation/cpu-freq/pcc-cpufreq.txt @@ -0,0 +1,207 @@ +/* + * pcc-cpufreq.txt - PCC interface documentation + * + * Copyright (C) 2009 Red Hat, Matthew Garrett <mjg@redhat.com> + * Copyright (C) 2009 Hewlett-Packard Development Company, L.P. + * Nagananda Chumbalkar <nagananda.chumbalkar@hp.com> + * + * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; version 2 of the License. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or NON + * INFRINGEMENT. See the GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License along + * with this program; if not, write to the Free Software Foundation, Inc., + * 675 Mass Ave, Cambridge, MA 02139, USA. + * + * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + */ + + + Processor Clocking Control Driver + --------------------------------- + +Contents: +--------- +1. Introduction +1.1 PCC interface +1.1.1 Get Average Frequency +1.1.2 Set Desired Frequency +1.2 Platforms affected +2. Driver and /sys details +2.1 scaling_available_frequencies +2.2 cpuinfo_transition_latency +2.3 cpuinfo_cur_freq +2.4 related_cpus +3. Caveats + +1. Introduction: +---------------- +Processor Clocking Control (PCC) is an interface between the platform +firmware and OSPM. It is a mechanism for coordinating processor +performance (ie: frequency) between the platform firmware and the OS. + +The PCC driver (pcc-cpufreq) allows OSPM to take advantage of the PCC +interface. + +OS utilizes the PCC interface to inform platform firmware what frequency the +OS wants for a logical processor. The platform firmware attempts to achieve +the requested frequency. If the request for the target frequency could not be +satisfied by platform firmware, then it usually means that power budget +conditions are in place, and "power capping" is taking place. + +1.1 PCC interface: +------------------ +The complete PCC specification is available here: +http://www.acpica.org/download/Processor-Clocking-Control-v1p0.pdf + +PCC relies on a shared memory region that provides a channel for communication +between the OS and platform firmware. PCC also implements a "doorbell" that +is used by the OS to inform the platform firmware that a command has been +sent. + +The ACPI PCCH() method is used to discover the location of the PCC shared +memory region. The shared memory region header contains the "command" and +"status" interface. PCCH() also contains details on how to access the platform +doorbell. + +The following commands are supported by the PCC interface: +* Get Average Frequency +* Set Desired Frequency + +The ACPI PCCP() method is implemented for each logical processor and is +used to discover the offsets for the input and output buffers in the shared +memory region. + +When PCC mode is enabled, the platform will not expose processor performance +or throttle states (_PSS, _TSS and related ACPI objects) to OSPM. Therefore, +the native P-state driver (such as acpi-cpufreq for Intel, powernow-k8 for +AMD) will not load. + +However, OSPM remains in control of policy. The governor (eg: "ondemand") +computes the required performance for each processor based on server workload. +The PCC driver fills in the command interface, and the input buffer and +communicates the request to the platform firmware. The platform firmware is +responsible for delivering the requested performance. + +Each PCC command is "global" in scope and can affect all the logical CPUs in +the system. Therefore, PCC is capable of performing "group" updates. With PCC +the OS is capable of getting/setting the frequency of all the logical CPUs in +the system with a single call to the BIOS. + +1.1.1 Get Average Frequency: +---------------------------- +This command is used by the OSPM to query the running frequency of the +processor since the last time this command was completed. The output buffer +indicates the average unhalted frequency of the logical processor expressed as +a percentage of the nominal (ie: maximum) CPU frequency. The output buffer +also signifies if the CPU frequency is limited by a power budget condition. + +1.1.2 Set Desired Frequency: +---------------------------- +This command is used by the OSPM to communicate to the platform firmware the +desired frequency for a logical processor. The output buffer is currently +ignored by OSPM. The next invocation of "Get Average Frequency" will inform +OSPM if the desired frequency was achieved or not. + +1.2 Platforms affected: +----------------------- +The PCC driver will load on any system where the platform firmware: +* supports the PCC interface, and the associated PCCH() and PCCP() methods +* assumes responsibility for managing the hardware clocking controls in order +to deliver the requested processor performance + +Currently, certain HP ProLiant platforms implement the PCC interface. On those +platforms PCC is the "default" choice. + +However, it is possible to disable this interface via a BIOS setting. In +such an instance, as is also the case on platforms where the PCC interface +is not implemented, the PCC driver will fail to load silently. + +2. Driver and /sys details: +--------------------------- +When the driver loads, it merely prints the lowest and the highest CPU +frequencies supported by the platform firmware. + +The PCC driver loads with a message such as: +pcc-cpufreq: (v1.00.00) driver loaded with frequency limits: 1600 MHz, 2933 +MHz + +This means that the OPSM can request the CPU to run at any frequency in +between the limits (1600 MHz, and 2933 MHz) specified in the message. + +Internally, there is no need for the driver to convert the "target" frequency +to a corresponding P-state. + +The VERSION number for the driver will be of the format v.xy.ab. +eg: 1.00.02 + ----- -- + | | + | -- this will increase with bug fixes/enhancements to the driver + |-- this is the version of the PCC specification the driver adheres to + + +The following is a brief discussion on some of the fields exported via the +/sys filesystem and how their values are affected by the PCC driver: + +2.1 scaling_available_frequencies: +---------------------------------- +scaling_available_frequencies is not created in /sys. No intermediate +frequencies need to be listed because the BIOS will try to achieve any +frequency, within limits, requested by the governor. A frequency does not have +to be strictly associated with a P-state. + +2.2 cpuinfo_transition_latency: +------------------------------- +The cpuinfo_transition_latency field is 0. The PCC specification does +not include a field to expose this value currently. + +2.3 cpuinfo_cur_freq: +--------------------- +A) Often cpuinfo_cur_freq will show a value different than what is declared +in the scaling_available_frequencies or scaling_cur_freq, or scaling_max_freq. +This is due to "turbo boost" available on recent Intel processors. If certain +conditions are met the BIOS can achieve a slightly higher speed than requested +by OSPM. An example: + +scaling_cur_freq : 2933000 +cpuinfo_cur_freq : 3196000 + +B) There is a round-off error associated with the cpuinfo_cur_freq value. +Since the driver obtains the current frequency as a "percentage" (%) of the +nominal frequency from the BIOS, sometimes, the values displayed by +scaling_cur_freq and cpuinfo_cur_freq may not match. An example: + +scaling_cur_freq : 1600000 +cpuinfo_cur_freq : 1583000 + +In this example, the nominal frequency is 2933 MHz. The driver obtains the +current frequency, cpuinfo_cur_freq, as 54% of the nominal frequency: + + 54% of 2933 MHz = 1583 MHz + +Nominal frequency is the maximum frequency of the processor, and it usually +corresponds to the frequency of the P0 P-state. + +2.4 related_cpus: +----------------- +The related_cpus field is identical to affected_cpus. + +affected_cpus : 4 +related_cpus : 4 + +Currently, the PCC driver does not evaluate _PSD. The platforms that support +PCC do not implement SW_ALL. So OSPM doesn't need to perform any coordination +to ensure that the same frequency is requested of all dependent CPUs. + +3. Caveats: +----------- +The "cpufreq_stats" module in its present form cannot be loaded and +expected to work with the PCC driver. Since the "cpufreq_stats" module +provides information wrt each P-state, it is not applicable to the PCC driver. diff --git a/Documentation/cpu-freq/user-guide.txt b/Documentation/cpu-freq/user-guide.txt index 75f41193f3e..ff2f28332cc 100644 --- a/Documentation/cpu-freq/user-guide.txt +++ b/Documentation/cpu-freq/user-guide.txt @@ -31,7 +31,6 @@ Contents: 3. How to change the CPU cpufreq policy and/or speed 3.1 Preferred interface: sysfs -3.2 Deprecated interfaces @@ -177,7 +176,9 @@ scaling_governor, and by "echoing" the name of another work on some specific architectures or processors. -cpuinfo_cur_freq : Current speed of the CPU, in KHz. +cpuinfo_cur_freq : Current frequency of the CPU as obtained from + the hardware, in KHz. This is the frequency + the CPU actually runs at. scaling_available_frequencies : List of available frequencies, in KHz. @@ -189,15 +190,29 @@ scaling_max_freq show the current "policy limits" (in first set scaling_max_freq, then scaling_min_freq. -affected_cpus : List of CPUs that require software coordination - of frequency. +affected_cpus : List of Online CPUs that require software + coordination of frequency. -related_cpus : List of CPUs that need some sort of frequency - coordination, whether software or hardware. +related_cpus : List of Online + Offline CPUs that need software + coordination of frequency. scaling_driver : Hardware driver for cpufreq. -scaling_cur_freq : Current frequency of the CPU, in KHz. +scaling_cur_freq : Current frequency of the CPU as determined by + the governor and cpufreq core, in KHz. This is + the frequency the kernel thinks the CPU runs + at. + +bios_limit : If the BIOS tells the OS to limit a CPU to + lower frequencies, the user can read out the + maximum available frequency from this file. + This typically can happen through (often not + intended) BIOS settings, restrictions + triggered through a service processor or other + BIOS/HW based implementations. + This does not cover thermal ACPI limitations + which can be detected through the generic + thermal driver. If you have selected the "userspace" governor which allows you to set the CPU operating frequency to a specific value, you can read out |
