diff options
Diffstat (limited to 'Documentation/cpu-freq')
| -rw-r--r-- | Documentation/cpu-freq/boost.txt | 93 | ||||
| -rw-r--r-- | Documentation/cpu-freq/core.txt | 31 | ||||
| -rw-r--r-- | Documentation/cpu-freq/cpu-drivers.txt | 108 | ||||
| -rw-r--r-- | Documentation/cpu-freq/governors.txt | 50 | ||||
| -rw-r--r-- | Documentation/cpu-freq/index.txt | 4 | ||||
| -rw-r--r-- | Documentation/cpu-freq/intel-pstate.txt | 43 | ||||
| -rw-r--r-- | Documentation/cpu-freq/user-guide.txt | 8 |
7 files changed, 296 insertions, 41 deletions
diff --git a/Documentation/cpu-freq/boost.txt b/Documentation/cpu-freq/boost.txt new file mode 100644 index 00000000000..dd62e1334f0 --- /dev/null +++ b/Documentation/cpu-freq/boost.txt @@ -0,0 +1,93 @@ +Processor boosting control + + - information for users - + +Quick guide for the impatient: +-------------------- +/sys/devices/system/cpu/cpufreq/boost +controls the boost setting for the whole system. You can read and write +that file with either "0" (boosting disabled) or "1" (boosting allowed). +Reading or writing 1 does not mean that the system is boosting at this +very moment, but only that the CPU _may_ raise the frequency at it's +discretion. +-------------------- + +Introduction +------------- +Some CPUs support a functionality to raise the operating frequency of +some cores in a multi-core package if certain conditions apply, mostly +if the whole chip is not fully utilized and below it's intended thermal +budget. The decision about boost disable/enable is made either at hardware +(e.g. x86) or software (e.g ARM). +On Intel CPUs this is called "Turbo Boost", AMD calls it "Turbo-Core", +in technical documentation "Core performance boost". In Linux we use +the term "boost" for convenience. + +Rationale for disable switch +---------------------------- + +Though the idea is to just give better performance without any user +intervention, sometimes the need arises to disable this functionality. +Most systems offer a switch in the (BIOS) firmware to disable the +functionality at all, but a more fine-grained and dynamic control would +be desirable: +1. While running benchmarks, reproducible results are important. Since + the boosting functionality depends on the load of the whole package, + single thread performance can vary. By explicitly disabling the boost + functionality at least for the benchmark's run-time the system will run + at a fixed frequency and results are reproducible again. +2. To examine the impact of the boosting functionality it is helpful + to do tests with and without boosting. +3. Boosting means overclocking the processor, though under controlled + conditions. By raising the frequency and the voltage the processor + will consume more power than without the boosting, which may be + undesirable for instance for mobile users. Disabling boosting may + save power here, though this depends on the workload. + + +User controlled switch +---------------------- + +To allow the user to toggle the boosting functionality, the cpufreq core +driver exports a sysfs knob to enable or disable it. There is a file: +/sys/devices/system/cpu/cpufreq/boost +which can either read "0" (boosting disabled) or "1" (boosting enabled). +The file is exported only when cpufreq driver supports boosting. +Explicitly changing the permissions and writing to that file anyway will +return EINVAL. + +On supported CPUs one can write either a "0" or a "1" into this file. +This will either disable the boost functionality on all cores in the +whole system (0) or will allow the software or hardware to boost at will +(1). + +Writing a "1" does not explicitly boost the system, but just allows the +CPU to boost at their discretion. Some implementations take external +factors like the chip's temperature into account, so boosting once does +not necessarily mean that it will occur every time even using the exact +same software setup. + + +AMD legacy cpb switch +--------------------- +The AMD powernow-k8 driver used to support a very similar switch to +disable or enable the "Core Performance Boost" feature of some AMD CPUs. +This switch was instantiated in each CPU's cpufreq directory +(/sys/devices/system/cpu[0-9]*/cpufreq) and was called "cpb". +Though the per CPU existence hints at a more fine grained control, the +actual implementation only supported a system-global switch semantics, +which was simply reflected into each CPU's file. Writing a 0 or 1 into it +would pull the other CPUs to the same state. +For compatibility reasons this file and its behavior is still supported +on AMD CPUs, though it is now protected by a config switch +(X86_ACPI_CPUFREQ_CPB). On Intel CPUs this file will never be created, +even with the config option set. +This functionality is considered legacy and will be removed in some future +kernel version. + +More fine grained boosting control +---------------------------------- + +Technically it is possible to switch the boosting functionality at least +on a per package basis, for some CPUs even per core. Currently the driver +does not support it, but this may be implemented in the future. diff --git a/Documentation/cpu-freq/core.txt b/Documentation/cpu-freq/core.txt index ce0666e5103..70933eadc30 100644 --- a/Documentation/cpu-freq/core.txt +++ b/Documentation/cpu-freq/core.txt @@ -20,6 +20,7 @@ Contents: --------- 1. CPUFreq core and interfaces 2. CPUFreq notifiers +3. CPUFreq Table Generation with Operating Performance Point (OPP) 1. General Information ======================= @@ -93,6 +94,30 @@ cpu - number of the affected CPU old - old frequency new - new frequency -If the cpufreq core detects the frequency has changed while the system -was suspended, these notifiers are called with CPUFREQ_RESUMECHANGE as -second argument. +3. CPUFreq Table Generation with Operating Performance Point (OPP) +================================================================== +For details about OPP, see Documentation/power/opp.txt + +dev_pm_opp_init_cpufreq_table - cpufreq framework typically is initialized with + cpufreq_frequency_table_cpuinfo which is provided with the list of + frequencies that are available for operation. This function provides + a ready to use conversion routine to translate the OPP layer's internal + information about the available frequencies into a format readily + providable to cpufreq. + + WARNING: Do not use this function in interrupt context. + + Example: + soc_pm_init() + { + /* Do things */ + r = dev_pm_opp_init_cpufreq_table(dev, &freq_table); + if (!r) + cpufreq_frequency_table_cpuinfo(policy, freq_table); + /* Do other things */ + } + + NOTE: This function is available only if CONFIG_CPU_FREQ is enabled in + addition to CONFIG_PM_OPP. + +dev_pm_opp_free_cpufreq_table - Free up the table allocated by dev_pm_opp_init_cpufreq_table diff --git a/Documentation/cpu-freq/cpu-drivers.txt b/Documentation/cpu-freq/cpu-drivers.txt index 6c30e930c12..14f4e6336d8 100644 --- a/Documentation/cpu-freq/cpu-drivers.txt +++ b/Documentation/cpu-freq/cpu-drivers.txt @@ -23,9 +23,10 @@ Contents: 1.1 Initialization 1.2 Per-CPU Initialization 1.3 verify -1.4 target or setpolicy? -1.5 target +1.4 target/target_index or setpolicy? +1.5 target/target_index 1.6 setpolicy +1.7 get_intermediate and target_intermediate 2. Frequency Table Helpers @@ -50,30 +51,39 @@ What shall this struct cpufreq_driver contain? cpufreq_driver.name - The name of this driver. -cpufreq_driver.owner - THIS_MODULE; - cpufreq_driver.init - A pointer to the per-CPU initialization function. cpufreq_driver.verify - A pointer to a "verification" function. cpufreq_driver.setpolicy _or_ -cpufreq_driver.target - See below on the differences. +cpufreq_driver.target/ +target_index - See below on the differences. And optionally -cpufreq_driver.exit - A pointer to a per-CPU cleanup function. +cpufreq_driver.exit - A pointer to a per-CPU cleanup + function called during CPU_POST_DEAD + phase of cpu hotplug process. + +cpufreq_driver.stop_cpu - A pointer to a per-CPU stop function + called during CPU_DOWN_PREPARE phase of + cpu hotplug process. cpufreq_driver.resume - A pointer to a per-CPU resume function which is called with interrupts disabled and _before_ the pre-suspend frequency and/or policy is restored by a call to - ->target or ->setpolicy. + ->target/target_index or ->setpolicy. cpufreq_driver.attr - A pointer to a NULL-terminated list of "struct freq_attr" which allow to export values to sysfs. +cpufreq_driver.get_intermediate +and target_intermediate Used to switch to stable frequency while + changing CPU frequency. + 1.2 Per-CPU Initialization -------------------------- @@ -105,11 +115,18 @@ policy->governor must contain the "default policy" for this CPU. A few moments later, cpufreq_driver.verify and either cpufreq_driver.setpolicy or - cpufreq_driver.target is called with - these values. + cpufreq_driver.target/target_index is called + with these values. -For setting some of these values, the frequency table helpers might be -helpful. See the section 2 for more information on them. +For setting some of these values (cpuinfo.min[max]_freq, policy->min[max]), the +frequency table helpers might be helpful. See the section 2 for more information +on them. + +SMP systems normally have same clock source for a group of cpus. For these the +.init() would be called only once for the first online cpu. Here the .init() +routine must initialize policy->cpus with mask of all possible cpus (Online + +Offline) that share the clock. Then the core would copy this mask onto +policy->related_cpus and will reset policy->cpus to carry only online cpus. 1.3 verify @@ -128,20 +145,31 @@ range) is within policy->min and policy->max. If necessary, increase policy->max first, and only if this is no solution, decrease policy->min. -1.4 target or setpolicy? +1.4 target/target_index or setpolicy? ---------------------------- Most cpufreq drivers or even most cpu frequency scaling algorithms only allow the CPU to be set to one frequency. For these, you use the -->target call. +->target/target_index call. Some cpufreq-capable processors switch the frequency between certain limits on their own. These shall use the ->setpolicy call -1.4. target +1.5. target/target_index ------------- +The target_index call has two arguments: struct cpufreq_policy *policy, +and unsigned int index (into the exposed frequency table). + +The CPUfreq driver must set the new frequency when called here. The +actual frequency must be determined by freq_table[index].frequency. + +It should always restore to earlier frequency (i.e. policy->restore_freq) in +case of errors, even if we switched to intermediate frequency earlier. + +Deprecated: +---------- The target call has three arguments: struct cpufreq_policy *policy, unsigned int target_frequency, unsigned int relation. @@ -159,7 +187,7 @@ Here again the frequency table helper might assist you - see section 2 for details. -1.5 setpolicy +1.6 setpolicy --------------- The setpolicy call only takes a struct cpufreq_policy *policy as @@ -168,8 +196,25 @@ in-chipset dynamic frequency switching to policy->min, the upper limit to policy->max, and -if supported- select a performance-oriented setting when policy->policy is CPUFREQ_POLICY_PERFORMANCE, and a powersaving-oriented setting when CPUFREQ_POLICY_POWERSAVE. Also check -the reference implementation in arch/i386/kernel/cpu/cpufreq/longrun.c +the reference implementation in drivers/cpufreq/longrun.c + +1.7 get_intermediate and target_intermediate +-------------------------------------------- +Only for drivers with target_index() and CPUFREQ_ASYNC_NOTIFICATION unset. + +get_intermediate should return a stable intermediate frequency platform wants to +switch to, and target_intermediate() should set CPU to to that frequency, before +jumping to the frequency corresponding to 'index'. Core will take care of +sending notifications and driver doesn't have to handle them in +target_intermediate() or target_index(). + +Drivers can return '0' from get_intermediate() in case they don't wish to switch +to intermediate frequency for some target frequency. In that case core will +directly call ->target_index(). + +NOTE: ->target_index() should restore to policy->restore_freq in case of +failures as core would send notifications for that. 2. Frequency Table Helpers @@ -178,10 +223,10 @@ the reference implementation in arch/i386/kernel/cpu/cpufreq/longrun.c As most cpufreq processors only allow for being set to a few specific frequencies, a "frequency table" with some functions might assist in some work of the processor driver. Such a "frequency table" consists -of an array of struct cpufreq_freq_table entries, with any value in -"index" you want to use, and the corresponding frequency in +of an array of struct cpufreq_frequency_table entries, with any value in +"driver_data" you want to use, and the corresponding frequency in "frequency". At the end of the table, you need to add a -cpufreq_freq_table entry with frequency set to CPUFREQ_TABLE_END. And +cpufreq_frequency_table entry with frequency set to CPUFREQ_TABLE_END. And if you want to skip one entry in the table, set the frequency to CPUFREQ_ENTRY_INVALID. The entries don't need to be in ascending order. @@ -207,10 +252,23 @@ int cpufreq_frequency_table_target(struct cpufreq_policy *policy, is the corresponding frequency table helper for the ->target stage. Just pass the values to this function, and the unsigned int index returns the number of the frequency table entry which contains -the frequency the CPU shall be set to. PLEASE NOTE: This is not the -"index" which is in this cpufreq_table_entry.index, but instead -cpufreq_table[index]. So, the new frequency is -cpufreq_table[index].frequency, and the value you stored into the -frequency table "index" field is -cpufreq_table[index].index. +the frequency the CPU shall be set to. + +The following macros can be used as iterators over cpufreq_frequency_table: + +cpufreq_for_each_entry(pos, table) - iterates over all entries of frequency +table. + +cpufreq-for_each_valid_entry(pos, table) - iterates over all entries, +excluding CPUFREQ_ENTRY_INVALID frequencies. +Use arguments "pos" - a cpufreq_frequency_table * as a loop cursor and +"table" - the cpufreq_frequency_table * you want to iterate over. + +For example: + + struct cpufreq_frequency_table *pos, *driver_freq_table; + cpufreq_for_each_entry(pos, driver_freq_table) { + /* Do something with pos */ + pos->frequency = ... + } diff --git a/Documentation/cpu-freq/governors.txt b/Documentation/cpu-freq/governors.txt index 737988fca64..77ec21574fb 100644 --- a/Documentation/cpu-freq/governors.txt +++ b/Documentation/cpu-freq/governors.txt @@ -40,7 +40,7 @@ Most cpufreq drivers (in fact, all except one, longrun) or even most cpu frequency scaling algorithms only offer the CPU to be set to one frequency. In order to offer dynamic frequency scaling, the cpufreq core must be able to tell these drivers of a "target frequency". So -these specific drivers will be transformed to offer a "->target" +these specific drivers will be transformed to offer a "->target/target_index" call instead of the existing "->setpolicy" call. For "longrun", all stays the same, though. @@ -71,7 +71,7 @@ CPU can be set to switch independently | CPU can only be set / the limits of policy->{min,max} / \ / \ - Using the ->setpolicy call, Using the ->target call, + Using the ->setpolicy call, Using the ->target/target_index call, the limits and the the frequency closest "policy" is set. to target_freq is set. It is assured that it @@ -127,12 +127,12 @@ in the bash (as said, 1000 is default), do: echo `$(($(cat cpuinfo_transition_latency) * 750 / 1000)) \ >ondemand/sampling_rate -show_sampling_rate_min: +sampling_rate_min: The sampling rate is limited by the HW transition latency: transition_latency * 100 Or by kernel restrictions: -If CONFIG_NO_HZ is set, the limit is 10ms fixed. -If CONFIG_NO_HZ is not set or no_hz=off boot parameter is used, the +If CONFIG_NO_HZ_COMMON is set, the limit is 10ms fixed. +If CONFIG_NO_HZ_COMMON is not set or nohz=off boot parameter is used, the limits depend on the CONFIG_HZ option: HZ=1000: min=20000us (20ms) HZ=250: min=80000us (80ms) @@ -140,8 +140,6 @@ HZ=100: min=200000us (200ms) The highest value of kernel and HW latency restrictions is shown and used as the minimum sampling rate. -show_sampling_rate_max: THIS INTERFACE IS DEPRECATED, DON'T USE IT. - up_threshold: defines what the average CPU usage between the samplings of 'sampling_rate' needs to be for the kernel to make a decision on whether it should increase the frequency. For example when it is set @@ -158,6 +156,38 @@ intensive calculation on your laptop that you do not care how long it takes to complete as you can 'nice' it and prevent it from taking part in the deciding process of whether to increase your CPU frequency. +sampling_down_factor: this parameter controls the rate at which the +kernel makes a decision on when to decrease the frequency while running +at top speed. When set to 1 (the default) decisions to reevaluate load +are made at the same interval regardless of current clock speed. But +when set to greater than 1 (e.g. 100) it acts as a multiplier for the +scheduling interval for reevaluating load when the CPU is at its top +speed due to high load. This improves performance by reducing the overhead +of load evaluation and helping the CPU stay at its top speed when truly +busy, rather than shifting back and forth in speed. This tunable has no +effect on behavior at lower speeds/lower CPU loads. + +powersave_bias: this parameter takes a value between 0 to 1000. It +defines the percentage (times 10) value of the target frequency that +will be shaved off of the target. For example, when set to 100 -- 10%, +when ondemand governor would have targeted 1000 MHz, it will target +1000 MHz - (10% of 1000 MHz) = 900 MHz instead. This is set to 0 +(disabled) by default. +When AMD frequency sensitivity powersave bias driver -- +drivers/cpufreq/amd_freq_sensitivity.c is loaded, this parameter +defines the workload frequency sensitivity threshold in which a lower +frequency is chosen instead of ondemand governor's original target. +The frequency sensitivity is a hardware reported (on AMD Family 16h +Processors and above) value between 0 to 100% that tells software how +the performance of the workload running on a CPU will change when +frequency changes. A workload with sensitivity of 0% (memory/IO-bound) +will not perform any better on higher core frequency, whereas a +workload with sensitivity of 100% (CPU-bound) will perform better +higher the frequency. When the driver is loaded, this is set to 400 +by default -- for CPUs running workloads with sensitivity value below +40%, a lower frequency is chosen. Unloading the driver or writing 0 +will disable this feature. + 2.5 Conservative ---------------- @@ -182,6 +212,12 @@ governor but for the opposite direction. For example when set to its default value of '20' it means that if the CPU usage needs to be below 20% between samples to have the frequency decreased. +sampling_down_factor: similar functionality as in "ondemand" governor. +But in "conservative", it controls the rate at which the kernel makes +a decision on when to decrease the frequency while running in any +speed. Load for frequency increase is still evaluated every +sampling rate. + 3. The Governor Interface in the CPUfreq Core ============================================= diff --git a/Documentation/cpu-freq/index.txt b/Documentation/cpu-freq/index.txt index 3d0b915035b..dc024ab4054 100644 --- a/Documentation/cpu-freq/index.txt +++ b/Documentation/cpu-freq/index.txt @@ -35,8 +35,8 @@ Mailing List ------------ There is a CPU frequency changing CVS commit and general list where you can report bugs, problems or submit patches. To post a message, -send an email to cpufreq@vger.kernel.org, to subscribe go to -http://vger.kernel.org/vger-lists.html#cpufreq and follow the +send an email to linux-pm@vger.kernel.org, to subscribe go to +http://vger.kernel.org/vger-lists.html#linux-pm and follow the instructions there. Links diff --git a/Documentation/cpu-freq/intel-pstate.txt b/Documentation/cpu-freq/intel-pstate.txt new file mode 100644 index 00000000000..a69ffe1d54d --- /dev/null +++ b/Documentation/cpu-freq/intel-pstate.txt @@ -0,0 +1,43 @@ +Intel P-state driver +-------------------- + +This driver implements a scaling driver with an internal governor for +Intel Core processors. The driver follows the same model as the +Transmeta scaling driver (longrun.c) and implements the setpolicy() +instead of target(). Scaling drivers that implement setpolicy() are +assumed to implement internal governors by the cpufreq core. All the +logic for selecting the current P state is contained within the +driver; no external governor is used by the cpufreq core. + +Intel SandyBridge+ processors are supported. + +New sysfs files for controlling P state selection have been added to +/sys/devices/system/cpu/intel_pstate/ + + max_perf_pct: limits the maximum P state that will be requested by + the driver stated as a percentage of the available performance. The + available (P states) performance may be reduced by the no_turbo + setting described below. + + min_perf_pct: limits the minimum P state that will be requested by + the driver stated as a percentage of the max (non-turbo) + performance level. + + no_turbo: limits the driver to selecting P states below the turbo + frequency range. + +For contemporary Intel processors, the frequency is controlled by the +processor itself and the P-states exposed to software are related to +performance levels. The idea that frequency can be set to a single +frequency is fiction for Intel Core processors. Even if the scaling +driver selects a single P state the actual frequency the processor +will run at is selected by the processor itself. + +New debugfs files have also been added to /sys/kernel/debug/pstate_snb/ + + deadband + d_gain_pct + i_gain_pct + p_gain_pct + sample_rate_ms + setpoint diff --git a/Documentation/cpu-freq/user-guide.txt b/Documentation/cpu-freq/user-guide.txt index 04f6b32993e..ff2f28332cc 100644 --- a/Documentation/cpu-freq/user-guide.txt +++ b/Documentation/cpu-freq/user-guide.txt @@ -190,11 +190,11 @@ scaling_max_freq show the current "policy limits" (in first set scaling_max_freq, then scaling_min_freq. -affected_cpus : List of CPUs that require software coordination - of frequency. +affected_cpus : List of Online CPUs that require software + coordination of frequency. -related_cpus : List of CPUs that need some sort of frequency - coordination, whether software or hardware. +related_cpus : List of Online + Offline CPUs that need software + coordination of frequency. scaling_driver : Hardware driver for cpufreq. |
