aboutsummaryrefslogtreecommitdiff
path: root/arch/x86/kernel/tsc.c
AgeCommit message (Collapse)Author
2012-04-13sched/x86: Fix overflow in cyc2ns_offsetSalman Qazi
commit 9993bc635d01a6ee7f6b833b4ee65ce7c06350b1 upstream. When a machine boots up, the TSC generally gets reset. However, when kexec is used to boot into a kernel, the TSC value would be carried over from the previous kernel. The computation of cycns_offset in set_cyc2ns_scale is prone to an overflow, if the machine has been up more than 208 days prior to the kexec. The overflow happens when we multiply *scale, even though there is enough room to store the final answer. We fix this issue by decomposing tsc_now into the quotient and remainder of division by CYC2NS_SCALE_FACTOR and then performing the multiplication separately on the two components. Refactor code to share the calculation with the previous fix in __cycles_2_ns(). Signed-off-by: Salman Qazi <sqazi@google.com> Acked-by: John Stultz <john.stultz@linaro.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Turner <pjt@google.com> Cc: john stultz <johnstul@us.ibm.com> Link: http://lkml.kernel.org/r/20120310004027.19291.88460.stgit@dungbeetle.mtv.corp.google.com Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Mike Galbraith <efault@gmx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-04-02x86, tsc: Skip refined tsc calibration on systems with reliable TSCAlok Kataria
commit 57779dc2b3b75bee05ef5d1ada47f615f7a13932 upstream. While running the latest Linux as guest under VMware in highly over-committed situations, we have seen cases when the refined TSC algorithm fails to get a valid tsc_start value in tsc_refine_calibration_work from multiple attempts. As a result the kernel keeps on scheduling the tsc_irqwork task for later. Subsequently after several attempts when it gets a valid start value it goes through the refined calibration and either bails out or uses the new results. Given that the kernel originally read the TSC frequency from the platform, which is the best it can get, I don't think there is much value in refining it. So for systems which get the TSC frequency from the platform we should skip the refined tsc algorithm. We can use the TSC_RELIABLE cpu cap flag to detect this, right now it is set only on VMware and for Moorestown Penwell both of which have there own TSC calibration methods. Signed-off-by: Alok N Kataria <akataria@vmware.com> Cc: John Stultz <johnstul@us.ibm.com> Cc: Dirk Brandewie <dirk.brandewie@gmail.com> Cc: Alan Cox <alan@linux.intel.com> [jstultz: Reworked to simply not schedule the refining work, rather then scheduling the work and bombing out later] Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2011-07-23Merge branches 'x86-detect-hyper-for-linus', 'x86-fpu-for-linus', ↵Linus Torvalds
'x86-kexec-for-linus', 'x86-platform-for-linus', 'x86-quirks-for-linus', 'x86-tsc-for-linus' and 'x86-smpboot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-detect-hyper-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, hyper: Change hypervisor detection order * 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86-32, fpu: Fix DNA exception during check_fpu() * 'x86-kexec-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: kexec, x86: Fix incorrect jump back address if not preserving context * 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, config: Introduce an INTEL_MID configuration * 'x86-quirks-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, quirks: Use pci_dev->revision * 'x86-tsc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: tsc: Remove unneeded DMI-based blacklisting * 'x86-smpboot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, boot: Wait for boot cpu to show up if nr_cpus limit is about to hit
2011-07-14x86-64: Move vread_tsc and vread_hpet into the vDSOAndy Lutomirski
The vsyscall page now consists entirely of trap instructions. Cc: John Stultz <johnstul@us.ibm.com> Signed-off-by: Andy Lutomirski <luto@mit.edu> Link: http://lkml.kernel.org/r/637648f303f2ef93af93bae25186e9a1bea093f5.1310639973.git.luto@mit.edu Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-07-13clocksource: Replace vread with generic arch dataAndy Lutomirski
The vread field was bloating struct clocksource everywhere except x86_64, and I want to change the way this works on x86_64, so let's split it out into per-arch data. Cc: x86@kernel.org Cc: Clemens Ladisch <clemens@ladisch.de> Cc: linux-ia64@vger.kernel.org Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: John Stultz <johnstul@us.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andy Lutomirski <luto@mit.edu> Link: http://lkml.kernel.org/r/3ae5ec76a168eaaae63f08a2a1060b91aa0b7759.1310563276.git.luto@mit.edu Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-05-31x86: tsc: Remove unneeded DMI-based blacklistingTero Roponen
The blacklist was added in response to my bug report (http://lkml.org/lkml/2006/1/19/362) and has never contained more than the one entry describing my old now dead ThinkPad 380XD laptop. As found out later (http://lkml.org/lkml/2007/11/29/50), this special treatment has been unnecessary for a long time, so it can be removed. Signed-off-by: Tero Roponen <tero.roponen@gmail.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
2011-05-24x86-64: Move vread_tsc into a new file with sensible optionsAndy Lutomirski
vread_tsc is short and hot, and it's userspace code so the usual reasons to enable -pg and turn off sibling calls don't apply. (OK, turning off sibling calls has no effect. But it might someday...) As an added benefit, tsc.c is profilable now. Signed-off-by: Andy Lutomirski <luto@mit.edu> Cc: Andi Kleen <andi@firstfloor.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Borislav Petkov <bp@amd64.org> Link: http://lkml.kernel.org/r/%3C99c6d7f5efa3ccb65b4ac6eb443e1ab7bad47d7b.1306156808.git.luto%40mit.edu%3E Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-05-24x86-64: Don't generate cmov in vread_tscAndy Lutomirski
vread_tsc checks whether rdtsc returns something less than cycle_last, which is an extremely predictable branch. GCC likes to generate a cmov anyway, which is several cycles slower than a predicted branch. This saves a couple of nanoseconds. Signed-off-by: Andy Lutomirski <luto@mit.edu> Cc: Andi Kleen <andi@firstfloor.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Borislav Petkov <bp@amd64.org> Link: http://lkml.kernel.org/r/%3C561280649519de41352fcb620684dfb22bad6bac.1306156808.git.luto%40mit.edu%3E Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-05-24x86-64: Remove unnecessary barrier in vread_tscAndy Lutomirski
RDTSC is completely unordered on modern Intel and AMD CPUs. The Intel manual says that lfence;rdtsc causes all previous instructions to complete before the tsc is read, and the AMD manual says to use mfence;rdtsc to do the same thing. From a decent amount of testing [1] this is enough to make rdtsc be ordered with respect to subsequent loads across a wide variety of CPUs. On Sandy Bridge (i7-2600), this improves a loop of clock_gettime(CLOCK_MONOTONIC) by more than 5 ns/iter. [1] https://lkml.org/lkml/2011/4/18/350 Signed-off-by: Andy Lutomirski <luto@mit.edu> Cc: Andi Kleen <andi@firstfloor.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Borislav Petkov <bp@amd64.org> Link: http://lkml.kernel.org/r/%3C1c158b9d74338aa5361f96dd473d0e6a58235302.1306156808.git.luto%40mit.edu%3E Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-05-24x86-64: Clean up vdso/kernel shared variablesAndy Lutomirski
Variables that are shared between the vdso and the kernel are currently a bit of a mess. They are each defined with their own magic, they are accessed differently in the kernel, the vsyscall page, and the vdso, and one of them (vsyscall_clock) doesn't even really exist. This changes them all to use a common mechanism. All of them are delcared in vvar.h with a fixed address (validated by the linker script). In the kernel (as before), they look like ordinary read-write variables. In the vsyscall page and the vdso, they are accessed through a new macro VVAR, which gives read-only access. The vdso is now loaded verbatim into memory without any fixups. As a side bonus, access from the vdso is faster because a level of indirection is removed. While we're at it, pack jiffies and vgetcpu_mode into the same cacheline. Signed-off-by: Andy Lutomirski <luto@mit.edu> Cc: Andi Kleen <andi@firstfloor.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Borislav Petkov <bp@amd64.org> Link: http://lkml.kernel.org/r/%3C7357882fbb51fa30491636a7b6528747301b7ee9.1306156808.git.luto%40mit.edu%3E Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-03-18x86: Fix common misspellingsLucas De Marchi
They were generated by 'codespell' and then manually reviewed. Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi> Cc: trivial@kernel.org LKML-Reference: <1300389856-1099-3-git-send-email-lucas.demarchi@profusion.mobi> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-15Merge branches 'core-fixes-for-linus', 'x86-fixes-for-linus', ↵Linus Torvalds
'timers-fixes-for-linus' and 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: rcu: avoid pointless blocked-task warnings rcu: demote SRCU_SYNCHRONIZE_DELAY from kernel-parameter status rtmutex: Fix comment about why new_owner can be NULL in wake_futex_pi() * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, olpc: Add missing Kconfig dependencies x86, mrst: Set correct APB timer IRQ affinity for secondary cpu x86: tsc: Fix calibration refinement conditionals to avoid divide by zero x86, ia64, acpi: Clean up x86-ism in drivers/acpi/numa.c * 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: timekeeping: Make local variables static time: Rename misnamed minsec argument of clocks_calc_mult_shift() * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: tracing: Remove syscall_exit_fields tracing: Only process module tracepoints once perf record: Add "nodelay" mode, disabled by default perf sched: Fix list of events, dropping unsupported ':r' modifier Revert "perf tools: Emit clearer message for sys_perf_event_open ENOENT return" perf top: Fix annotate segv perf evsel: Fix order of event list deletion
2011-01-14x86: tsc: Fix calibration refinement conditionals to avoid divide by zeroJohn Stultz
Konrad Wilk reported that the new delayed calibration crashes with a divide by zero on Xen. The reason is that Xen sets the pmtimer address, but reading from it returns 0xffffff. That results in the ref_start and ref_stop value being the same, so the delta is zero which causes the divide by zero later in the calculation. The conditional (!hpet && !ref_start && !ref_stop) which sanity checks the calibration reference values doesn't really make sense. If the refs are null, but hpet is on, we still want to break out. The div by zero would be possible to trigger by chance if both reads from the hardware provided the exact same value (due to hardware wrapping). So checking if both the ref values are the same should handle if we don't have hardware (both null) or if they are the same value (either by invalid hardware, or by chance), avoiding the div by zero issue. [ tglx: Applied the same fix to native_calibrate_tsc() where this check was copied from ] Reported-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: John Stultz <johnstul@us.ibm.com> LKML-Reference: <1295024788-15619-1-git-send-email-johnstul@us.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-01-11Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Fix Moorestown VRTC fixmap placement x86/gpio: Implement x86 gpio_to_irq convert function x86, UV: Fix APICID shift for Westmere processors x86: Use PCI method for enabling AMD extended config space before MSR method x86: tsc: Prevent delayed init if initial tsc calibration failed x86, lapic-timer: Increase the max_delta to 31 bits x86: Fix sparse non-ANSI function warnings in smpboot.c x86, numa: Fix CONFIG_DEBUG_PER_CPU_MAPS without NUMA emulation x86, AMD, PCI: Add AMD northbridge PCI device id for CPU families 12h and 14h x86, numa: Fix cpu to node mapping for sparse node ids x86, numa: Fake node-to-cpumask for NUMA emulation x86, numa: Fake apicid and pxm mappings for NUMA emulation x86, numa: Avoid compiling NUMA emulation functions without CONFIG_NUMA_EMU x86, numa: Reduce minimum fake node size to 32M Fix up trivial conflict in arch/x86/kernel/apic/x2apic_uv_x.c
2011-01-11x86: tsc: Prevent delayed init if initial tsc calibration failedThomas Gleixner
commit a8760ec (x86: Check tsc available/disabled in the delayed init function) missed to prevent the setup of the delayed init function in case the initial tsc calibration failed. This results in the same divide by zero bug as we have seen without the tsc disabled check. Skip the delayed work setup when tsc_khz (the initial calibration value) is 0. Bisected-and-tested-by: Kirill A. Shutemov <kas@openvz.org> Cc: John Stultz <john.stultz@linaro.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-01-07Merge branch 'for-2.6.38' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu * 'for-2.6.38' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (30 commits) gameport: use this_cpu_read instead of lookup x86: udelay: Use this_cpu_read to avoid address calculation x86: Use this_cpu_inc_return for nmi counter x86: Replace uses of current_cpu_data with this_cpu ops x86: Use this_cpu_ops to optimize code vmstat: User per cpu atomics to avoid interrupt disable / enable irq_work: Use per cpu atomics instead of regular atomics cpuops: Use cmpxchg for xchg to avoid lock semantics x86: this_cpu_cmpxchg and this_cpu_xchg operations percpu: Generic this_cpu_cmpxchg() and this_cpu_xchg support percpu,x86: relocate this_cpu_add_return() and friends connector: Use this_cpu operations xen: Use this_cpu_inc_return taskstats: Use this_cpu_ops random: Use this_cpu_inc_return fs: Use this_cpu_inc_return in buffer.c highmem: Use this_cpu_xx_return() operations vmstat: Use this_cpu_inc_return for vm statistics x86: Support for this_cpu_add, sub, dec, inc_return percpu: Generic support for this_cpu_add, sub, dec, inc_return ... Fixed up conflicts: in arch/x86/kernel/{apic/nmi.c, apic/x2apic_uv_x.c, process.c} as per Tejun.
2010-12-30x86: Use this_cpu_ops to optimize codeTejun Heo
Go through x86 code and replace __get_cpu_var and get_cpu_var instances that refer to a scalar and are not used for address determinations. Cc: Yinghai Lu <yinghai@kernel.org> Cc: Ingo Molnar <mingo@elte.hu> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2010-12-13x86: Check tsc available/disabled in the delayed init functionThomas Gleixner
The delayed TSC init function does not check whether the system has no TSC or TSC is disabled at the kernel command line, which results in a crash in the work queue based extended calibration due to division by zero because the basic calibration never happened. Add the missing checks and do not touch TSC when not available or disabled. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: John Stultz <johnstul@us.ibm.com>
2010-12-02x86: Improve TSC calibration using a delayed workqueueJohn Stultz
Boot to boot the TSC calibration may vary by quite a large amount. While normal variance of 50-100ppm can easily be seen, the quick calibration code only requires 500ppm accuracy, which is the limit of what NTP can correct for. This can cause problems for systems being used as NTP servers, as every time they reboot it can take hours for them to calculate the new drift error caused by the calibration. The classic trade-off here is calibration accuracy vs slow boot times, as during the calibration nothing else can run. This patch uses a delayed workqueue to calibrate the TSC over the period of a second. This allows very accurate calibration (in my tests only varying by 1khz or 0.4ppm boot to boot). Additionally this refined calibration step does not block the boot process, and only delays the TSC clocksoure registration by a few seconds in early boot. If the refined calibration strays 1% from the early boot calibration value, the system will fall back to already calculated early boot calibration. Credit to Andi Kleen who suggested using a timer quite awhile back, but I dismissed it thinking the timer calibration would be done after the clocksource was registered (which would break things). Forgive me for my short-sightedness. This patch has worked very well in my testing, but TSC hardware is quite varied so it would probably be good to get some extended testing, possibly pushing inclusion out to 2.6.39. Signed-off-by: John Stultz <johnstul@us.ibm.com> LKML-Reference: <1289003985-29060-1-git-send-email-johnstul@us.ibm.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> CC: Thomas Gleixner <tglx@linutronix.de> CC: Ingo Molnar <mingo@elte.hu> CC: Martin Schwidefsky <schwidefsky@de.ibm.com> CC: Clark Williams <williams@redhat.com> CC: Andi Kleen <andi@firstfloor.org>
2010-12-02Merge remote branch 'tip/x86/tsc' into fortglx/2.6.38/tip/x86/tscJohn Stultz
Conflicts: Documentation/kernel-parameters.txt
2010-10-21Merge branch 'x86-amd-nb-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-amd-nb-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, amd_nb: Enable GART support for AMD family 0x15 CPUs x86, amd: Use compute unit information to determine thread siblings x86, amd: Extract compute unit information for AMD CPUs x86, amd: Add support for CPUID topology extension of AMD CPUs x86, nmi: Support NMI watchdog on newer AMD CPU families x86, mtrr: Assume SYS_CFG[Tom2ForceMemTypeWB] exists on all future AMD CPUs x86, k8: Rename k8.[ch] to amd_nb.[ch] and CONFIG_K8_NB to CONFIG_AMD_NB x86, k8-gart: Decouple handling of garts and northbridges x86, cacheinfo: Fix dependency of AMD L3 CID x86, kvm: add new AMD SVM feature bits x86, cpu: Fix allowed CPUID bits for KVM guests x86, cpu: Update AMD CPUID feature bits x86, cpu: Fix renamed, not-yet-shipping AMD CPUID feature bit x86, AMD: Remove needless CPU family check (for L3 cache info) x86, tsc: Remove CPU frequency calibration on AMD
2010-10-18x86: Add IRQ_TIME_ACCOUNTINGVenkatesh Pallipadi
This patch adds IRQ_TIME_ACCOUNTING option on x86 and runtime enables it when TSC is enabled. This change just enables fine grained irq time accounting, isn't used yet. Following patches use it for different purposes. Signed-off-by: Venkatesh Pallipadi <venki@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1286237003-12406-6-git-send-email-venki@google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-01Merge remote branch 'origin/x86/cpu' into x86/amd-nbH. Peter Anvin
2010-09-10x86, tsc: Fix a preemption leak in restore_sched_clock_state()Peter Zijlstra
A real life genuine preemption leak.. Reported-and-tested-by: Jeff Chua <jeff.chua.linux@gmail.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-25x86, tsc: Remove CPU frequency calibration on AMDBorislav Petkov
6b37f5a20c0e5c334c010a587058354215433e92 introduced the CPU frequency calibration code for AMD CPUs whose TSCs didn't increment with the core's P0 frequency. From F10h, revB onward, however, the TSC increment rate is denoted by MSRC001_0015[24] and when this bit is set (which should be done by the BIOS) the TSC increments with the P0 frequency so the calibration is not needed and booting can be a couple of mcecs faster on those machines. Besides, there should be virtually no machines out there which don't have this bit set, therefore this calibration can be safely removed. It is a shaky hack anyway since it assumes implicitly that the core is in P0 when BIOS hands off to the OS, which might not always be the case. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> LKML-Reference: <20100825162823.GE26438@aftab> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-08-20x86, tsc, sched: Recompute cyc2ns_offset's during resume from sleep statesSuresh Siddha
TSC's get reset after suspend/resume (even on cpu's with invariant TSC which runs at a constant rate across ACPI P-, C- and T-states). And in some systems BIOS seem to reinit TSC to arbitrary large value (still sync'd across cpu's) during resume. This leads to a scenario of scheduler rq->clock (sched_clock_cpu()) less than rq->age_stamp (introduced in 2.6.32). This leads to a big value returned by scale_rt_power() and the resulting big group power set by the update_group_power() is causing improper load balancing between busy and idle cpu's after suspend/resume. This resulted in multi-threaded workloads (like kernel-compilation) go slower after suspend/resume cycle on core i5 laptops. Fix this by recomputing cyc2ns_offset's during resume, so that sched_clock() continues from the point where it was left off during suspend. Reported-by: Florian Pritz <flo@xssn.at> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: <stable@kernel.org> # [v2.6.32+] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1282262618.2675.24.camel@sbsiddha-MOBL3.sc.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-07-27x86: Convert common clocksources to use clocksource_register_hz/khzJohn Stultz
This converts the most common of the x86 clocksources over to use clocksource_register_hz/khz. Signed-off-by: John Stultz <johnstul@us.ibm.com> LKML-Reference: <1279068988-21864-11-git-send-email-johnstul@us.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-03-08Merge branch 'for-next' into for-linusJiri Kosina
Conflicts: Documentation/filesystems/proc.txt arch/arm/mach-u300/include/mach/debug-macro.S drivers/net/qlge/qlge_ethtool.c drivers/net/qlge/qlge_main.c drivers/net/typhoon.c
2010-03-01Merge branch 'timers-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: posix-timers.c: Don't export local functions clocksource: start CMT at clocksource resume clocksource: add suspend callback clocksource: add argument to resume callback ntp: Cleanup xtime references in ntp.c ntp: Make time_esterror and time_maxerror static
2010-02-09tree-wide: Assorted spelling fixesDaniel Mack
In particular, several occurances of funny versions of 'success', 'unknown', 'therefore', 'acknowledge', 'argument', 'achieve', 'address', 'beginning', 'desirable', 'separate' and 'necessary' are fixed. Signed-off-by: Daniel Mack <daniel@caiaq.de> Cc: Joe Perches <joe@perches.com> Cc: Junio C Hamano <gitster@pobox.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2010-02-05clocksource: add argument to resume callbackMagnus Damm
Pass the clocksource as an argument to the clocksource resume callback. Needed so we can point out which CMT channel the sh_cmt.c driver shall resume. Signed-off-by: Magnus Damm <damm@opensource.se> Cc: john stultz <johnstul@us.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-01-18x86, trivial: Fix grammo in tsc comment about Geode TSC reliabilityThadeu Lima de Souza Cascardo
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@holoscopio.com> Cc: marcelo@kvack.org Cc: dilinger@collabora.co.uk Cc: trivial@kernel.org LKML-Reference: <1263764685-9871-1-git-send-email-cascardo@holoscopio.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-17x86: Reenable TSC sync check at boot, even with NONSTOP_TSCPallipadi, Venkatesh
Commit 83ce4009 did the following change If the TSC is constant and non-stop, also set it reliable. But, there seems to be few systems that will end up with TSC warp across sockets, depending on how the cpus come out of reset. Skipping TSC sync test on such systems may result in time inconsistency later. So, reenable TSC sync test even on constant and non-stop TSC systems. Set, sched_clock_stable to 1 by default and reset it in mark_tsc_unstable, if TSC sync fails. This change still gives perf benefit mentioned in 83ce4009 for systems where TSC is reliable. Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <20091217202702.GA18015@linux-os.sc.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-09-20Merge branch 'linus' into x86/urgentIngo Molnar
Merge reason: Bring in changes that the next patch will depend on. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-20x86: Trivial whitespace cleanupsFelipe Contreras
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com> Cc: Vegard Nossum <vegardno@ifi.uio.no> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Alok N Kataria <akataria@vmware.com> Cc: "Tan Wei Chong" <wei.chong.tan@intel.com> Cc: Len Brown <len.brown@intel.com> Cc: Lin Ming <ming.m.lin@intel.com> Cc: Bob Moore <robert.moore@intel.com> LKML-Reference: <1253137123-18047-2-git-send-email-felipe.contreras@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-18Merge branch 'x86-platform-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (38 commits) x86: Move get/set_wallclock to x86_platform_ops x86: platform: Fix section annotations x86: apic namespace cleanup x86: Distangle ioapic and i8259 x86: Add Moorestown early detection x86: Add hardware_subarch ID for Moorestown x86: Add early platform detection x86: Move tsc_init to late_time_init x86: Move tsc_calibration to x86_init_ops x86: Replace the now identical time_32/64.c by time.c x86: time_32/64.c unify profile_pc x86: Move calibrate_cpu to tsc.c x86: Make timer setup and global variables the same in time_32/64.c x86: Remove mca bus ifdef from timer interrupt x86: Simplify timer_ack magic in time_32.c x86: Prepare unification of time_32/64.c x86: Remove do_timer hook x86: Add timer_init to x86_init_ops x86: Move percpu clockevents setup to x86_init_ops x86: Move xen_post_allocator_init into xen_pagetable_setup_done ... Fix up conflicts in arch/x86/include/asm/io_apic.h
2009-08-31x86: Move tsc_calibration to x86_init_opsThomas Gleixner
TSC calibration is modified by the vmware hypervisor and paravirt by separate means. Moorestown wants to add its own calibration routine as well. So make calibrate_tsc a proper x86_init_ops function and override it by paravirt or by the early setup of the vmware hypervisor. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-31x86: Move calibrate_cpu to tsc.cThomas Gleixner
Move the code where it's only user is. Also we need to look whether this hardwired hackery might interfere with perfcounters. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-31x86: Add timer_init to x86_init_opsThomas Gleixner
The timer init code is convoluted with several quirks and the paravirt timer chooser. Figuring out which code path is actually taken is not for the faint hearted. Move the numaq TSC quirk to tsc_pre_init x86_init_ops function and replace the paravirt time chooser and the remaining x86 quirk with a simple x86_init_ops function. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-28x86: Make tsc=reliable override boot time stability checksjohn stultz
This patch makes the tsc=reliable option disable the boot time stability checks. Currently the option only disables the runtime watchdog checks. This change allows folks who want to override the boot time TSC stability checks and use the TSC when the system would otherwise disqualify it. There still are some situations that the TSC will be disqualified, such as cpufreq scaling. But these are situations where the box will hang if allowed. Patch also includes a fix for an issue found by Thomas Gleixner, where the TSC disqualification message wouldn't be printed after a call to unsynchronized_tsc(). Signed-off-by: John Stultz <johnstul@us.ibm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: akataria@vmware.com Cc: Stephen Hemminger <shemminger@vyatta.com> LKML-Reference: <1250552447.7212.92.camel@localhost.localdomain> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-28clocksource: Resolve cpu hotplug dead lock with TSC unstableThomas Gleixner
Martin Schwidefsky analyzed it: To register a clocksource the clocksource_mutex is acquired and if necessary timekeeping_notify is called to install the clocksource as the timekeeper clock. timekeeping_notify uses stop_machine which needs to take cpu_add_remove_lock mutex. Starting a new cpu is done with the cpu_add_remove_lock mutex held. native_cpu_up checks the tsc of the new cpu and if the tsc is no good clocksource_change_rating is called. Which needs the clocksource_mutex and the deadlock is complete. The solution is to replace the TSC via the clocksource watchdog mechanism. Mark the TSC as unstable and schedule the watchdog work so it gets removed in the watchdog thread context. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> LKML-Reference: <new-submission> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: John Stultz <johnstul@us.ibm.com>
2009-08-15timekeeping: Move reset of cycle_last for tsc clocksource to tscMartin Schwidefsky
change_clocksource resets the cycle_last value to zero then sets it to a value read from the clocksource. The reset to zero is required only for the TSC clocksource to make the read_tsc function work after a resume. The reason is that the TSC read function uses cycle_last to detect backwards going TSCs. In the resume case cycle_last contains the TSC value from the last update before the suspend. On resume the TSC starts counting from 0 again and would trip over the cycle_last comparison. This is subtle and surprising. Move the reset to a resume function in the tsc code. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <johnstul@us.ibm.com> Cc: Daniel Walker <dwalker@fifo99.com> LKML-Reference: <20090814134808.142191175@de.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-10x86: Fix serialization in pit_expect_msb()Linus Torvalds
Wei Chong Tan reported a fast-PIT-calibration corner-case: | pit_expect_msb() is vulnerable to SMI disturbance corner case | in some platforms which causes /proc/cpuinfo to show wrong | CPU MHz value when quick_pit_calibrate() jumps to success | section. I think that the real issue isn't even an SMI - but the fact that in the very last iteration of the loop, there's no serializing instruction _after_ the last 'rdtsc'. So even in the absense of SMI's, we do have a situation where the cycle counter was read without proper serialization. The last check should be done outside the outer loop, since _inside_ the outer loop, we'll be testing that the PIT has the right MSB value has the right value in the next iteration. So only the _last_ iteration is special, because that's the one that will not check the PIT MSB value any more, and because the final 'get_cycles()' isn't serialized. In other words: - I'd like to move the PIT MSB check to after the last iteration, rather than in every iteration - I think we should comment on the fact that it's also a serializing instruction and so 'fences in' the TSC read. Here's a suggested replacement. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Reported-by: "Tan, Wei Chong" <wei.chong.tan@intel.com> Tested-by: "Tan, Wei Chong" <wei.chong.tan@intel.com> LKML-Reference: <B28277FD4E0F9247A3D55704C440A140D5D683F3@pgsmsx504.gar.corp.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-20Merge branch 'sched-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: sched: Fix out of scope variable access in sched_slice() sched: Hide runqueues from direct refer at source code level sched: Remove unneeded __ref tag sched, x86: Fix cpufreq + sched_clock() TSC scaling
2009-06-17Merge branch 'fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq: [CPUFREQ] cpumask: new cpumask operators for arch/x86/kernel/cpu/cpufreq/powernow-k8.c [CPUFREQ] cpumask: avoid playing with cpus_allowed in powernow-k8.c [CPUFREQ] cpumask: avoid cpumask games in arch/x86/kernel/cpu/cpufreq/speedstep-centrino.c [CPUFREQ] cpumask: avoid playing with cpus_allowed in speedstep-ich.c [CPUFREQ] powernow-k8: get drv data for correct CPU [CPUFREQ] powernow-k8: read P-state from HW [CPUFREQ] reduce scope of ACPI_PSS_BIOS_BUG_MSG[] [CPUFREQ] Clean up convoluted code in arch/x86/kernel/tsc.c:time_cpufreq_notifier() [CPUFREQ] minor correction to cpu-freq documentation [CPUFREQ] powernow-k8.c: mess cleanup [CPUFREQ] Only set sampling_rate_max deprecated, sampling_rate_min is useful [CPUFREQ] powernow-k8: Set transition latency to 1 if ACPI tables export 0 [CPUFREQ] ondemand: Uncouple minimal sampling rate from HZ in NO_HZ case
2009-06-17sched, x86: Fix cpufreq + sched_clock() TSC scalingPeter Zijlstra
For freqency dependent TSCs we only scale the cycles, we do not account for the discrepancy in absolute value. Our current formula is: time = cycles * mult (where mult is a function of the cpu-speed on variable tsc machines) Suppose our current cycle count is 10, and we have a multiplier of 5, then our time value would end up being 50. Now cpufreq comes along and changes the multiplier to say 3 or 7, which would result in our time being resp. 30 or 70. That means that we can observe random jumps in the time value due to frequency changes in both fwd and bwd direction. So what this patch does is change the formula to: time = cycles * frequency + offset And we calculate offset so that time_before == time_after, thereby ridding us of these jumps in time. [ Impact: fix/reduce sched_clock() jumps across frequency changing events ] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu> Chucked-on-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2009-06-16time: move PIT_TICK_RATE to linux/timex.hArnd Bergmann
PIT_TICK_RATE is currently defined in four architectures, but in three different places. While linux/timex.h is not the perfect place for it, it is still a reasonable replacement for those drivers that traditionally use asm/timex.h to get CLOCK_TICK_RATE and expect it to be the PIT frequency. Note that for Alpha, the actual value changed from 1193182UL to 1193180UL. This is unlikely to make a difference, and probably can only improve accuracy. There was a discussion on the correct value of CLOCK_TICK_RATE a few years ago, after which every existing instance was getting changed to 1193182. According to the specification, it should be 1193181.818181... Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Len Brown <lenb@kernel.org> Cc: john stultz <johnstul@us.ibm.com> Cc: Dmitry Torokhov <dtor@mail.ru> Cc: Takashi Iwai <tiwai@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-15[CPUFREQ] Clean up convoluted code in ↵Dave Jones
arch/x86/kernel/tsc.c:time_cpufreq_notifier() Christoph Hellwig noticed the following potential uninitialised use: > arch/x86/kernel/tsc.c: In function 'time_cpufreq_notifier': > arch/x86/kernel/tsc.c:634: warning: 'dummy' may be used uninitialized in this function > > where we do have CONFIG_SMP set, freq->flags & CPUFREQ_CONST_LOOPS is > true and ref_freq is false. It seems plausable, though the circumstances for hitting it are really low. Nearly all SMP capable cpufreq drivers set CPUFREQ_CONST_LOOPS. powernow-k8 is really the only exception. The older CPUs were typically only ever UP. (powernow-k7 never supported SMP for eg) It's worth fixing regardless, as it cleans up the code. Fix possible uninitialized use of dummy, by just removing it, and making the setting of lpj more obvious. Signed-off-by: Dave Jones <davej@redhat.com>
2009-06-10Merge branch 'x86-vdso-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: move rdtsc_barrier() into the TSC vread method
2009-06-10Merge branch 'x86-cleanups-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, nmi: Use predefined numbers instead of hardcoded one x86: asm/processor.h: remove double declaration x86, mtrr: replace MTRRdefType_MSR with msr-index's MSR_MTRRdefType x86, mtrr: replace MTRRfix4K_C0000_MSR with msr-index's MSR_MTRRfix4K_C0000 x86, mtrr: remove mtrr MSRs double declaration x86, mtrr: replace MTRRfix16K_80000_MSR with msr-index's MSR_MTRRfix16K_80000 x86, mtrr: replace MTRRfix64K_00000_MSR with msr-index's MSR_MTRRfix64K_00000 x86, mtrr: replace MTRRcap_MSR with msr-index's MSR_MTRRcap x86: mce: remove duplicated #include x86: msr-index.h remove duplicate MSR C001_0015 declaration x86: clean up arch/x86/kernel/tsc_sync.c a bit x86: use symbolic name for VM86_SIGNAL when used as vm86 default return x86: added 'ifndef _ASM_X86_IOMAP_H' to iomap.h x86: avoid multiple declaration of kstack_depth_to_print x86: vdso/vma.c declare vdso_enabled and arch_setup_additional_pages before they get used x86: clean up declarations and variables x86: apic/x2apic_cluster.c x86_cpu_to_logical_apicid should be static x86 early quirks: eliminate unused function