path: root/arch/x86_64/kernel
Age | Commit message | Author
2007-01-09 | x86_64: Don't leak NT bit into next task (CVE-2006-5755) | Andi Kleen
SYSENTER can cause NT to be set, which might cause crashes on the IRET in the next task. This follows a similar i386 patch from Linus. Backport to 2.6.16 by Chuck Ebbert <76306.1226@compuserve.com> [Changed 'set_debugreg' to the older 'set_debug' in setup64.c and added raw_local_save_flags() from 2.6.19 to system.h] Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>
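In this failure mode, a flag set while one task runs is still live in EFLAGS when the next task executes IRET. The following is a minimal user-space model, not the kernel's `__switch_to`. It assumes a hypothetical `struct task` and a global `cpu_flags` standing in for the real EFLAGS register. It shows why saving and restoring the flags per task across a switch keeps NT from crossing over:

```c
#include <assert.h>

#define X86_EFLAGS_NT 0x4000UL   /* Nested Task flag, EFLAGS bit 14 */

/* Hypothetical per-task state: only the saved flags matter here. */
struct task {
    unsigned long saved_flags;
};

/* Hypothetical stand-in for the live EFLAGS register. */
static unsigned long cpu_flags;

/* Model of a context switch that saves and restores flags per task,
 * in the spirit of a pushf/popf pair around the stack switch. */
void switch_to(struct task *prev, struct task *next)
{
    assert(prev != next);            /* the scheduler never switches to itself */
    prev->saved_flags = cpu_flags;   /* "pushf" on prev's stack */
    cpu_flags = next->saved_flags;   /* "popf" from next's stack */
}
```

If `cpu_flags` holds NT when `switch_to(&a, &b)` runs, task `a` keeps NT in its own saved copy and `b` resumes with its own clean flags.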
2006-12-17 | x86-64: Mark rdtsc as sync only for netburst, not for core2 | Arjan van de Ven
On the Core2 cpus, the rdtsc instruction is not serializing (as defined in the architecture reference since rdtsc exists), and because of the deep speculation of these cores it's possible to observe time going backwards between cores. Since the kernel already deals with this with the SYNC_RDTSC flag, the solution is simple: only assume that the instruction is serializing on family 15. The price one pays for this is a slightly slower gettimeofday (by a dozen or two cycles), but that is a small price to pay for a really-going-forward tsc counter. Backport by Chris Wright. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Adrian Bunk <bunk@stusta.de>
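The policy above reduces to setting or clearing one capability bit per CPU family. Here is a minimal sketch. It assumes a hypothetical `CAP_SYNC_RDTSC` mask in place of the kernel's real SYNC_RDTSC feature bit. Core2 reports family 6 and NetBurst reports family 15:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the SYNC_RDTSC capability bit. */
#define CAP_SYNC_RDTSC (1u << 0)

/* Only Intel family 15 (NetBurst) is treated as having a serializing
 * rdtsc; every other family, Core2 included, gets the bit cleared. */
unsigned int intel_rdtsc_caps(unsigned int caps, int family)
{
    if (family == 15)
        caps |= CAP_SYNC_RDTSC;
    else
        caps &= ~CAP_SYNC_RDTSC;
    return caps;
}

/* Without the bit, a reader must serialize before executing rdtsc. */
bool rdtsc_needs_fence(unsigned int caps)
{
    return !(caps & CAP_SYNC_RDTSC);
}
```

When `rdtsc_needs_fence()` is true, the caller would issue a serializing instruction before `rdtsc`. That extra instruction is the likely source of the small gettimeofday slowdown mentioned above.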
2006-11-05 | ACPI: enable SMP C-states on x86_64 | Shaohua Li
http://bugzilla.kernel.org/show_bug.cgi?id=5653 Signed-off-by: Shaohua Li <shaohua.li@intel.com> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-06-05 | [PATCH] x86_64: Don't do syscall exit tracing twice | Andi Kleen
This fixes a regression from the earlier DOS fix for non canonical IRET addresses. It broke UML. int_ret_from_syscall already does syscall exit tracing, so no need to do it again in the caller. This caused problems for UML and some other special programs doing syscall interception. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-06-05 | [PATCH] x86_64: x86_64 add crashdump trigger points | Vivek Goyal
o Start booting into the capture kernel after an Oops if the system is in an unrecoverable state. The system will boot into the capture kernel, if one is pre-loaded by the user, and capture the kernel core dump.
o One of the following conditions should be true to trigger the booting of the capture kernel:
  - panic_on_oops is set.
  - The pid of the current thread is 0.
  - The pid of the current thread is 1.
  - The Oops happened inside interrupt context.
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
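The trigger rule above is a single boolean condition. The sketch below uses hypothetical parameters in place of the kernel state such a check would consult: whether a capture kernel is loaded, `panic_on_oops`, the current pid, and interrupt context:

```c
#include <assert.h>
#include <stdbool.h>

/* Decide whether an oops should boot the pre-loaded capture kernel.
 * All parameters are hypothetical stand-ins for kernel state. */
bool oops_triggers_crash_kernel(bool capture_kernel_loaded, bool panic_on_oops,
                                int current_pid, bool in_interrupt)
{
    if (!capture_kernel_loaded)
        return false;              /* nothing to boot into */
    return panic_on_oops
        || current_pid == 0        /* idle task */
        || current_pid == 1        /* init */
        || in_interrupt;
}
```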
2006-05-01 | [PATCH] x86_64: Fix a race in the free_iommu path. | Mike Waychison
We do this by removing a micro-optimization that tries to avoid grabbing the iommu_bitmap_lock spinlock by using a bus-locked operation instead. That optimization still races with other simultaneous alloc_iommu or free_iommu(size > 1) calls, which both use bus-unlocked operations. The end result of this race is an iommu_gart_bitmap that eventually has bits erroneously set all over, making large contiguous iommu space allocations fail with 'PCI-DMA: Out of IOMMU space'. Signed-off-by: Mike Waychison <mikew@google.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
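The fix amounts to routing every bitmap update, including single-page frees, through the same lock. This is a user-space sketch, and its bodies are not the kernel's, although the function names echo the message. It assumes a tiny hypothetical 64-page aperture and one `bool` per page instead of the real bit array. A C11 `atomic_flag` spinlock stands in for `iommu_bitmap_lock`:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define IOMMU_PAGES 64   /* hypothetical aperture size, in pages */

static bool iommu_bitmap[IOMMU_PAGES];            /* true = page in use */
static atomic_flag iommu_bitmap_lock = ATOMIC_FLAG_INIT;

static void lock_bitmap(void)
{
    while (atomic_flag_test_and_set_explicit(&iommu_bitmap_lock,
                                             memory_order_acquire))
        ;   /* spin until the holder releases the lock */
}

static void unlock_bitmap(void)
{
    atomic_flag_clear_explicit(&iommu_bitmap_lock, memory_order_release);
}

/* First-fit search for `size` free pages; returns the offset or -1. */
long alloc_iommu(size_t size)
{
    long found = -1;

    assert(size > 0);
    lock_bitmap();
    for (size_t start = 0; start + size <= IOMMU_PAGES; start++) {
        size_t n = 0;
        while (n < size && !iommu_bitmap[start + n])
            n++;
        if (n == size) {
            for (size_t i = 0; i < size; i++)
                iommu_bitmap[start + i] = true;
            found = (long)start;
            break;
        }
    }
    unlock_bitmap();
    return found;
}

/* Every free, whatever its size, takes the same lock: no lock-free shortcut. */
void free_iommu(size_t offset, size_t size)
{
    lock_bitmap();
    for (size_t i = 0; i < size; i++)
        iommu_bitmap[offset + i] = false;
    unlock_bitmap();
}
```

Because allocation and free now serialize on one lock, a free can no longer interleave with an allocation's read-modify-write and leave stray bits set.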
2006-04-18 | [PATCH] i386/x86-64: Fix x87 information leak between processes (CVE-2006-1056) | Andi Kleen
AMD K7/K8 CPUs only save/restore the FOP/FIP/FDP x87 registers in FXSAVE when an exception is pending. This means these values leak through context switches and allow processes to observe some x87 instruction state of other processes. This was actually documented by AMD, but nobody recognized it as being different from Intel before. The fix first adds an optimization: instead of unconditionally calling FNCLEX after each FXSAVE, test if ES is pending and skip it when not needed. Then do an x87 load from a kernel variable to clear FOP/FIP/FDP. This means other processes will always only see a constant value defined by the kernel in their FP state. I took some pains to choose a variable that's already in L1 during context switch to keep the overhead of this low. Also, alternative() is used to patch away the new code on CPUs that don't need it. Patch for both i386/x86-64. The problem was discovered originally by Jan Beulich. Richard Brunner provided the basic code for the workarounds, with contributions from Jan. This is CVE-2006-1056 Cc: richard.brunner@amd.com Cc: jbeulich@novell.com Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-04-12 | [PATCH] x86_64: When user could have changed RIP always force IRET (CVE-2006-0744) | Andi Kleen
Intel EM64T CPUs handle uncanonical return addresses differently from AMD CPUs. The exception is reported in the SYSRET, not the next instruction. This leads to the kernel exception handler running on the user stack with the wrong GS because the kernel didn't expect exceptions on this instruction. This version of the patch fixes the teething problems that plagued an earlier version. This is CVE-2006-0744 Thanks to Ernie Petrides and Asit B. Mallick for analysis and initial patches. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
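An address is uncanonical when its upper bits are not a sign extension: with 48-bit virtual addresses, bits 63 through 47 must all be equal. The check below assumes 48-bit addressing, as on the EM64T and K8 CPUs discussed here. `is_canonical_48` is a hypothetical helper, not a kernel function:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* With 48-bit virtual addresses, an address is canonical when bits 63..47
 * are all zeros (lower, user half) or all ones (upper, kernel half). */
bool is_canonical_48(uint64_t addr)
{
    uint64_t top = addr >> 47;          /* the 17 bits 63..47 */
    return top == 0 || top == 0x1ffff;
}
```

A RIP for which this returns false is the case where SYSRET faults on EM64T. The patch does not rely on such a check. It returns through IRET whenever user space could have modified RIP.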
2006-04-12 | [PATCH] x86_64: Clean up execve | Andi Kleen
Just call IRET always, no need for any special cases. Needed for the next bug fix. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-03-14 | Revert "[PATCH] x86-64: Fix up handling of non canonical user RIPs" | Linus Torvalds
This reverts commit c33d4568aca9028a22857f94f5e0850012b6444b. Andrew Clayton and Hugh Dickins report that it's broken for them and causes strange page table and slab corruption, and spontaneous reboots. Let's get it right next time. Cc: Andrew Clayton <andrew@rootshell.co.uk> Cc: Hugh Dickins <hugh@veritas.com> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-12 | [PATCH] x86-64: Fix up handling of non canonical user RIPs | Andi Kleen
EM64T CPUs have somewhat weird error reporting for non canonical RIPs in SYSRET. We can't handle any exceptions there because the exception handler would end up running on the user stack which is unsafe. To avoid problems any code that might end up with a user touched pt_regs should return using int_ret_from_syscall. int_ret_from_syscall ends up using IRET, which allows safe exceptions. Cc: Ernie Petrides <petrides@redhat.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-08 | [PATCH] fix kexec asm | Michael Matz
While testing kexec and kdump we hit problems where the new kernel would freeze or instantly reboot. The easiest way to trigger it was to kexec a kernel compiled for CONFIG_M586 on an athlon cpu. Compiling for CONFIG_MK7 instead would work fine. The patch fixes a few problems with the kexec inline asm. Signed-off-by: Chris Mason <mason@suse.com> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-27 | Revert "[PATCH] x86_64: Only do the clustered systems have unsynchronized TSC assumption on IBM systems" | Linus Torvalds
This reverts commit 13a229abc25640813f1480c0478dfc6bdbc1c19e. Quoth Andi: "After some consideration and feedback from various people it turns out this wasn't that good an idea. It has some problems and needs more work. Since it was only an optimization anyways it's best to just back it out again for now." Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-26 | [PATCH] fix build on x86_64 with !CONFIG_HOTPLUG_CPU | Brian Magnuson
The commit e2c0388866dc12bef56b178b958f9b778fe6c687 added setup_additional_cpus to setup.c but this is only defined if CONFIG_HOTPLUG_CPU is set. This patch changes the #ifdef to reflect that. Signed-off-by: Brian Magnuson <magnuson@rcn.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-26 | [PATCH] x86_64: Better ATI timer fix | Andi Kleen
The previous experiment for using apicmaintimer on ATI systems didn't work out very well. In particular, laptops with C2/C3 support often don't let it tick during idle, which makes it useless. There were also some other bugs that made the apicmaintimer often not used at all. I tried some other experiments (running the timer over the RTC and some other things), but they didn't really work well either. I rechecked the specs now and it turns out this simple change is actually enough to avoid the double ticks on the ATI systems. We just turn off IRQ 0 in the 8254 and only route it directly using the IO-APIC. I tested it on a few ATI systems and it worked there. In fact it worked on all chipsets (NVidia, Intel, AMD, ATI) I tried it on. According to the ACPI spec, routing should always work through the IO-APIC, so I think it's the correct thing to do anyway (and most of the old gunk in check_timer should be thrown away for x86-64). But for 2.6.16 it's best to do a fairly minimal change:
- Use the IRQ0 setup over both the 8254 and the IO-APIC, known to work everywhere but ATI, on all systems
- Except on ATI, where IRQ0 is disabled in the 8254
- Remove the code to select apicmaintimer on ATI chipsets
- Add some boot options to allow overriding this (just paranoia)
In 2.6.17 I hope to switch the default over to this for everybody. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-26 | [PATCH] x86_64: Move the SMP time selection earlier | Andi Kleen
SMP time selection originally ran after all CPUs were brought up because it needed to know the number of CPUs to decide if it needs an MP safe timer or not. This is not needed anymore because we know present CPUs early. This fixes a couple of problems:
- apicmaintimer didn't always work because it relied on state that time_init_gtod set up too late.
- The output for the used timer in the early kernel log was misleading because time_init_gtod could actually change it later. Now always print the final timer choice.
Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-26 | [PATCH] x86_64: Fix the additional_cpus=.. option | Andi Kleen
It didn't set up the CPU possible map early enough, so the option didn't actually work. Noticed by Heiko Carstens Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-26 | [PATCH] x86_64: Fix NMI watchdog on x460 | Chris McDermott
[description from AK] Old check for the IO-APIC watchdog during the timer check was wrong - it obviously should only drop into this if the IO-APIC watchdog is used. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-26 | [PATCH] x86_64: Only do the clustered systems have unsynchronized TSC assumption on IBM systems | Andi Kleen
Big Unisys systems have multiple clusters too, but they have a synchronized TSC. I'm using the SMBIOS to check for vendor == IBM. Cc: Chris McDermott <lcm@us.ibm.com> Cc: "Protasevich, Natalie" <Natalie.Protasevich@unisys.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-26 | [PATCH] x86_64: no_iommu removal in pci-gart.c | Jon Mason
In previous versions of pci-gart.c, no_iommu was used to determine if IOMMU was disabled in the GART DMA mapping functions. This changed in 2.6.16 and now gart_xxx() functions are only called if gart is enabled. Therefore, uses of no_iommu in the GART code are no longer necessary and can be removed. Also, it removes the double declaration of no_iommu and force_iommu in pci.h and proto.h, by removing the declaration in pci.h. Lastly, it fixes an end_pfn off-by-one error. Tested (along with patch 1/2) on dual opteron with gart enabled, iommu=soft, and iommu=off. Signed-off-by: Jon Mason <jdmason@us.ibm.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-25 | [PATCH] x86-64: react to new topology.c location | Dave Jones
Commit 9c869edac591977314323a4eaad5f7633fca684f moved the i386 topology.c file. That change broke x86-64 compiles, as it uses the same file. Signed-off-by: Dave Jones <davej@redhat.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-17 | [PATCH] Remove KERN_INFO from middle of printk line | Tim Hockin
Don't print KERN_INFO in the middle of a printk line. printk(KERN_INFO "OEM ID: %s ",str); is just above this. This is already fixed up in i386 copy. Signed-off-by: Martin J. Bligh <mbligh@google.com> Cc: Andi Kleen <ak@muc.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-17 | [PATCH] x86_64: Resolve the RIP of an early exception using kallsyms | Andi Kleen
But do it after everything else to risk less from recursive crashes. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-17 | [PATCH] x86_64: Disable tsc when apicpmtimer is active | Andi Kleen
Otherwise it has no effect anyways. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-17 | [PATCH] x86_64: Don't enable ATI apicmaintimer workaround when the machine has C2 or C3 | Andi Kleen
Many laptops have problems with ticking the local APIC timer in C2/C3. The code added earlier to use it by default on ATI didn't really work for them. Don't enable it when the system supports C2/C3. This doesn't fix the problem fully, but at least it's not worse than before. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-17 | [PATCH] x86_64: Don't call do_exit with interrupts disabled after IRET exception | Andi Kleen
This caused a sigreturn with bad argument on a preemptible kernel to complain with Debug: sleeping function called from invalid context at /home/lsrc/quilt/linux/include/linux/rwsem.h:43 in_atomic():0, irqs_disabled():1 Call Trace: {__might_sleep+190} {profile_task_exit+21} {__do_exit+34} {do_wait+0} Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-17 | [PATCH] x86_64: make touch_nmi_watchdog() not touch impossible cpus' private data | Jan Beulich
Along with that, also suppress the memory touching altogether when the watchdog is not running, to eliminate needless crosstalk. Plus add a call to it to make things consistent (one could also consider removing the call in enable_timer_nmi_watchdog()). Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-12 | [PATCH] x86_64: GART DMA merging fix | Andi Kleen
Don't touch the non DMA members in the sg list in dma_map_sg in the IOMMU. Some drivers (in particular ST) ran into problems because they reused the sg lists after passing them to pci_map_sg(). The merging procedure in the K8 GART IOMMU corrupted the state. This patch changes it to only touch the dma* entries during merging, but not the other fields. Approach suggested by Dave Miller. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
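The fix restores one property: mapping writes only the DMA-side fields, so a driver can pass the same list in again. The user-space model below assumes a hypothetical `struct sg_entry`, not the kernel's `struct scatterlist`. It also assumes an identity "mapping" in which the DMA address equals the physical address:

```c
#include <assert.h>

/* Hypothetical scatter-gather entry with separate CPU and DMA fields. */
struct sg_entry {
    unsigned long phys;         /* CPU-side: never written by the mapper */
    unsigned int length;        /* CPU-side: never written by the mapper */
    unsigned long dma_address;  /* DMA-side: the only fields written */
    unsigned int dma_length;
};

/* Merge physically contiguous neighbours into DMA segments. Only the
 * dma_* fields are written; returns the number of DMA segments. */
int map_sg_merge(struct sg_entry *sg, int nents)
{
    int out = 0;

    for (int i = 0; i < nents; i++) {
        if (out > 0 &&
            sg[out - 1].dma_address + sg[out - 1].dma_length == sg[i].phys) {
            sg[out - 1].dma_length += sg[i].length;   /* extend segment */
        } else {
            sg[out].dma_address = sg[i].phys;         /* identity mapping */
            sg[out].dma_length = sg[i].length;
            out++;
        }
    }
    return out;
}
```

The CPU-side `phys` and `length` fields survive mapping. Calling `map_sg_merge()` a second time on the same array therefore yields the same segments.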
2006-02-12 | [PATCH] arch/x86_64/kernel/traps.c PTRACE_SINGLESTEP oops | John Blackwood
We found a problem with x86_64 kernels with preemption enabled, where having multiple tasks doing ptrace singlesteps around the same time will cause the system to 'oops'. The problem seems to be that a task can get preempted out of the do_debug() processing while it is running on the DEBUG_STACK stack. If another task on that same cpu then enters do_debug() and uses the same per-cpu DEBUG_STACK stack, the previously preempted task's stack contents can be corrupted, and the system will oops when the preempted task is context switched back in again. The typical oops looks like the following:
Unable to handle kernel paging request at ffffffffffffffae RIP:
<ffffffff805452a1>{thread_return+34}
PGD 103027 PUD 102429067 PMD 0
Oops: 0002 [1] PREEMPT SMP
CPU 0
Modules linked in:
Pid: 3786, comm: ssdd Not tainted 2.6.15.2 #1
RIP: 0010:[<ffffffff805452a1>] <ffffffff805452a1>{thread_return+34}
RSP: 0018:ffffffff80824058 EFLAGS: 000136c2
RAX: ffff81017e12cea0 RBX: 0000000000000000 RCX: 00000000c0000100
RDX: 0000000000000000 RSI: ffff8100f7856e20 RDI: ffff81017e12cea0
RBP: 0000000000000046 R08: ffff8100f68a6000 R09: 0000000000000000
R10: 0000000000000000 R11: ffff81017e12cea0 R12: ffff81000c2d53e8
R13: ffff81017f5b3be8 R14: ffff81000c0036e0 R15: 000001056cbfc899
FS: 00002aaaaaad9b00(0000) GS:ffffffff80883800(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffffffffffffffae CR3: 00000000f6fcf000 CR4: 00000000000006e0
Process ssdd (pid: 3786, threadinfo ffff8100f68a6000, task ffff8100f7856e20)
Stack: ffffffff808240d8 ffffffff8012a84a ffff8100055f6c00 0000000000000020 0000000000000001 ffff81000c0036e0 ffffffff808240b8 0000000000000000 0000000000000000 0000000000000000
Call Trace: <#DB> <ffffffff8012a84a>{try_to_wake_up+985} <ffffffff8012c0d3>{kick_process+87} <ffffffff8013b262>{signal_wake_up+48} <ffffffff8013b5ce>{specific_send_sig_info+179} <ffffffff80546abc>{_spin_unlock_irqrestore+27} <ffffffff8013b67c>{force_sig_info+159} <ffffffff801103a0>{do_debug+289} <ffffffff80110278>{sync_regs+103} <ffffffff8010ed9a>{paranoid_userspace+35}
Unable to handle kernel paging request at 00007fffffb7d000 RIP:
<ffffffff8010f2e4>{show_trace+465}
PGD f6f25067 PUD f6fcc067 PMD f6957067 PTE 0
Oops: 0000 [2] PREEMPT SMP
This patch disables preemption for the task upon entry to do_debug(), before interrupts are re-enabled, and then re-enables preemption before exiting do_debug(), after disabling interrupts. I've noticed that the task can be preempted either at the end of an interrupt, or on the call to force_sig_info() in the spin_unlock_irqrestore() processing. It might be better to attempt to code a fix in entry.S around the code that calls do_debug(). Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
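The ordering in this fix keeps the shared DEBUG_STACK safe. Preemption goes off before interrupts come back on, and interrupts go off again before preemption is re-enabled. In the user-space model below, hypothetical globals `preempt_count` and `irqs_enabled` stand in for real per-CPU state. The helper names are illustrative:

```c
#include <assert.h>
#include <stdbool.h>

#define X86_EFLAGS_IF 0x200UL   /* interrupt-enable flag, EFLAGS bit 9 */

/* Hypothetical per-CPU state, modelled as plain globals. */
static int preempt_count;
static bool irqs_enabled;

static bool preemptible(void)
{
    return preempt_count == 0 && irqs_enabled;
}

/* Entry: block preemption first, then re-enable interrupts only if the
 * interrupted context had them on. */
static void preempt_conditional_sti(unsigned long eflags)
{
    preempt_count++;
    if (eflags & X86_EFLAGS_IF)
        irqs_enabled = true;
}

/* Exit: disable interrupts first, then allow preemption again. */
static void preempt_conditional_cli(unsigned long eflags)
{
    if (eflags & X86_EFLAGS_IF)
        irqs_enabled = false;
    preempt_count--;
}

/* Skeleton of a debug handler on a shared per-CPU stack. Returns whether
 * the body could have been preempted (it must never be). */
bool do_debug_model(unsigned long eflags)
{
    bool could_preempt;

    preempt_conditional_sti(eflags);
    could_preempt = preemptible();   /* body work, e.g. sending SIGTRAP */
    preempt_conditional_cli(eflags);
    return could_preempt;
}
```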
2006-02-11 | [PATCH] x86-64: Fix HPET timer on x460 | Chris McDermott
[description from AK] The IBM Summit 3 chipset doesn't implement the HPET timer replacement option. Since the current Linux code relies on it, use a mixed mode with both the PIT for the interrupt and the HPET counters for the time keeping. That was already implemented, but didn't work properly because it was still using the last interrupt offset in HPET. This resulted in the x460 not booting. Fix this up by using the free running HPET counter. This shouldn't affect any other machine because they either use full HPET mode or no HPET at all. TBD: needs a similar 32bit fix. Signed-off-by: Andi Kleen <ak@suse.de> Cc: "Pallipadi, Venkatesh" <venkatesh.pallipadi@intel.com> Cc: Bob Picco <bob.picco@hp.com> Cc: Bjorn Helgaas <bjorn.helgaas@hp.com> Cc: john stultz <johnstul@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-07 | [PATCH] amd64 time.c __iomem annotations | Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2006-02-07 | [PATCH] drive_info removal outside of arch/i386 | Al Viro
drive_info is used only by hd.c and that happens under #ifdef __i386__. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2006-02-07 | [PATCH] x86_64: Fix the node cpumask of a cpu going down | Ravikiran G Thirumalai
Currently, x86_64 and ia64 arches do not clear the corresponding bits in the node's cpumask when a cpu goes down or cpu bring up is cancelled. This is buggy since there are pieces of common code where the cpumask is checked in the cpu down code path to decide on things (like in the slab down path). PPC does the right thing, but x86_64 and ia64 don't (This was the reason Sonny hit upon a slab bug during cpu offline on ppc and could not reproduce on other arches). This patch fixes it for x86_64. I won't attempt ia64 as I cannot test it. Credit for spotting this should go to Alok. (akpm: this was applied, then reverted. But it's OK now because we now use for_each_cpu() in the right places). Signed-off-by: Alok N Kataria <alokk@calsoftinc.com> Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org> Signed-off-by: Shai Fultheim <shai@scalex86.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-05 | Revert "[PATCH] x86_64: Fix the node cpumask of a cpu going down" | Linus Torvalds
This reverts commit 10f4dc8b27ac42f930ac55adb8c521264dc997f8. Quoth Andi Kleen: "Kiran decided that it makes the problem worse than it was before. Fixing it fully requires more work which is too much for 2.6.16. So please revert that commit for now." Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-04 | [PATCH] x86_64: IOMMU printk cleanup | Jon Mason
This patch contains a printk reorder to remove the current problem of displaying "PCI-DMA: Disabling IOMMU." and then "PCI-DMA: using GART IOMMU" 20 lines later in dmesg. It also contains a printk reorder in swiotlb to state swiotlb enablement prior to describing the location of the bounce buffers, and a printk reorder to state gart enablement prior to describing the aperture. It also contains a whitespace cleanup in arch/x86_64/kernel/setup.c. Tested (along with patch 2/2) on dual opteron with gart enabled, iommu=soft, and iommu=off. Signed-off-by: Jon Mason <jdmason@us.ibm.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-04 | [PATCH] x86_64: Let impossible CPUs point to reference per cpu data | Andi Kleen
Hack for 2.6.16. In 2.6.17 all code that uses NR_CPUS should be audited and changed to only touch possible CPUs. Don't mark the reference per cpu data init data (so it stays around after boot) and point all impossible CPUs to it. This way they reference some valid, although shared, memory. Usually this is only initialization like INIT_LIST_HEADs and there won't be races because these CPUs never run. Still somewhat hackish. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-04 | [PATCH] i386/x86-64: Don't ack the APIC for bad interrupts when the APIC is not enabled | Andi Kleen
It's bad juju to touch the APIC when it hasn't been enabled. I also moved ack_bad_irq for x86-64 out of line following i386. Signed-off-by: Andi Kleen <ak@suse.de> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-04 | [PATCH] x86_64: small fix for CFI annotations | Jan Beulich
Conditionalize two unwind directives to match other similarly conditional code. Signed-Off-By: Jan Beulich <jbeulich@novell.com> Cc: Jim Houston <jim.houston@ccur.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-04 | [PATCH] x86_64: Calibrate APIC timer using PM timer | Andi Kleen
On some broken motherboards (at least one NForce3 based AMD64 laptop) the PIT timer runs at an incorrect frequency. This patch adds a new option "apicpmtimer" that allows using the APIC timer and calibrating it using the PM timer. It requires the earlier patch that allows running the main timer from the APIC. Specifying apicpmtimer implies apicmaintimer. The option defaults to off for now. I tested it on a few systems and the resulting APIC timer frequencies were usually a bit off, but always <1%, which should be tolerable. TBD: figure out a heuristic to enable this automatically on the affected systems. TBD: perhaps do it on all NForce3s or using DMI? Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
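Calibrating against the PM timer is a ratio: count how far the APIC timer moved while the PM timer advanced a known number of ticks at its fixed 3.579545 MHz. The sketch assumes the 24-bit PM timer variant. It also assumes the caller has already turned the down-counting APIC timer readings into an elapsed tick count. The function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

#define PMTMR_HZ   3579545u     /* ACPI PM timer frequency */
#define PMTMR_MASK 0xffffffu    /* assumes the common 24-bit counter */

/* Ticks elapsed between two raw PM timer reads, tolerating one wrap. */
uint32_t pmtmr_delta(uint32_t start, uint32_t end)
{
    return (end - start) & PMTMR_MASK;
}

/* APIC timer frequency implied by apic_ticks elapsing while the PM timer
 * advanced pm_ticks: apic_hz = apic_ticks * PMTMR_HZ / pm_ticks. */
uint64_t apic_hz_from_pmtmr(uint64_t apic_ticks, uint32_t pm_ticks)
{
    assert(pm_ticks != 0);
    return apic_ticks * PMTMR_HZ / pm_ticks;
}
```

For example, 100,000,000 APIC ticks over 3,579,545 PM timer ticks (one second) gives 100 MHz.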
2006-02-04 | [PATCH] x86_64: Don't allow kprobes on __switch_to | Andi Kleen
kprobes cannot deal with the funny calling conventions when it runs on a different stack when it returns. If someone wants to instrument context switch they can add a probe to schedule() instead. Cc: jkenisto@us.ibm.com, prasanna@in.ibm.com Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-04 | [PATCH] x86_64: align per-cpu section to configured cache bytes | Zach Brown
Align the start of the per-cpu section to the configured number of bytes in a cache line. This stops a BUG_ON() from triggering in load_module() when DEFINE_PER_CPU() is used in a module and the section isn't cacheline-aligned. Rusty also found this and sent a patch in a while ago (http://lkml.org/lkml/2004/10/19/17), I don't know what came of that. Signed-off-by: Zach Brown <zach.brown@oracle.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
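Cache-line alignment of a section start is round-up arithmetic on a power of two. The sketch assumes a 64-byte line; the real value comes from the kernel configuration. `align_up` is a hypothetical helper that mirrors what a linker-script `ALIGN()` does:

```c
#include <assert.h>
#include <stdint.h>

#define L1_CACHE_BYTES 64u   /* assumed line size; configurable in the kernel */

/* Round addr up to the next multiple of align, which must be a power of
 * two: the same arithmetic as a linker-script ALIGN(). */
uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}
```

For instance, `align_up(0x1234, 64)` is `0x1240`, and an already aligned address is returned unchanged.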
2006-02-04 | [PATCH] x86_64: When allocation of merged SG lists fails in the IOMMU don't merge | Kevin VanMaren
[ AK: I redid Kevin's fix to be simpler, but the idea and original analysis of the problem is from Kevin] This avoids allocation failures on some SATA systems like Nvidia CK8 when the IOMMU gets fragmented. Modern SATA devices have quite large queues (128 entries) and the FS with ext2/3 is good enough now that it often passes whole 128 page sg lists down to the driver. These require 512K of contiguous free space in the IOMMU aperture to map when merged. When the IOMMU is fragmented this could lead to spurious IO errors due to failing mappings. The short term fix is to just try to map the SG list again unmerged, page by page. This way fragmentation doesn't matter anymore. The code for that was already there, but it just wasn't enabled for the merge case. According to Kevin, at least the Nvidia device doesn't seem to benefit from merging much anyways, so the only slowdown is from trying to do an unnecessary merge attempt. Kevin plans to implement better fragmentation avoidance in the future, but that wouldn't be 2.6.16 material. TBD: should add some statistic counters to count how often that really happens. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
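The short-term fix is a two-step strategy. First try one contiguous (merged) mapping. If that allocation fails, map each page on its own so fragmentation no longer matters. The user-space model below assumes a tiny hypothetical 16-page aperture tracked with one `bool` per page. None of these names are the kernel's:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define APERTURE_PAGES 16   /* hypothetical, tiny aperture */

static bool used[APERTURE_PAGES];   /* true = aperture page in use */

/* First-fit search for n contiguous free pages; -1 on failure. */
static long alloc_contig(size_t n)
{
    for (size_t start = 0; start + n <= APERTURE_PAGES; start++) {
        size_t k = 0;
        while (k < n && !used[start + k])
            k++;
        if (k == n) {
            for (k = 0; k < n; k++)
                used[start + k] = true;
            return (long)start;
        }
    }
    return -1;
}

/* Map an npages-long sg list into slots[]. Returns 1 if mapped merged,
 * 0 if mapped page by page, -1 if even the fallback fails. */
int map_sg_with_fallback(size_t npages, long *slots)
{
    long base = alloc_contig(npages);

    if (base >= 0) {                      /* merged: one contiguous run */
        for (size_t i = 0; i < npages; i++)
            slots[i] = base + (long)i;
        return 1;
    }
    for (size_t i = 0; i < npages; i++) { /* fallback: one page at a time */
        slots[i] = alloc_contig(1);
        if (slots[i] < 0) {
            while (i-- > 0)               /* undo the partial mapping */
                used[slots[i]] = false;
            return -1;
        }
    }
    return 0;
}
```

With every even page already in use, a 4-page list cannot be mapped merged but succeeds page by page in slots 1, 3, 5 and 7.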
2006-02-04 | [PATCH] x86_64: data/functions wrongly marked as __init with cpu hotplug. | Ashok Raj
The attached patch fixes 2 more cases I found by running the reference_init.pl script. These were easy to spot just knowing the file names. There is another one in init/main.c that I can't exactly zero in on (partly because I don't know how to interpret the data that's spewed out of the tool). Signed-off-by: Ashok Raj <ashok.raj@intel.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-04 | [PATCH] x86_64: mark two routines as __cpuinit | Shaohua Li
Signed-off-by: Shaohua Li <shaohua.li@intel.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-04 | [PATCH] x86_64: Fix the node cpumask of a cpu going down | Ravikiran G Thirumalai
Currently, x86_64 and ia64 arches do not clear the corresponding bits in the node's cpumask when a cpu goes down or cpu bring up is cancelled. This is buggy since there are pieces of common code where the cpumask is checked in the cpu down code path to decide on things (like in the slab down path). PPC does the right thing, but x86_64 and ia64 don't (This was the reason Sonny hit upon a slab bug during cpu offline on ppc and could not reproduce on other arches). This patch fixes it for x86_64. I won't attempt ia64 as I cannot test it. Credit for spotting this should go to Alok. Signed-off-by: Alok N Kataria <alokk@calsoftinc.com> Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org> Signed-off-by: Shai Fultheim <shai@scalex86.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-04 | [PATCH] x86_64: Undo the earlier changes to remove unrolled copy/memset functions | Andi Kleen
They cause quite bad performance regressions on Netburst. This is temporary until we can get new optimized functions for these CPUs. This undoes changes that were done in 2.6.15 and in 2.6.16-rc1, essentially bringing the code back to the 2.6.14 level. The only change is that I renamed the X86_FEATURE_K8_C flag to X86_FEATURE_REP_GOOD, fixed the check for the flag and also fixed some comments. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-04 | [PATCH] x86_64: Fix swiotlb dma_alloc_coherent fallback | Andi Kleen
This avoids BUG_ONs in the low level allocator when an illegal GFP mask is added. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-04 | [PATCH] x86_64: [PATCH] timer resume | Shaohua Li
At resume time, the TSC's value or something similar might have changed a lot compared to suspend time. This could make the system see a very large number of lost ticks. See http://bugzilla.kernel.org/show_bug.cgi?id=5825 Signed-off-by: Shaohua Li <shaohua.li@intel.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
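Lost-tick accounting divides the counter cycles elapsed since the last tick by the cycles per tick. After a suspend, that elapsed value includes the whole sleep. The sketch below shows one way to avoid the bogus count, assuming the resume path simply re-samples the reference counter. `struct tick_state` and both functions are hypothetical, not the kernel's timekeeping code:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical timekeeping state. */
struct tick_state {
    uint64_t last;             /* counter value at the last processed tick */
    uint64_t cycles_per_tick;  /* counter cycles per timer tick */
};

/* Ticks that appear to have been missed since the last interrupt: one
 * elapsed tick is expected, anything beyond that counts as lost. */
uint64_t lost_ticks(const struct tick_state *ts, uint64_t now)
{
    uint64_t elapsed = (now - ts->last) / ts->cycles_per_tick;
    return elapsed > 1 ? elapsed - 1 : 0;
}

/* On resume, re-sample the counter so the time spent suspended is not
 * mistaken for missed ticks. */
void timer_resume_model(struct tick_state *ts, uint64_t now)
{
    ts->last = now;
}
```

Without the resume hook, a counter that jumped by a million ticks' worth of cycles reports 999999 lost ticks. After `timer_resume_model()` runs, the next tick reports none.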
2006-02-04 | [PATCH] x86_64: Automatically enable apicmaintimer on ATI boards | Andi Kleen
They all have problems with IRQ 0 routing, so just use the APIC on them. Can be overridden with "noapicmaintimer". Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-04 | [PATCH] x86_64: Allow to run main time keeping from the local APIC interrupt | Andi Kleen
Another piece from the no-idle-tick patch. This can be enabled with the "apicmaintimer" option. This is mainly useful when the PIT/HPET interrupt is unreliable. Note there are some systems that are known to stop the APIC timer in C3. For those it will never work, but this case should be automatically detected. It also only works with the PM timer right now. When HPET is used, the way the main timer handler computes the delay doesn't work. It should be a bit more efficient because there is one less regular interrupt to process on the boot processor. Requires an earlier bugfix from Venkatesh. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>