aboutsummaryrefslogtreecommitdiff
path: root/arch/x86
AgeCommit message (Collapse)Author
2010-03-15x86, mm: Allow highmem user page tables to be disabled at boot timeIan Campbell
commit 14315592009c17035cac81f4954d5a1f4d71e489 upstream. Distros generally (I looked at Debian, RHEL5 and SLES11) seem to enable CONFIG_HIGHPTE for any x86 configuration which has highmem enabled. This means that the overhead applies even to machines which have a fairly modest amount of high memory and which therefore do not really benefit from allocating PTEs in high memory but still pay the price of the additional mapping operations. Running kernbench on a 4G box I found that with CONFIG_HIGHPTE=y but no actual highptes being allocated there was a reduction in system time used from 59.737s to 55.9s. With CONFIG_HIGHPTE=y and highmem PTEs being allocated: Average Optimal load -j 4 Run (std deviation): Elapsed Time 175.396 (0.238914) User Time 515.983 (5.85019) System Time 59.737 (1.26727) Percent CPU 263.8 (71.6796) Context Switches 39989.7 (4672.64) Sleeps 42617.7 (246.307) With CONFIG_HIGHPTE=y but with no highmem PTEs being allocated: Average Optimal load -j 4 Run (std deviation): Elapsed Time 174.278 (0.831968) User Time 515.659 (6.07012) System Time 55.9 (1.07799) Percent CPU 263.8 (71.266) Context Switches 39929.6 (4485.13) Sleeps 42583.7 (373.039) This patch allows the user to control the allocation of PTEs in highmem from the command line ("userpte=nohigh") but retains the status-quo as the default. It is possible that some simple heuristic could be developed which allows auto-tuning of this option however I don't have a sufficiently large machine available to me to perform any particularly meaningful experiments. We could probably handwave up an argument for a threshold at 16G of total RAM. Assuming 768M of lowmem we have 196608 potential lowmem PTE pages. Each page can map 2M of RAM in a PAE-enabled configuration, meaning a maximum of 384G of RAM could potentially be mapped using lowmem PTEs. Even allowing generous factor of 10 to account for other required lowmem allocations, generous slop to account for page sharing (which reduces the total amount of RAM mappable by a given number of PT pages) and other innacuracies in the estimations it would seem that even a 32G machine would not have a particularly pressing need for highmem PTEs. I think 32G could be considered to be at the upper bound of what might be sensible on a 32 bit machine (although I think in practice 64G is still supported). It's seems questionable if HIGHPTE is even a win for any amount of RAM you would sensibly run a 32 bit kernel on rather than going 64 bit. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> LKML-Reference: <1266403090-20162-1-git-send-email-ian.campbell@citrix.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-03-15KVM: x86 emulator: Check CPL level during privilege instruction emulationGleb Natapov
commit e92805ac1228626c59c865f2f4e9059b9fb8c97b upstream. Add CPL checking in case emulator is tricked into emulating privilege instruction from userspace. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-03-15KVM: x86 emulator: Add group9 instruction decodingGleb Natapov
commit 60a29d4ea4e7b6b95d9391ebc8625b0426f3a363 upstream. Use groups mechanism to decode 0F C7 instructions. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-03-15KVM: x86 emulator: Forbid modifying CS segment register by mov instructionGleb Natapov
commit 8b9f44140bc4afd2698413cd9960c3912168ee91 upstream. Inject #UD if guest attempts to do so. This is in accordance to Intel SDM. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-03-15KVM: x86 emulator: Add group8 instruction decodingGleb Natapov
commit 2db2c2eb6226e30f8059b82512a1364db98da8e3 upstream. Use groups mechanism to decode 0F BA instructions. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-03-15x86: Fix SCI on IOAPIC != 0Yinghai Lu
commit 18dce6ba5c8c6bd0f3ab4efa4cbdd698dab5c40a upstream. Thomas Renninger <trenn@suse.de> reported on IBM x3330 booting a latest kernel on this machine results in: PCI: PCI BIOS revision 2.10 entry at 0xfd61c, last bus=1 PCI: Using configuration type 1 for base access bio: create slab <bio-0> at 0 ACPI: SCI (IRQ30) allocation failed ACPI Exception: AE_NOT_ACQUIRED, Unable to install System Control Interrupt handler (20090903/evevent-161) ACPI: Unable to start the ACPI Interpreter Later all kind of devices fail... and bisect it down to this commit: commit b9c61b70075c87a8612624736faf4a2de5b1ed30 x86/pci: update pirq_enable_irq() to setup io apic routing it turns out we need to set irq routing for the sci on ioapic1 early. -v2: make it work without sparseirq too. -v3: fix checkpatch.pl warning, and cc to stable Reported-by: Thomas Renninger <trenn@suse.de> Bisected-by: Thomas Renninger <trenn@suse.de> Tested-by: Thomas Renninger <trenn@suse.de> Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <1265793639-15071-2-git-send-email-yinghai@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-03-15x86: Avoid race condition in pci_enable_msix()Brandon Phiilps
commit ced5b697a76d325e7a7ac7d382dbbb632c765093 upstream. Keep chip_data in create_irq_nr and destroy_irq. When two drivers are setting up MSI-X at the same time via pci_enable_msix() there is a race. See this dmesg excerpt: [ 85.170610] ixgbe 0000:02:00.1: irq 97 for MSI/MSI-X [ 85.170611] alloc irq_desc for 99 on node -1 [ 85.170613] igb 0000:08:00.1: irq 98 for MSI/MSI-X [ 85.170614] alloc kstat_irqs on node -1 [ 85.170616] alloc irq_2_iommu on node -1 [ 85.170617] alloc irq_desc for 100 on node -1 [ 85.170619] alloc kstat_irqs on node -1 [ 85.170621] alloc irq_2_iommu on node -1 [ 85.170625] ixgbe 0000:02:00.1: irq 99 for MSI/MSI-X [ 85.170626] alloc irq_desc for 101 on node -1 [ 85.170628] igb 0000:08:00.1: irq 100 for MSI/MSI-X [ 85.170630] alloc kstat_irqs on node -1 [ 85.170631] alloc irq_2_iommu on node -1 [ 85.170635] alloc irq_desc for 102 on node -1 [ 85.170636] alloc kstat_irqs on node -1 [ 85.170639] alloc irq_2_iommu on node -1 [ 85.170646] BUG: unable to handle kernel NULL pointer dereference at 0000000000000088 As you can see igb and ixgbe are both alternating on create_irq_nr() via pci_enable_msix() in their probe function. ixgbe: While looping through irq_desc_ptrs[] via create_irq_nr() ixgbe choses irq_desc_ptrs[102] and exits the loop, drops vector_lock and calls dynamic_irq_init. Then it sets irq_desc_ptrs[102]->chip_data = NULL via dynamic_irq_init(). igb: Grabs the vector_lock now and starts looping over irq_desc_ptrs[] via create_irq_nr(). It gets to irq_desc_ptrs[102] and does this: cfg_new = irq_desc_ptrs[102]->chip_data; if (cfg_new->vector != 0) continue; This hits the NULL deref. Another possible race exists via pci_disable_msix() in a driver or in the number of error paths that call free_msi_irqs(): destroy_irq() dynamic_irq_cleanup() which sets desc->chip_data = NULL ...race window... desc->chip_data = cfg; Remove the save and restore code for cfg in create_irq_nr() and destroy_irq() and take the desc->lock when checking the irq_cfg. Reported-and-analyzed-by: Brandon Philips <bphilips@suse.de> Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <1265793639-15071-3-git-send-email-yinghai@kernel.org> Signed-off-by: Brandon Phililps <bphilips@suse.de> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-03-15x86, xen: Disable highmem PTE allocation even when CONFIG_HIGHPTE=yIan Campbell
commit 817a824b75b1475f1b067c8cee318c7b4d66fcde upstream. There's a path in the pagefault code where the kernel deliberately breaks its own locking rules by kmapping a high pte page without holding the pagetable lock (in at least page_check_address). This breaks Xen's ability to track the pinned/unpinned state of the page. There does not appear to be a viable workaround for this behaviour so simply disable HIGHPTE for all Xen guests. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> LKML-Reference: <1267204562-11844-1-git-send-email-ian.campbell@citrix.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Pasi Kärkkäinen <pasik@iki.fi> Cc: <xen-devel@lists.xensource.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-03-15x86: Add iMac9,1 to pci_reboot_dmi_tableJustin P. Mattock
commit 0a832320f1bae6a4169bf683e201378f2437cfc1 upstream. On the iMac9,1 /sbin/reboot results in a black mangled screen. Adding this DMI entry gets the machine to reboot cleanly as it should. Signed-off-by: Justin P. Mattock <justinmattock@gmail.com> LKML-Reference: <1266362249-3337-1-git-send-email-justinmattock@gmail.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-03-15x86, ia32_aout: do not kill argument mappingJiri Slaby
commit 318f6b228ba88a394ef560efc1bfe028ad5ae6b6 upstream. Do not set current->mm->mmap to NULL in 32-bit emulation on 64-bit load_aout_binary after flush_old_exec as it would destroy already set brpm mapping with arguments. Introduced by b6a2fea39318e43fee84fa7b0b90d68bed92d2ba mm: variable length argument support where the argument mapping in bprm was added. [ hpa: this is a regression from 2.6.22... time to kill a.out? ] Signed-off-by: Jiri Slaby <jslaby@suse.cz> LKML-Reference: <1265831716-7668-1-git-send-email-jslaby@suse.cz> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ollie Wild <aaw@google.com> Cc: x86@kernel.org Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-03-15oprofile/x86: fix msr access to reserved countersRobert Richter
commit cfc9c0b450176a077205ef39092f0dc1a04e020a upstream. During switching virtual counters there is access to perfctr msrs. If the counter is not available this fails due to an invalid address. This patch fixes this. Signed-off-by: Robert Richter <robert.richter@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-03-15oprofile/x86: use kzalloc() instead of kmalloc()Robert Richter
commit c17c8fbf349482e89b57d1b800e83e9f4cf40c47 upstream. Signed-off-by: Robert Richter <robert.richter@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-03-15oprofile/x86: remove node check in AMD IBS initializationRobert Richter
commit 89baaaa98a10cad5cc8516c7208b02d9fc711890 upstream. Standard AMD systems have the same number of nodes as there are northbridge devices. However, there may kernel configurations (especially for 32 bit) or system setups exist, where the node number is different or it can not be detected properly. Thus the check is not reliable and may fail though IBS setup was fine. For this reason it is better to remove the check. Signed-off-by: Robert Richter <robert.richter@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-03-15oprofile/x86: fix perfctr nmi reservation for mulitplexingRobert Richter
commit 68dc819ce829f7e7977a56524e710473bdb55115 upstream. Multiple virtual counters share one physical counter. The reservation of virtual counters fails due to duplicate allocation of the same counter. The counters are already reserved. Thus, virtual counter reservation may removed at all. This also makes the code easier. Signed-off-by: Robert Richter <robert.richter@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-03-15ACPI: remove Asus P2B-DS from acpi=ht blacklistLen Brown
commit 97c169d39b6846a564dc8d883832e7fef9bdb77d upstream. We realized when we broke acpi=ht http://bugzilla.kernel.org/show_bug.cgi?id=14886 that acpi=ht is not needed on this box and folks have been using acpi=force on it anyway. Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-02-23x86: Re-get cfg_new in case reuse/move irq_descYinghai Lu
commit 37ef2a3029fde884808ff1b369677abc7dd9a79a upstream. When irq_desc is moved, we need to make sure to use the right cfg_new. Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <4B07A739.3030104@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-02-23x86/amd-iommu: Fix deassignment of a device from the pt_domainJoerg Roedel
commit d3ad9373b7c29b63d5e8460a69453718d200cc3b upstream. Deassigning a device from the passthrough domain does not work and breaks device assignment to kvm guests. This patch fixes the issue. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-02-23x86/amd-iommu: Fix IOMMU-API initialization for iommu=ptJoerg Roedel
commit f5325094379158e6b876ea0010c807bf7890ec8f upstream This patch moves the initialization of the iommu-api out of the dma-ops initialization code. This ensures that the iommu-api is initialized even with iommu=pt. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-02-23KVM: PIT: control word is write-onlyMarcelo Tosatti
commit ee73f656a604d5aa9df86a97102e4e462dd79924 upstream. PIT control word (address 0x43) is write-only, reads are undefined. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-02-23kvmclock: count total_sleep_time when updating guest clockJason Wang
commit 923de3cf5bf12049628019010e36623fca5ef6d1 upstream. Current kvm wallclock does not consider the total_sleep_time which could cause wrong wallclock in guest after host suspend/resume. This patch solve this issue by counting total_sleep_time to get the correct host boot time. Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Glauber Costa <glommer@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-02-23CPUFREQ: Fix use after free of struct powernow_k8_dataThomas Renninger
commit 557a701c16553b0b691dbb64ef30361115a80f64 upstream. Easy fix for a regression introduced in 2.6.31. On managed CPUs the cpufreq.c core will call driver->exit(cpu) on the managed cpus and powernow_k8 will free the core's data. Later driver->get(cpu) function might get called trying to read out the current freq of a managed cpu and the NULL pointer check does not work on the freed object -> better set it to NULL. ->get() is unsigned and must return 0 as invalid frequency. Reference: http://bugzilla.kernel.org/show_bug.cgi?id=14391 Signed-off-by: Thomas Renninger <trenn@suse.de> Tested-by: Michal Schmidt <mschmidt@redhat.com> Signed-off-by: Dave Jones <davej@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-02-09oprofile/x86: fix crash when profiling more than 28 eventsSuravee Suthikulpanit
commit d8cc108f4fab42b380c6b3f3356f99e8dd5372e2 upstream. With multiplexing enabled oprofile crashs when profiling more than 28 events. This patch fixes this. Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Robert Richter <robert.richter@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-02-09oprofile/x86: add Xeon 7500 series supportAndi Kleen
commit e83e452b0692c9c13372540deb88a77d4ae2553d upstream. Add Xeon 7500 series support to oprofile. Straight forward: it's the same as Core i7, so just detect the model number. No user space changes needed. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Robert Richter <robert.richter@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-02-09KVM: allow userspace to adjust kvmclock offsetGlauber Costa
(cherry picked from afbcf7ab8d1bc8c2d04792f6d9e786e0adeb328d) When we migrate a kvm guest that uses pvclock between two hosts, we may suffer a large skew. This is because there can be significant differences between the monotonic clock of the hosts involved. When a new host with a much larger monotonic time starts running the guest, the view of time will be significantly impacted. Situation is much worse when we do the opposite, and migrate to a host with a smaller monotonic clock. This proposed ioctl will allow userspace to inform us what is the monotonic clock value in the source host, so we can keep the time skew short, and more importantly, never goes backwards. Userspace may also need to trigger the current data, since from the first migration onwards, it won't be reflected by a simple call to clock_gettime() anymore. [marcelo: future-proof abi with a flags field] [jan: fix KVM_GET_CLOCK by clearing flags field instead of checking it] Signed-off-by: Glauber Costa <glommer@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-02-09x86/amd-iommu: Fix possible integer overflowJoerg Roedel
commit d91afd15b041f27d34859c79afa9e172018a86f4 upstream. The variable i in this function could be increased to over 2**32 which would result in an integer overflow when using int. Fix it by changing i to unsigned long. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-02-09x86: Add quirk for Intel DG45FC board to avoid low memory corruptionDavid Härdeman
commit 7c099ce1575126395f186ecf58b51a60d5c3be7d upstream. Commit 6aa542a694dc9ea4344a8a590d2628c33d1b9431 added a quirk for the Intel DG45ID board due to low memory corruption. The Intel DG45FC shares the same BIOS (and the same bug) as noted in: http://bugzilla.kernel.org/show_bug.cgi?id=13736 Signed-off-by: David Härdeman <david@hardeman.nu> LKML-Reference: <20100128200254.GA9134@hardeman.nu> Cc: Alexey Fisher <bug-track@fisher-privat.net> Cc: ykzhao <yakui.zhao@intel.com> Cc: Tony Bones <aabonesml@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-02-09x86: Add Dell OptiPlex 760 reboot quirkLeann Ogasawara
commit 35ea63d70f827a26c150993b4b940925bb02b03f upstream. Dell OptiPlex 760 hangs on reboot unless reboot=bios is used. Add quirk to reboot through the BIOS. BugLink: https://bugs.launchpad.net/bugs/488319 Signed-off-by: Leann Ogasawara <leann.ogasawara@canonical.com> LKML-Reference: <1264634958.27335.1091.camel@emiko> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-02-09x86: Disable HPET MSI on ATI SB700/SB800Venkatesh Pallipadi
commit 73472a46b5b28116b145fb5fc05242c1aa8e1461 upstream HPET MSI on platforms with ATI SB700/SB800 as they seem to have some side-effects on floppy DMA. Do not use HPET MSI on such platforms. Original problem report from Mark Hounschell http://lkml.indiana.edu/hypermail/linux/kernel/0912.2/01118.html Tested-by: Mark Hounschell <markh@compro.net> Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Cc: <stable@kernel.org> LKML-Reference: <20100121190952.GA32523@linux-os.sc.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-09x86: get rid of the insane TIF_ABI_PENDING bitH. Peter Anvin
commit 05d43ed8a89c159ff641d472f970e3f1baa66318 upstream. Now that the previous commit made it possible to do the personality setting at the point of no return, we do just that for ELF binaries. And suddenly all the reasons for that insane TIF_ABI_PENDING bit go away, and we can just make SET_PERSONALITY() just do the obvious thing for a 32-bit compat process. Everything becomes much more straightforward this way. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-02-09Split 'flush_old_exec' into two functionsLinus Torvalds
commit 221af7f87b97431e3ee21ce4b0e77d5411cf1549 upstream. 'flush_old_exec()' is the point of no return when doing an execve(), and it is pretty badly misnamed. It doesn't just flush the old executable environment, it also starts up the new one. Which is very inconvenient for things like setting up the new personality, because we want the new personality to affect the starting of the new environment, but at the same time we do _not_ want the new personality to take effect if flushing the old one fails. As a result, the x86-64 '32-bit' personality is actually done using this insane "I'm going to change the ABI, but I haven't done it yet" bit (TIF_ABI_PENDING), with SET_PERSONALITY() not actually setting the personality, but just the "pending" bit, so that "flush_thread()" can do the actual personality magic. This patch in no way changes any of that insanity, but it does split the 'flush_old_exec()' function up into a preparatory part that can fail (still called flush_old_exec()), and a new part that will actually set up the new exec environment (setup_new_exec()). All callers are changed to trivially comply with the new world order. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-02-09x86: Remove "x86 CPU features in debugfs" (CONFIG_X86_CPU_DEBUG)H. Peter Anvin
commit b160091802d4a76dd063facb09fcf10bf5d5d747 upstream. CONFIG_X86_CPU_DEBUG, which provides some parsed versions of the x86 CPU configuration via debugfs, has caused boot failures on real hardware. The value of this feature has been marginal at best, as all this information is already available to userspace via generic interfaces. Causes crashes that have not been fixed + minimal utility -> remove. See the referenced LKML thread for more information. Reported-by: Ozan Çağlayan <ozan@pardus.org.tr> Signed-off-by: H. Peter Anvin <hpa@zytor.com> LKML-Reference: <alpine.LFD.2.00.1001221755320.13231@localhost.localdomain> Cc: Jaswinder Singh Rajput <jaswinder@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-02-09x86: Set hotpluggable nodes in nodes_possible_mapDavid Rientjes
commit 3a5fc0e40cb467e692737bc798bc99773c81e1e2 upstream. nodes_possible_map does not currently include nodes that have SRAT entries that are all ACPI_SRAT_MEM_HOT_PLUGGABLE since the bit is cleared in nodes_parsed if it does not have an online address range. Unequivocally setting the bit in nodes_parsed is insufficient since existing code, such as acpi_get_nodes(), assumes all nodes in the map have online address ranges. In fact, all code using nodes_parsed assumes such nodes represent an address range of online memory. nodes_possible_map is created by unioning nodes_parsed and cpu_nodes_parsed; the former represents nodes with online memory and the latter represents memoryless nodes. We now set the bit for hotpluggable nodes in cpu_nodes_parsed so that it also gets set in nodes_possible_map. [ hpa: Haicheng Li points out that this makes the naming of the variable cpu_nodes_parsed somewhat counterintuitive. However, leave it as is in the interest of keeping the pure bug fix patch small. ] Signed-off-by: David Rientjes <rientjes@google.com> Tested-by: Haicheng Li <haicheng.li@linux.intel.com> LKML-Reference: <alpine.DEB.2.00.1001201152040.30528@chino.kir.corp.google.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-28x86, msr/cpuid: Pass the number of minors when unregistering MSR and CPUID ↵Russ Anderson
drivers. commit da482474b8396e1a099c37ffc6541b78775aedb4 upstream. Pass the number of minors when unregistering MSR and CPUID drivers. Reported-by: Dean Nelson <dnelson@redhat.com> Signed-off-by: Dean Nelson <dnelson@redhat.com> LKML-Reference: <20100127023722.GA22305@sgi.com> Signed-off-by: Russ Anderson <rja@sgi.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-28x86: Reenable TSC sync check at boot, even with NONSTOP_TSCPallipadi, Venkatesh
commit 6c56ccecf05fafe100ab4ea94f6fccbf5ff00db7 upstream. Commit 83ce4009 did the following change If the TSC is constant and non-stop, also set it reliable. But, there seems to be few systems that will end up with TSC warp across sockets, depending on how the cpus come out of reset. Skipping TSC sync test on such systems may result in time inconsistency later. So, reenable TSC sync test even on constant and non-stop TSC systems. Set, sched_clock_stable to 1 by default and reset it in mark_tsc_unstable, if TSC sync fails. This change still gives perf benefit mentioned in 83ce4009 for systems where TSC is reliable. Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <20091217202702.GA18015@linux-os.sc.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-28KVM: x86: Fix leak of free lapic date in kvm_arch_vcpu_init()Wei Yongjun
commit 443c39bc9ef7d8f648408d74c97e943f3bb3f48a upstream. In function kvm_arch_vcpu_init(), if the memory malloc for vcpu->arch.mce_banks is fail, it does not free the memory of lapic date. This patch fixed it. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-28KVM: x86: Fix probable memory leak of vcpu->arch.mce_banksWei Yongjun
commit 36cb93fd6b6bf7e9163a69a8bf20207aed5fea44 upstream. vcpu->arch.mce_banks is malloc in kvm_arch_vcpu_init(), but never free in any place, this may cause memory leak. So this patch fixed to free it in kvm_arch_vcpu_uninit(). Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-28KVM: x86: Fix host_mapping_level()Sheng Yang
commit 82b7005f0e72d8d1a8226e4c192cbb0850d10b3f upstream. When found a error hva, should not return PAGE_SIZE but the level... Also clean up the coding style of the following loop. Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-28KVM: MMU: bail out pagewalk on kvm_read_guest errorMarcelo Tosatti
commit a6085fbaf65ab09bfb5ec8d902d6d21680fe1895 upstream. Exit the guest pagetable walk loop if reading gpte failed. Otherwise its possible to enter an endless loop processing the previous present pte. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-28KVM: Fix race between APIC TMR and IRRAvi Kivity
commit a5d36f82c4f3e852b61fdf1fee13463c8aa91b90 upstream. When we queue an interrupt to the local apic, we set the IRR before the TMR. The vcpu can pick up the IRR and inject the interrupt before setting the TMR, and perhaps even EOI it, causing incorrect behaviour. The race is really insignificant since it can only occur on the first interrupt (usually following interrupts will not change TMR), but it's better closed than open. Fixed by reordering setting the TMR vs IRR. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-25x86/PCI/PAT: return EINVAL for pci mmap WC request for !pat_enabledSuresh Siddha
commit 2992e545ea006992ec9dc91c4fa996ce1e15f921 upstream. Thomas Schlichter reported: > X.org uses libpciaccess which tries to mmap with write combining enabled via > /sys/bus/pci/devices/*/resource0_wc. Currently, when PAT is not enabled, the > kernel does fall back to uncached mmap. Then libpciaccess thinks it succeeded > mapping with write combining enabled and does not set up suited MTRR entries. > ;-( Instead of silently mapping pci mmap region as UC minus in the case of !pat_enabled and wc request, we can return error. Eric Anholt mentioned that caller (like X) typically follows up with UC minus pci mmap request and if there is a free mtrr slot, caller will manage adding WC mtrr. Jesse Barnes says: > Older versions of libpciaccess will behave better if we do it that way > (iirc it only allocates an MTRR if the resource_wc file doesn't exist or > fails to get mapped). Reported-by: Thomas Schlichter <thomas.schlichter@web.de> Signed-off-by: Thomas Schlichter <thomas.schlichter@web.de> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Acked-by: Eric Anholt <eric@anholt.net> Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-25x86, msr/cpuid: Register enough minors for the MSR and CPUID driversH. Peter Anvin
commit 0b962d473af32ec334df271b54ff4973cb2b4c73 upstream. register_chrdev() hardcodes registering 256 minors, presumably to avoid breaking old drivers. However, we need to register enough minors so that we have all possible CPUs. checkpatch warns on this patch, but the patch is correct: NR_CPUS here is a static *upper bound* on the *maximum CPU index* (not *number of CPUs!*) and that is what we want. Reported-and-tested-by: Russ Anderson <rja@sgi.com> Cc: Tejun Heo <tj@kernel.org> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Takashi Iwai <tiwai@suse.de> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: H. Peter Anvin <hpa@zytor.com> LKML-Reference: <tip-*@git.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-22x86: SGI UV: Fix mapping of MMIO registersMike Travis
commit fcfbb2b5facd65efa7284cc315225bfe3d1856c2 upstream. This fixes the problem of the initialization code not correctly mapping the entire MMIO space on a UV system. A side effect is the map_high() interface needed to be changed to accommodate different address and size shifts. Signed-off-by: Mike Travis <travis@sgi.com> Reviewed-by: Mike Habeck <habeck@sgi.com> Cc: Jack Steiner <steiner@sgi.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> LKML-Reference: <4B479202.7080705@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-22x86, apic: use physical mode for IBM summit platformsSuresh Siddha
commit dfea91d5a7c795fd6f4e1a97489a98e4e767463e upstream. Chris McDermott from IBM confirmed that hurricane chipset in IBM summit platforms doesn't support logical flat mode. Irrespective of the other things like apic_id's, total number of logical cpu's, Linux kernel should default to physical mode for this system. The 32-bit kernel does so using the OEM checks for the IBM summit platform. Add a similar OEM platform check for the 64bit kernel too. Otherwise the linux kernel boot can hang on this platform under certain bios/platform settings. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Tested-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Chris McDermott <lcm@linux.vnet.ibm.com> Cc: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-22x86, mce: Thermal monitoring depends on APIC being enabledCyrill Gorcunov
commit 485a2e1973fd9f98c2c6776e66ac4721882b69e0 upstream. Add check if APIC is not disabled since thermal monitoring depends on it. As only apic gets disabled we should not try to install "thermal monitor" vector, print out that thermal monitoring is enabled and etc... Note that "Intel Correct Machine Check Interrupts" already has such a check. Also I decided to not add cpu_has_apic check into mcheck_intel_therm_init since even if it'll call apic_read on disabled apic -- it's safe here and allow us to save a few code bytes. Reported-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> LKML-Reference: <4B25FDC2.3020401@jp.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-18Revert "x86: Side-step lguest problem by only building cmpxchg8b_emu for ↵Rusty Russell
pre-Pentium" commit db677ffa5f5a4f15b9dad4d132b3477b80766d82 upstream. This reverts commit ae1b22f6e46c03cede7cea234d0bf2253b4261cf. As Linus said in 982d007a6ee: "There was something really messy about cmpxchg8b and clone CPU's, so if you enable it on other CPUs later, do it carefully." This breaks lguest for those configs, but we can fix that by emulating if we have to. Fixes: http://bugzilla.kernel.org/show_bug.cgi?id=14884 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-18untangle the do_mremap() messAl Viro
This backports the following upstream commits all as one patch: 54f5de709984bae0d31d823ff03de755f9dcac54 ecc1a8993751de4e82eb18640d631dae1f626bd6 1a0ef85f84feb13f07b604fcf5b90ef7c2b5c82f f106af4e90eadd76cfc0b5325f659619e08fb762 097eed103862f9c6a97f2e415e21d1134017b135 935874141df839c706cd6cdc438e85eb69d1525e 0ec62d290912bb4b989be7563851bc364ec73b56 c4caa778157dbbf04116f0ac2111e389b5cd7a29 2ea1d13f64efdf49319e86c87d9ba38c30902782 570dcf2c15463842e384eb597a87c1e39bead99b 564b3bffc619dcbdd160de597b0547a7017ea010 0067bd8a55862ac9dd212bd1c4f6f5bff1ca1301 f8b7256096a20436f6d0926747e3ac3d64c81d24 8c7b49b3ecd48923eb64ff57e07a1cdb74782970 9206de95b1ea68357996ec02be5db0638a0de2c1 2c6a10161d0b5fc047b5bd81b03693b9af99fab5 05d72faa6d13c9d857478a5d35c85db9adada685 bb52d6694002b9d632bb355f64daa045c6293a4e e77414e0aad6a1b063ba5e5750c582c75327ea6a aa65607373a4daf2010e8c3867b6317619f3c1a3 Backport done by Greg Kroah-Hartman. Only minor tweaks were needed. Cc: David S. Miller <davem@davemloft.net> Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Russell King <rmk+kernel@arm.linux.org.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-06x86, msr: msrs_alloc/free for CONFIG_SMP=nBorislav Petkov
commit 6ede31e03084ee084bcee073ef3d1136f68d0906 upstream. Randy Dunlap reported the following build error: "When CONFIG_SMP=n, CONFIG_X86_MSR=m: ERROR: "msrs_free" [drivers/edac/amd64_edac_mod.ko] undefined! ERROR: "msrs_alloc" [drivers/edac/amd64_edac_mod.ko] undefined!" This is due to the fact that <arch/x86/lib/msr.c> is conditioned on CONFIG_SMP and in the UP case we have only the stubs in the header. Fork off SMP functionality into a new file (msr-smp.c) and build msrs_{alloc,free} unconditionally. Reported-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Borislav Petkov <petkovbb@gmail.com> LKML-Reference: <20091216231625.GD27228@liondog.tnic> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-06x86, msr: Add support for non-contiguous cpumasksBorislav Petkov
commit 505422517d3f126bb939439e9d15dece94e11d2c upstream. The current rd/wrmsr_on_cpus helpers assume that the supplied cpumasks are contiguous. However, there are machines out there like some K8 multinode Opterons which have a non-contiguous core enumeration on each node (e.g. cores 0,2 on node 0 instead of 0,1), see http://www.gossamer-threads.com/lists/linux/kernel/1160268. This patch fixes out-of-bounds writes (see URL above) by adding per-CPU msr structs which are used on the respective cores. Additionally, two helpers, msrs_{alloc,free}, are provided for use by the callers of the MSR accessors. Cc: H. Peter Anvin <hpa@zytor.com> Cc: Mauro Carvalho Chehab <mchehab@redhat.com> Cc: Aristeu Rozanski <aris@redhat.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> LKML-Reference: <20091211171440.GD31998@aftab> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-06x86, msr: Unify rdmsr_on_cpus/wrmsr_on_cpusBorislav Petkov
commit b8a4754147d61f5359a765a3afd3eb03012aa052 upstream. Since rdmsr_on_cpus and wrmsr_on_cpus are almost identical, unify them into a common __rwmsr_on_cpus helper thus avoiding code duplication. While at it, convert cpumask_t's to const struct cpumask *. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-06x86/ptrace: make genregs[32]_get/set more robustLinus Torvalds
commit 04a1e62c2cec820501f93526ad1e46073b802dc4 upstream. The loop condition is fragile: we compare an unsigned value to zero, and then decrement it by something larger than one in the loop. All the callers should be passing in appropriately aligned buffer lengths, but it's better to just not rely on it, and have some appropriate defensive loop limits. Acked-by: Roland McGrath <roland@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>