path: root/arch
Age  Commit message  Author
2009-11-09  KVM: ignore reads from AMD's C1E-enabled MSR  (Andre Przywara)

    commit 1fdbd48c242db996107f72ae4140ffe8163e26a8 upstream.

    If the Linux kernel detects a C1E-capable AMD processor (K8 RevF and higher), it will access a certain MSR on every attempt to go to halt. Explicitly handle this read and return 0, to let KVM run a Linux guest with the native AMD host CPU model propagated to the guest.

    Signed-off-by: Andre Przywara <andre.przywara@amd.com>
    Signed-off-by: Avi Kivity <avi@redhat.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
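    A minimal sketch of the idea, as a case in KVM's common MSR-read path (the MSR in question is MSR_K8_INT_PENDING_MSG; the surrounding switch in kvm_get_msr_common() is shown schematically):

        /* in kvm_get_msr_common(); "data" is the value returned to the guest */
        case MSR_K8_INT_PENDING_MSG:
                /* C1E-aware Linux guests read this MSR before halting;
                 * answering 0 ("no interrupt pending") keeps them running */
                data = 0;
                break;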
2009-11-09  KVM: use proper hrtimer function to retrieve expiration time  (Marcelo Tosatti)

    commit ace1546487a0fe4634e3251067f8a32cb2cdc099 upstream.

    hrtimer->base can be temporarily NULL due to a racing hrtimer_start; see switch_hrtimer_base/lock_hrtimer_base. Use hrtimer_get_remaining, which is robust against this.

    Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
    Signed-off-by: Avi Kivity <avi@redhat.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
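    A sketch of the safe pattern (the timer field name is illustrative of the KVM lapic code of this era):

        /* derive the expiry without touching timer->base, which can be
         * NULL while hrtimer_start() migrates the timer between bases */
        ktime_t remaining = hrtimer_get_remaining(&apic->lapic_timer.timer);
        ktime_t expires   = ktime_add(remaining, ktime_get());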
2009-11-09  pci: increase alignment to make more space for hidden code  (Yinghai Lu)

    commit 15b812f1d0a5ca8f5efe7f5882f468af10682ca8 upstream.

    As reported in http://bugzilla.kernel.org/show_bug.cgi?id=13940, on some systems, when ACPI is enabled, ACPI clears some BARs for some devices for no apparent reason, and the kernel then needs to allocate resources for them. It then apparently hits some undocumented resource conflict, resulting in non-working devices. Increase the alignment to get a safer range for unassigned devices.

    Signed-off-by: Yinghai Lu <yinghai@kernel.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-10-22  arm, cris, mips, sparc, powerpc, um, xtensa: fix build with bash 4.0  (Sam Ravnborg)

    commit 51b563fc93c8cb5bff1d67a0a71c374e4a4ea049 upstream.

    Albin Tonnerre <albin.tonnerre@free-electrons.com> reported: Bash 4 filters out variables which contain a dot in them. This happens to be the case of CPPFLAGS_vmlinux.lds. This is rather unfortunate, as it now causes build failures when using SHELL=/bin/bash to compile, or when bash happens to be used by make (e.g. when it is /bin/sh).

    Remove the common definition of CPPFLAGS_vmlinux.lds by pushing the relevant parts to either Makefile.build or the arch-specific kernel/Makefile where we build the linker script. This is also a nice cleanup, as it moves the information out to where it is used.

    Notes for the different architectures touched:

    arm     - we use an already exported symbol
    cris    - we use a config symbol already available [not build tested]
    mips    - the jiffies complexity has moved to vmlinux.lds.S where we need it; added a few variables to CPPFLAGS - they are only used by the linker script [not build tested]
    powerpc - removed an assignment that is not needed [not build tested]
    sparc   - simplified it using $(BITS)
    um      - introduced a few new exported variables to deal with this
    xtensa  - added options to the CPP invocation [not build tested]

    Cc: Albin Tonnerre <albin.tonnerre@free-electrons.com>
    Cc: Russell King <linux@arm.linux.org.uk>
    Cc: Mikael Starvik <starvik@axis.com>
    Cc: Jesper Nilsson <jesper.nilsson@axis.com>
    Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    Cc: Paul Mackerras <paulus@samba.org>
    Cc: "David S. Miller" <davem@davemloft.net>
    Cc: Jeff Dike <jdike@addtoit.com>
    Cc: Chris Zankel <chris@zankel.net>
    Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-10-22  x86/paravirt: Use normal calling sequences for irq enable/disable  (Jeremy Fitzhardinge)

    commit 71999d9862e667f1fd14f8fbfa0cce6d855bad3f upstream.

    Bastian Blank reported a boot crash with stackprotector enabled, and debugged it back to edx register corruption.

    For historical reasons irq enable/disable/save/restore had special calling sequences to make them more efficient. With the more recent introduction of higher-level and more general optimisations this is no longer necessary, so we can just use the normal PVOP_ macros. This fixes some residual bugs in the old implementations which left edx liable to inadvertent clobbering. Also, fix some bugs in __PVOP_VCALLEESAVE which were revealed by actual use.

    Reported-by: Bastian Blank <bastian@waldi.eu.org>
    Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
    Cc: Xen-devel <xen-devel@lists.xensource.com>
    LKML-Reference: <4AD3BC9B.7040501@goop.org>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
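    A sketch of what the "normal calling sequence" looks like, assuming the paravirt-ops names of this era (pv_irq_ops, PVOP_VCALLEE0):

        /* irq disable as an ordinary callee-save paravirt call; the PVOP
         * macro declares the correct clobbers, so %edx is preserved */
        static inline void raw_local_irq_disable(void)
        {
                PVOP_VCALLEE0(pv_irq_ops.irq_disable);
        }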
2009-10-22  ARM: pxa: work around errata #37 by not using half turbo switching  (Dennis O'Brien)

    commit 4367216a099b4df3fa2c4f2b086cda1a1e9afc4e upstream.

    PXA27x Errata #37 implies the system will hang when switching into or out of half turbo (HT bit in CLKCFG) mode; work around this by not using it.

    Signed-off-by: Dennis O'Brien <dennis.obrien@eqware.net>
    Signed-off-by: Eric Miao <eric.y.miao@gmail.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-10-12  ACPI: fix Compaq Evo N800c (Pentium 4m) boot hang regression  (Zhao Yakui)

    commit 3e2ada5867b7e9fa0b296d30fa8f3726ebd0a8b7 upstream.

    Don't disable ARB_DISABLE when the family ID is 0x0F.

    http://bugzilla.kernel.org/show_bug.cgi?id=14211

    This was a 2.6.31 regression, so this patch needs to be applied to 2.6.31.stable.

    Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
    Signed-off-by: Len Brown <len.brown@intel.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-10-12  PIT fixes to unbreak suspend/resume (bug #14222)  (John Stultz)

    Resolved differently upstream in commit 8cab02dc3c58a12235c6d463ce684dded9696848.

    Ondrej Zary reported a suspend/resume hang with 2.6.31 in bug #14222: http://bugzilla.kernel.org/show_bug.cgi?id=14222

    The hang was bisected to c7121843685de2bf7f3afd3ae1d6a146010bf1fc; however, that was really just the last straw that caused the issue. The problem was that on suspend, the PIT is removed as a clocksource, and the mult value was essentially being used as an is_enabled() flag. The mult adjustments done in the commit above caused that usage to break, causing bad list manipulation and the oops. Further, on resume, the PIT clocksource is never restored, causing the system to run in a degraded mode with jiffies as the clocksource.

    This issue has since been resolved in 2.6.32-rc by commit 8cab02dc3c58a12235c6d463ce684dded9696848, which removes the clocksource disabling on suspend. Testing shows no issues there. So the following patch rectifies the situation for 2.6.31 users of the PIT clocksource that use suspend and resume (which is probably not that many). Many thanks to Ondrej for helping narrow down what was happening, what caused it, and verifying the fix.

    ---------------

    Avoid using the unprotected clocksource.mult value as an "is_registered" flag; instead use an explicit flag variable. This avoids possible list corruption if the clocksource is double-unregistered. Also re-register the PIT clocksource on resume so folks don't have to use jiffies after suspend.

    Signed-off-by: John Stultz <johnstul@us.ibm.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
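    A minimal sketch of the flag-based pattern (flag and helper names are illustrative, not the patch's actual identifiers):

        /* explicit registration state instead of overloading .mult */
        static int pit_cs_registered;

        static void pit_clocksource_suspend(void)
        {
                if (pit_cs_registered) {
                        clocksource_unregister(&pit_cs);
                        pit_cs_registered = 0;
                }
        }

        static void pit_clocksource_resume(void)
        {
                /* re-register so resume doesn't leave us on jiffies */
                if (!pit_cs_registered && clocksource_register(&pit_cs) == 0)
                        pit_cs_registered = 1;
        }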
2009-10-12  KVM: SVM: Handle tsc in svm_get_msr/svm_set_msr correctly  (Joerg Roedel)

    commit 20824f30bb0b8ae0a4099895fd4509f54cf2e1e2 upstream.

    When running nested we need to touch the L1 guest's tsc_offset. Otherwise changes will be lost or a wrong value will be read.

    Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
    Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-10-12  KVM: SVM: Fix tsc offset adjustment when running nested  (Joerg Roedel)

    commit 77b1ab1732feb5e3dcbaf31d2f7547c5229f5f3a upstream.

    When svm_vcpu_load is called while the vcpu is running in guest mode, the tsc adjustment made there is lost on the next emulated #vmexit. This causes the tsc to run backwards in the guest. This patch fixes the issue by also adjusting the tsc_offset in the emulated hsave area, so that it will not get lost.

    Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
    Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-10-12  KVM: fix LAPIC timer period overflow  (Aurelien Jarno)

    commit b2d83cfa3fdefe5c6573d443d099a18dc3a93c5f upstream.

    Don't overflow when computing the 64-bit period from 32-bit registers.

    Fixes sourceforge bug #2826486.

    Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
    Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
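    The class of bug, schematically (field names follow the KVM lapic code of that era, shown illustratively):

        u32 tmict = apic_get_reg(apic, APIC_TMICT);     /* 32-bit count */

        /* broken: multiplied in 32 bits, wraps, then widened to u64 */
        apic->lapic_timer.period =
                tmict * APIC_BUS_CYCLE_NS * apic->divide_count;

        /* fixed: widen first so the whole product is computed in 64 bits */
        apic->lapic_timer.period =
                (u64)tmict * APIC_BUS_CYCLE_NS * apic->divide_count;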
2009-10-12  KVM: VMX: flush TLB with INVEPT on cpu migration  (Marcelo Tosatti)

    commit eb5109e311b5152c0614a28d7d615d087f268f19 upstream.

    It is possible that stale EPTP-tagged mappings are used if a vcpu migrates to a different pcpu. Set KVM_REQ_TLB_FLUSH in vmx_vcpu_load when switching pcpus, which will invalidate both VPID and EPT mappings on the next vm-entry.

    Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
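    A sketch of the request, assuming the vcpu->requests bitmask idiom of this era:

        /* in vmx_vcpu_load(): the vcpu is being scheduled onto a new pcpu */
        if (vcpu->cpu != cpu) {
                /* force INVEPT/INVVPID via the TLB-flush request on the
                 * next vm-entry, discarding stale EPTP-tagged entries */
                set_bit(KVM_REQ_TLB_FLUSH, &vcpu->requests);
        }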
2009-10-12  KVM: Prevent overflow in KVM_GET_SUPPORTED_CPUID  (Avi Kivity)

    commit 6a54435560efdab1a08f429a954df4d6c740bddf upstream.

    The number of entries is multiplied by the entry size, which can overflow on 32-bit hosts. Bound the entry count instead.

    Reported-by: David Wagner <daw@cs.berkeley.edu>
    Signed-off-by: Avi Kivity <avi@redhat.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
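    A sketch of the bound check (constant name per the KVM of this era; exact placement schematic):

        /* nent * sizeof(struct kvm_cpuid_entry2) can overflow a 32-bit
         * size calculation, so reject oversized requests up front */
        r = -E2BIG;
        if (cpuid->nent > KVM_MAX_CPUID_ENTRIES)
                goto out;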
2009-10-12  x86: Don't leak 64-bit kernel register values to 32-bit processes  (Jan Beulich)

    commit 24e35800cdc4350fc34e2bed37b608a9e13ab3b6 upstream.

    While 32-bit processes can't directly access R8...R15, they can gain access to these registers by temporarily switching themselves into 64-bit mode. Therefore, registers not preserved anyway by called C functions (i.e. R8...R11) must be cleared prior to returning to user mode.

    Signed-off-by: Jan Beulich <jbeulich@novell.com>
    LKML-Reference: <4AC34D73020000780001744A@vpn.id2.novell.com>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-10-12  x86: fix csum_ipv6_magic asm memory clobber  (Samuel Thibault)

    commit 392d814daf460a9564d29b2cebc51e1ea34e0504 upstream.

    Just like ip_fast_csum, the assembly snippet in csum_ipv6_magic needs a memory clobber, as it is only passed the address of the buffer, not a memory reference to the buffer itself. This caused failures in Hurd's pfinetv4 when we tried to compile it with gcc-4.3 (bogus checksums).

    Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
    Cc: Ingo Molnar <mingo@elte.hu>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: "H. Peter Anvin" <hpa@zytor.com>
    Acked-by: "David S. Miller" <davem@davemloft.net>
    Cc: Andi Kleen <andi@firstfloor.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
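    The general shape of the problem and fix, as a minimal compilable example (not the actual csum_ipv6_magic body):

        /* the asm dereferences the pointer input, but gcc only sees the
         * pointer value; without a "memory" clobber (or an "m" operand)
         * gcc may assume the pointed-to bytes are unread and reorder or
         * cache stores to the buffer across the asm */
        static inline unsigned int load_first_word(const unsigned int *buf)
        {
                unsigned int v;
                asm("movl (%1), %0"
                    : "=r" (v)
                    : "r" (buf)
                    : "memory");        /* tells gcc the asm touches memory */
                return v;
        }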
2009-10-05  powerpc: Fix incorrect setting of __HAVE_ARCH_PTE_SPECIAL  (Bernhard Weirich)

    [I'm going to fix upstream differently, by having all CPU types actually support _PAGE_SPECIAL, but I prefer the simple and obvious fix for -stable. -- Ben]

    The test that decides whether to define __HAVE_ARCH_PTE_SPECIAL on powerpc is bogus and will end up always defining it, even when _PAGE_SPECIAL is not supported (in which case it's 0), such as on 8xx or 40x processors.

    Signed-off-by: Bernhard Weirich <bernhard.weirich@riedel.net>
    Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
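    The class of bug, schematically (not the literal powerpc test): a presence check succeeds even when the bit is defined as 0, so the value must be tested too.

        /* bogus: true whenever the macro is defined, even as 0 */
        #ifdef _PAGE_SPECIAL
        #define __HAVE_ARCH_PTE_SPECIAL
        #endif

        /* intended: only when the PTE bit actually exists */
        #if defined(_PAGE_SPECIAL) && _PAGE_SPECIAL != 0
        #define __HAVE_ARCH_PTE_SPECIAL
        #endif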
2009-10-05  powerpc/8xx: Fix regression introduced by cache coherency rewrite  (Rex Feany)

    commit e0908085fc2391c85b85fb814ae1df377c8e0dcb upstream.

    After upgrading to the latest kernel on my mpc875, userspace started running incredibly slowly (hours to get to a shell, even!). I tracked it down to commit 8d30c14cab30d405a05f2aaceda1e9ad57800f36; that patch removed a work-around for the 8xx. Adding it back makes my problem go away.

    Signed-off-by: Rex Feany <rfeany@mrv.com>
    Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-10-05  Fix NULL ptr regression in powernow-k8  (Kurt Roeckx)

    commit f0adb134d8dc9993a9998dc50845ec4f6ff4fadc upstream.

    Fixes bugzilla #13780.

    From: Kurt Roeckx <kurt@roeckx.be>
    Signed-off-by: Dave Jones <davej@redhat.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-10-05  Revert "KVM: x86: check for cr3 validity in ioctl_set_sregs"  (Marcelo Tosatti)

    (cherry picked from commit dc7e795e3dd2a763e5ceaa1615f307e808cf3932)

    This reverts commit 6c20e1442bb1c62914bb85b7f4a38973d2a423ba.

    To my understanding, it became obsolete with the advent of the more robust check in mmu_alloc_roots (89da4ff17f). Moreover, it prevents the conceptually safe pattern

    1. set sregs
    2. register mem-slots
    3. run vcpu

    by setting a sticky triple fault during step 1.

    Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
    Signed-off-by: Avi Kivity <avi@redhat.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-10-05  KVM: Protect update_cr8_intercept() when running without an apic  (Avi Kivity)

    (cherry picked from commit 88c808fd42b53a7e01a2ac3253ef31fef74cb5af)

    update_cr8_intercept() can be triggered from userspace while there is no apic present.

    Signed-off-by: Avi Kivity <avi@redhat.com>
    Cc: Marcelo Tosatti <mtosatti@redhat.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-10-05  KVM: MMU: fix bogus alloc_mmu_pages assignment  (Marcelo Tosatti)

    (cherry picked from commit b90c062c65cc8839edfac39778a37a55ca9bda36)

    Remove the bogus n_free_mmu_pages assignment from alloc_mmu_pages. It breaks accounting of mmu pages, since n_free_mmu_pages is modified but the real number of pages remains the same.

    Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
    Signed-off-by: Avi Kivity <avi@redhat.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-10-05  KVM: MMU: fix missing locking in alloc_mmu_pages  (Marcelo Tosatti)

    (cherry picked from commit 6a1ac77110ee3e8d8dfdef8442f3b30b3d83e6a2)

    n_requested_mmu_pages/n_free_mmu_pages are used by kvm_mmu_change_mmu_pages to calculate the number of pages to zap. alloc_mmu_pages, called from the vcpu initialization path, modifies these variables without proper locking, which can result in a negative value in kvm_mmu_change_mmu_pages (say, with cpu hotplug).

    Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-10-05  KVM: fix cpuid E2BIG handling for extended request types  (Mark McLoughlin)

    (cherry picked from commit cb007648de83cf226d69ec76e1c01848b4e8e49f)

    If we run out of cpuid entries for extended request types we should return -E2BIG, just like we do for the standard request types.

    Signed-off-by: Mark McLoughlin <markmc@redhat.com>
    Signed-off-by: Avi Kivity <avi@redhat.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-10-05  pxa/sharpsl_pm: zaurus c3000 aka spitz: fix resume  (Pavel Machek)

    commit 99f329a2ba3c2d07cc90ca9309babf27ddf98bff upstream.

    The sharpsl_pm.c code tries to read the battery state very early during resume, but those battery meters are connected over SPI, and that is only resumed much later. Replace the check with a simple check of the battery-fatal signal, which actually works at this stage.

    Signed-off-by: Pavel Machek <pavel@ucw.cz>
    Tested-by: Stanislav Brabec <utx@penguin.cz>
    Signed-off-by: Eric Miao <eric.y.miao@gmail.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-10-05  agp/intel: Fix the pre-9xx chipset flush.  (Eric Anholt)

    commit e517a5e97080bbe52857bd0d7df9b66602d53c4d upstream.

    Ever since we enabled GEM, the pre-9xx chipsets (particularly 865) have had serious stability issues. Back in May a wbinvd was added to the DRM to work around much of the problem. Some failure remained -- easily visible by dragging a window around on an X -retro desktop, or by looking at bugzilla.

    The chipset flush was on the right track -- hitting the right amount of memory, and it appears to be the only way to flush on these chipsets -- but the flush page was mapped uncached. As a result, the writes trying to clear the writeback cache ended up bypassing the cache, and not flushing anything! The wbinvd would flush out other writeback data and often cause the data we wanted to get flushed, but not always. By removing the setting of the page to UC and instead just clflushing the data we write to try to flush it, we get the desired behavior with no wbinvd.

    This exports clflush_cache_range(), which was lying around and happened to basically match the code I was otherwise going to copy from the DRM.

    Signed-off-by: Eric Anholt <eric@anholt.net>
    Signed-off-by: Brice Goglin <Brice.Goglin@ens-lyon.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
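    A sketch of the resulting flush pattern (the mapping and sizes are illustrative; clflush_cache_range() is the routine this commit exports):

        /* write the flush pattern through the (now cacheable) mapping,
         * then clflush exactly those lines instead of relying on wbinvd */
        u32 *flush_page = intel_flush_page;     /* hypothetical mapping */
        int i;

        for (i = 0; i < 1024 / sizeof(u32); i++)
                flush_page[i] = i;
        clflush_cache_range(flush_page, 1024);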
2009-10-05  x86: SGI UV: Fix IPI macros  (Jack Steiner)

    commit d2374aecda3f6c9b0d13287027132a37311da300 upstream.

    The UV BIOS has changed the way interrupt remapping is being done. This affects the id used for sending IPIs. The upper id bits no longer need to be masked off.

    Signed-off-by: Jack Steiner <steiner@sgi.com>
    LKML-Reference: <20090909154104.GA25083@sgi.com>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-10-05  xen: check EFER for NX before setting up GDT mapping  (Jeremy Fitzhardinge)

    commit b75fe4e5b869f8dbebd36df64a7fcda0c5b318ed upstream.

    x86-64 assumes NX is available by default, so we need to explicitly check for it before using NX. Some first-generation Intel x86-64 processors didn't support NX, and even recent systems allow it to be disabled in the BIOS.

    [ Impact: prevent Xen crash on NX-less 64-bit machines ]

    Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-10-05  xen: use stronger barrier after unlocking lock  (Yang Xiaowei)

    commit 2496afbf1e50c70f80992656bcb730c8583ddac3 upstream.

    We need a stronger barrier between releasing the lock and checking for any waiting spinners. A compiler barrier is not sufficient because the CPU's ordering rules do not prevent the read of xl->spinners from happening before the unlock assignment, as they are different memory locations. We need an explicit barrier to enforce the write-read ordering to different memory locations. Without it, I can't bring up more than 4 HVM guests on one SMP machine.

    [ Code and commit comments expanded -J ]

    [ Impact: avoid deadlock when using Xen PV spinlocks ]

    Signed-off-by: Yang Xiaowei <xiaowei.yang@intel.com>
    Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
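    A sketch of the unlock path with the stronger barrier (structure follows the Xen PV spinlock code of this era, shown schematically):

        static void xen_spin_unlock(struct raw_spinlock *lock)
        {
                struct xen_spinlock *xl = (struct xen_spinlock *)lock;

                xl->lock = 0;   /* release lock */

                /* full memory barrier: the store to xl->lock must be
                 * globally visible before we load xl->spinners, or we
                 * can miss a waiter and leave it blocked forever */
                mb();

                if (unlikely(xl->spinners))
                        xen_spin_unlock_slow(xl);
        }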
2009-10-05  xen: only enable interrupts while actually blocking for spinlock  (Jeremy Fitzhardinge)

    commit 4d576b57b50a92801e6493e76e5243d6cff193d2 upstream.

    Where possible we enable interrupts while waiting for a spinlock to become free, in order to reduce big latency spikes in interrupt handling. However, at present, if we manage to pick up the spinlock just before blocking, we'll end up holding the lock with interrupts enabled for a while. This will cause a deadlock if we receive an interrupt in that window and the interrupt handler tries to take the lock too. Solve this by shrinking the interrupt-enabled region to just around the blocking call.

    [ Impact: avoid race/deadlock when using Xen PV spinlocks ]

    Reported-by: "Yang, Xiaowei" <xiaowei.yang@intel.com>
    Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
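    The shape of the fix, schematically (xen_poll_irq() is the real blocking primitive; the surrounding slow-path code is abbreviated):

        /* in the spinlock slow path, after registering as a spinner and
         * re-checking that the lock is still taken: */

        if (irq_enable)
                local_irq_enable();     /* only now, right at the block */

        xen_poll_irq(irq);              /* block until kicked by unlocker */

        local_irq_disable();            /* closed again before we retry
                                           and possibly take the lock */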
2009-10-05  xen: make -fstack-protector work under Xen  (Jeremy Fitzhardinge)

    commit 577eebeae34d340685d8985dfdb7dfe337c511e8 upstream.

    -fstack-protector uses a special per-cpu "stack canary" value. gcc generates special code in each function to test the canary to make sure that the function's stack hasn't been overrun.

    On x86-64, this is simply an offset of %gs, which is the usual per-cpu base segment register, so setting it up simply requires loading %gs's base as normal.

    On i386, the stack protector segment is %gs (rather than the usual kernel percpu %fs segment register). This requires setting up the full kernel GDT and then loading %gs accordingly. We also need to make sure %gs is initialized when bringing up secondary cpus too.

    To keep things consistent, we do the full GDT/segment register setup on both architectures.

    Because we need to avoid -fstack-protected code before setting up the GDT, and because there's no way to disable it on a per-function basis, several files need to have the stack-protector inhibited.

    [ Impact: allow Xen booting with stack-protector enabled ]

    Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-10-05  alpha: AGP update (fixes compile failure)  (Ivan Kokshaysky)

    commit d68721eb339e9237c11c1fea5f73f86211d14918 upstream.

    This brings Alpha AGP platforms in sync with the change to struct agp_memory (unsigned long *memory => struct page **pages). Only compile tested (I don't have titan/marvel hardware), but this change looks pretty straightforward, so hopefully it's ok.

    Signed-off-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
    Cc: Richard Henderson <rth@twiddle.net>
    Cc: Dave Airlie <airlied@linux.ie>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-10-05  x86: Increase MIN_GAP to include randomized stack  (Michal Hocko)

    commit 80938332d8cf652f6b16e0788cf0ca136befe0b5 upstream.

    Currently we are not including the randomized stack size when calculating the mmap_base address in arch_pick_mmap_layout for the topdown case. This can cause mmap_base to start inside the stack's reserved area, because the stack is randomized by up to 1GB on 64-bit (8MB on 32-bit) while the minimum gap is only 128MB. If the stack then grows down to mmap_base, we can get a silent overwrite of the mmap region by stack values.

    Include the maximum stack randomization size in MIN_GAP, which is used as the lower bound for the gap in mmap.

    Signed-off-by: Michal Hocko <mhocko@suse.cz>
    LKML-Reference: <1252400515-6866-1-git-send-email-mhocko@suse.cz>
    Acked-by: Jiri Kosina <jkosina@suse.cz>
    Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
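    A sketch of the adjusted bound (the helper name follows the upstream patch; its definition is shown schematically):

        /* worst-case bytes the stack top can be randomized by */
        extern unsigned long stack_maxrandom_size(void);

        /* the low bound for the stack-to-mmap gap now accounts for
         * randomization, so mmap_base can never land inside the stack */
        #define MIN_GAP (128*1024*1024UL + stack_maxrandom_size())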
2009-10-05  KVM: VMX: Fix EPT with WP bit change during paging  (Sheng Yang)

    commit 95eb84a7588d7d7afd3096807efc052adc7479e1 upstream.

    QNX updates the WP bit while paging is enabled, a case that was not covered yet. This fixes QNX boot with EPT.

    Signed-off-by: Sheng Yang <sheng@linux.intel.com>
    Signed-off-by: Avi Kivity <avi@redhat.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-24  powerpc/pseries: Fix to handle slb resize across migration  (Brian King)

    commit 46db2f86a3b2a94e0b33e0b4548fb7b7b6bdff66 upstream.

    The SLB can change sizes across a live migration, which was not being handled, resulting in possible machine crashes during migration if migrating to a machine which has a smaller max SLB size than the source machine. Fix this by first reducing the SLB size to the minimum possible value, which is 32, prior to migration. Then, during the device tree update which occurs after migration, we make the call to ensure the SLB gets updated.

    Also add the slb_size to the lparcfg output so that the migration tools can check to make sure the kernel has this capability before allowing migration in scenarios where the SLB size will change.

    BenH: fixed #include <asm/mmu-hash64.h> -> <asm/mmu.h> to avoid breaking the ppc32 build.

    Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
    Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-24  KVM: limit lapic periodic timer frequency  (Marcelo Tosatti)

    commit 1444885a045fe3b1905a14ea1b52540bf556578b upstream.

    Otherwise it's possible to starve the host by programming the lapic timer with a very high frequency.

    Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
    Signed-off-by: Avi Kivity <avi@redhat.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
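    A sketch of the clamp (the 0.5 ms floor matches my reading of the upstream patch; field names illustrative):

        /* periodic mode: refuse periods shorter than NSEC_PER_MSEC/2,
         * since a guest-programmed high-frequency hrtimer is not
         * throttled by the host scheduler and can monopolize a cpu */
        if (apic_lvtt_period(apic) &&
            apic->lapic_timer.period < NSEC_PER_MSEC / 2)
                apic->lapic_timer.period = NSEC_PER_MSEC / 2;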
2009-09-24  KVM: x86 emulator: fix jmp far decoding (opcode 0xea)  (Avi Kivity)

    commit ee3d29e8bee8d7c321279a9bd9bd25d4cfbf79b7 upstream.

    The jump target should not be sign-extended; use an unsigned decode flag.

    Signed-off-by: Avi Kivity <avi@redhat.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-24  KVM: MMU: make __kvm_mmu_free_some_pages handle empty list  (Izik Eidus)

    commit 3b80fffe2b31fb716d3ebe729c54464ee7856723 upstream.

    First check if the list is empty before attempting to look at list entries.

    Signed-off-by: Izik Eidus <ieidus@redhat.com>
    Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
    Signed-off-by: Avi Kivity <avi@redhat.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
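    The pattern, schematically (struct and list names follow the KVM MMU of this era):

        /* taking an entry off an empty list yields a bogus pointer, so
         * the emptiness check must be part of the loop condition */
        while (kvm->arch.n_free_mmu_pages < KVM_REFILL_PAGES &&
               !list_empty(&kvm->arch.active_mmu_pages)) {
                struct kvm_mmu_page *page =
                        container_of(kvm->arch.active_mmu_pages.prev,
                                     struct kvm_mmu_page, link);
                kvm_mmu_zap_page(kvm, page);
        }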
2009-09-24  KVM: x86 emulator: Implement zero-extended immediate decoding  (Avi Kivity)

    commit c9eaf20f268c7051bfde2ba212c5ea76a6cbc7a1 upstream.

    Absolute jumps use zero-extended immediate operands.

    Signed-off-by: Avi Kivity <avi@redhat.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-24  KVM: VMX: Fix cr8 exiting control clobbering by EPT  (Gleb Natapov)

    commit 5fff7d270bd6a4759b6d663741b729cdee370257 upstream.

    Don't call adjust_vmx_controls() twice for the same control; it restores options that were dropped earlier. This loses us the cr8 exit control, which causes a massive performance regression on Windows x64.

    Signed-off-by: Gleb Natapov <gleb@redhat.com>
    Signed-off-by: Avi Kivity <avi@redhat.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-24  KVM: x86: Disallow hypercalls for guest callers in rings > 0  (Jan Kiszka)

    commit 07708c4af1346ab1521b26a202f438366b7bcffd upstream.

    So far, unprivileged guest callers running in ring 3 can issue, e.g., MMU hypercalls. Normally, such callers cannot provide any hand-crafted MMU command structure as it has to be passed by its physical address, but they can still crash the guest kernel by passing random addresses. To close the hole, this patch considers hypercalls valid only if issued from guest ring 0. This may still be relaxed on a per-hypercall basis in the future, once required.

    Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
    Signed-off-by: Avi Kivity <avi@redhat.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
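    A sketch of the check in the hypercall entry path (the error constant follows the upstream patch; placement schematic):

        /* in kvm_emulate_hypercall(): reject calls from guest user mode */
        if (kvm_x86_ops->get_cpl(vcpu) != 0) {
                ret = -KVM_EPERM;       /* reported back in the guest's RAX */
                goto out;
        }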
2009-09-24  KVM guest: fix bogus wallclock physical address calculation  (Glauber Costa)

    commit a20316d2aa41a8f4fd171648bad8f044f6060826 upstream.

    The use of __pa() to calculate the address of a C-visible symbol is wrong, and can lead to unpredictable results. See arch/x86/include/asm/page.h for details. It should be replaced with __pa_symbol(), which does the correct math here by taking relocations into account. This ensures the correct wallclock data structure physical address is passed to the hypervisor.

    Signed-off-by: Glauber Costa <glommer@redhat.com>
    Signed-off-by: Avi Kivity <avi@redhat.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
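    The before/after, schematically (wall_clock is the guest-side kvmclock structure; the MSR plumbing around it is omitted):

        /* wrong: __pa() is for directly-mapped (linear) addresses and
         * mishandles kernel symbols when the kernel is relocated */
        low  = (u32)__pa(&wall_clock);

        /* right: __pa_symbol() applies the symbol-to-physical math */
        low  = (u32)__pa_symbol(&wall_clock);
        high = (u32)((u64)__pa_symbol(&wall_clock) >> 32);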
2009-09-24  KVM: VMX: Check cpl before emulating debug register access  (Avi Kivity)

    commit 0a79b009525b160081d75cef5dbf45817956acf2 upstream.

    Debug registers may only be accessed from cpl 0. Unfortunately, the vmx code would emulate the instruction even though it was issued from guest userspace, possibly leading to an unexpected trap later.

    Signed-off-by: Avi Kivity <avi@redhat.com>
    Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
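    A sketch of the guard (kvm_require_cpl() is the helper this patch introduces; shown schematically in the dr-access exit handler):

        /* emulating mov to/from dr: legal only at cpl 0; the helper
         * injects #GP into the guest and returns 0 on violation */
        if (!kvm_require_cpl(vcpu, 0))
                return 1;       /* exit handled; the guest sees the fault */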
2009-09-24  KVM guest: do not batch pte updates from interrupt context  (Marcelo Tosatti)

    commit 6ba661787594868512a71c129062ebd57d0c01e7 upstream.

    Commit b8bcfe997e4 made paravirt pte updates synchronous in interrupt context. Unfortunately the KVM pv mmu code caches the lazy/nonlazy mode internally, so a pte update from interrupt context during a lazy mmu operation can be batched while it should be performed synchronously.

    https://bugzilla.redhat.com/show_bug.cgi?id=518022

    Drop the internal mode variable and use paravirt_get_lazy_mode(), which returns the correct state.

    Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
    Signed-off-by: Avi Kivity <avi@redhat.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
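    The pattern, schematically (the queue/flush helpers are illustrative; paravirt_get_lazy_mode() is the real accessor):

        /* decide per update, from the authoritative lazy-mode state --
         * interrupt context temporarily leaves lazy mode, and a cached
         * flag would not see that */
        if (paravirt_get_lazy_mode() == PARAVIRT_LAZY_MMU)
                mmu_queue_update(vcpu, ptep, pte);      /* batch */
        else
                mmu_flush_update(vcpu, ptep, pte);      /* synchronous */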
2009-09-24  ARM: 5691/1: fix cache aliasing issues between kmap() and kmap_atomic() with highmem  (Nicolas Pitre)

    commit 7929eb9cf643ae416e5081b2a6fa558d37b9854c upstream.

    Let's suppose a highmem page is kmap'd with kmap(). A pkmap entry is used, the page mapped to it, and the virtual cache is dirtied. Then kunmap() is used, which does virtually nothing except for decrementing a usage count.

    Then, let's suppose the _same_ page gets mapped using kmap_atomic(). It is therefore mapped onto a fixmap entry instead, which has a different virtual address, unaware of the dirty cache data for that page sitting in the pkmap mapping.

    Fortunately it is easy to know if a pkmap mapping still exists for that page and use it directly with kmap_atomic(), thanks to kmap_high_get(). And actual testing with a printk in the added code path shows that this condition is actually met *extremely* frequently. Seems that we've been quite lucky that things have worked so well with highmem so far.

    Signed-off-by: Nicolas Pitre <nico@marvell.com>
    Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
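    A sketch of the added path at the top of ARM's kmap_atomic() (kmap_high_get() is the real helper; the surrounding code is abbreviated):

        /* if the page is already mapped via kmap(), reuse that pkmap
         * virtual address so we see (and keep dirtying) the same cache
         * lines instead of aliasing the page at a fixmap address */
        void *kmap = kmap_high_get(page);
        if (kmap)
                return kmap;

        /* ...otherwise fall through to the normal fixmap mapping... */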
2009-09-24  x86, pat: Fix cacheflush address in change_page_attr_set_clr()  (Jack Steiner)

    commit fa526d0d641b5365676a1fb821ce359e217c9b85 upstream.

    Fix the address passed to cpa_flush_range() when changing page attributes from WB to UC. The address (*addr) is modified by __change_page_attr_set_clr(). The result is that the pages being flushed start at the _end_ of the changed range instead of the beginning.

    This should be considered for 2.6.30-stable and 2.6.31-stable.

    Signed-off-by: Jack Steiner <steiner@sgi.com>
    Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
    Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
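    The shape of the fix, schematically (inside change_page_attr_set_clr(); the saved-variable idea follows the upstream patch):

        unsigned long baddr = *addr;    /* save the start: the callee
                                           advances *addr as it walks */

        ret = __change_page_attr_set_clr(&cpa, checkalias);

        /* ... error handling and CLFLUSH-capability checks omitted ... */

        /* flush from the original start, not where *addr ended up */
        cpa_flush_range(baddr, numpages, cache);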
2009-09-24  x86/i386: Make sure stack-protector segment base is cache aligned  (Jeremy Fitzhardinge)

    commit 1ea0d14e480c245683927eecc03a70faf06e80c8 upstream.

    The Intel Optimization Reference Guide says:

        In Intel Atom microarchitecture, the address generation unit assumes that the segment base will be 0 by default. Non-zero segment base will cause load and store operations to experience a delay.
        - If the segment base isn't aligned to a cache line boundary, the max throughput of memory operations is reduced to one [e]very 9 cycles.
        [...]
        Assembly/Compiler Coding Rule 15. (H impact, ML generality) For Intel Atom processors, use segments with base set to 0 whenever possible; avoid non-zero segment base address that is not aligned to cache line boundary at all cost.

    We can't avoid having a non-zero base for the stack-protector segment, but we can make it cache-aligned.

    Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
    LKML-Reference: <4AA01893.6000507@goop.org>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
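    A sketch of the aligned per-cpu layout (mirrors the struct the patch introduces; gcc expects the i386 canary at %gs:20):

        struct stack_canary {
                char __pad[20];         /* canary must sit at offset 20 */
                unsigned long canary;
        };
        /* the _ALIGNED variant puts the segment base, i.e. the start of
         * this per-cpu object, on a cache-line boundary */
        DECLARE_PER_CPU_ALIGNED(struct stack_canary, stack_canary);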
2009-09-24  x86: Fix x86_model test in es7000_apic_is_cluster()  (Roel Kluin)

    commit 005155b1f626d2b2d7932e4afdf4fead168c6888 upstream.

    For the x86_model to be greater than 6 or less than 12 is logically always true.

    Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
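    The boolean bug, schematically (every integer is either greater than 6 or less than 12, so the original test filters nothing):

        /* before: always true */
        if (boot_cpu_data.x86_model > 6 || boot_cpu_data.x86_model < 12)

        /* after: true only for the intended model range 7..11 */
        if (boot_cpu_data.x86_model > 6 && boot_cpu_data.x86_model < 12)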
2009-09-24  powerpc: Fix bug where perf_counters breaks oprofile  (Paul Mackerras)

    commit a6dbf93a2ad853585409e715eb96dca9177e3c39 upstream.

    Currently there is a bug where if you use oprofile on a pSeries machine, then use perf_counters, then use oprofile again, oprofile will not work correctly; it will lose the PMU configuration the next time the hypervisor does a partition context switch, and thereafter won't count anything.

    Maynard Johnson identified the sequence causing the problem:

    - oprofile setup calls ppc_enable_pmcs(), which calls pseries_lpar_enable_pmcs, which tells the hypervisor that we want to use the PMU, and sets the "PMU in use" flag in the lppaca. This flag tells the hypervisor whether it needs to save and restore the PMU config.

    - The perf_counter code sets and clears the "PMU in use" flag directly as it context-switches the PMU between tasks, and leaves it clear when it finishes.

    - oprofile setup, called for a new oprofile run, calls ppc_enable_pmcs, which does nothing because it has already been called. In particular it doesn't set the "PMU in use" flag.

    This fixes the problem by arranging for ppc_enable_pmcs to always set the "PMU in use" flag (see the sketch below). It makes the perf_counter code call ppc_enable_pmcs also, rather than calling the lower-level function directly, and removes the setting of the "PMU in use" flag from pseries_lpar_enable_pmcs, since that is now done in its caller. This also removes the declaration of pasemi_enable_pmcs, because it isn't defined anywhere.

    Reported-by: Maynard Johnson <mpjohn@us.ibm.com>
    Signed-off-by: Paul Mackerras <paulus@samba.org>
    Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
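    A sketch of the "always set the flag" arrangement (ppc_set_pmu_inuse() writes the lppaca "PMU in use" flag; the once-only guard is shown schematically):

        void ppc_enable_pmcs(void)
        {
                /* always (re)assert "PMU in use", even on repeat calls,
                 * so the hypervisor saves/restores the PMU config across
                 * partition context switches */
                ppc_set_pmu_inuse(1);

                /* the hardware-enable step still only happens once */
                if (__test_and_set_bit(0, &pmcs_enabled))
                        return;

                if (ppc_md.enable_pmcs)
                        ppc_md.enable_pmcs();
        }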
2009-09-24  powerpc/perf_counters: Reduce stack usage of power_check_constraints  (Paul Mackerras)

    commit e51ee31e8af22948dcc3b115978469b09c96c3fd upstream.

    Michael Ellerman reported stack-frame size warnings being produced for power_check_constraints(), which uses an 8*8 array of u64 and two 8*8 arrays of unsigned long, currently allocated on the stack, along with some other smaller variables. These arrays come to 1.5kB on 64-bit or 1kB on 32-bit, which is a bit too much for the stack.

    This fixes the problem by putting these arrays in the existing per-cpu cpu_hw_counters struct. This is OK because two of the call sites have interrupts disabled already; for the third call site we use get_cpu_var, which disables preemption, so we know we won't get a context switch while we're in power_check_constraints(). Note that power_check_constraints() can be called during a context switch but is not called from interrupts.

    Reported-by: Michael Ellerman <michael@ellerman.id.au>
    Signed-off-by: Paul Mackerras <paulus@samba.org>
    Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-24  x86/amd-iommu: fix broken check in amd_iommu_flush_all_devices  (Joerg Roedel)

    commit e0faf54ee82bf9c07f0307b4391caad4020bd659 upstream.

    The amd_iommu_pd_table is indexed by protection domain number and not by device id. So this check is broken and must be removed.

    Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>