Age | Commit message (Collapse) | Author |
|
patch 8668a3c468ed55d19514117a5a959d91d3d03823 in mainline.
Resetting an SMP guest will force AP enter real mode (RESET) with
paging enabled in protected mode. While current enter_rmode() can
only handle mode switch from nonpaging mode to real mode which leads
to SMP reboot failure.
Fix by reloading the mmu context on entering real mode.
Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Qing He <qing.he@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
patch 78f7826868da8e27d097802139a3fec39f47f3b8 in mainline.
When resetting from userspace, we need to handle the flags being cleared
even after we are in real mode.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
patch 0967b7bf1c22b55777aba46ff616547feed0b141 in mainline.
If we defer updating rip until pio instructions are executed, we have a
problem with reset: a pio reset updates rip, and when the instruction
completes we skip the emulated instruction, pointing rip somewhere completely
unrelated.
Fix by updating rip when we see decode the instruction, not after emulation.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
patch 404fb881b82cf0cf6981832f8d31a7484e4dee81 in mainline.
The clts code didn't use set_cr0 properly, so our lazy FPU
processing wasn't being done by the clts instruction at all.
(this isn't called on Intel as the hardware does the decode for us)
Signed-off-by: Amit Shah <amit.shah@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
This is not in mainline, as it was fixed differently in that tree.
first_cpu(cpus) returns the only CPU when NR_CPUS is 1 regardless of
the cpus mask. Therefore we avoid a kernel hang in
KVM_SET_MEMORY_REGION ioctl on uniprocessor by not entering the loop at
all.
Signed-off-by: Marko Kohtala <marko.kohtala@gmail.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
patch 00b2ef475d4728ca53a2bc788c7978042907e354 in mainline.
emulator_write_std() is not implemented, and calling write_emulated should
work just as well in place of write_std.
Fixes emulator failures with the push r/m instruction.
Signed-off-by: Amit Shah <amit.shah@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
patch cf5a94d1331b411b84414c13e43f578260942d6b in mainline.
'invd' can destroy host data, and 'wbinvd' allows the guest to induce
long (milliseconds) latencies.
Noted by Ben Serebrin.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
patch 651a3e29b3d19418d7a8a9787906061f9be7cc5f in mainline.
Emulate the 'invd' instruction (opcode 0f 08).
Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
and Mod = 3
patch 4e62417bf317504c0b85e0d7abd236f334f54eaf in mainline.
The patch belows changes the access type to register from memory for
instructions that are declared as SrcMem or DstMem, but have a
ModR/M byte with Mod = 3.
It fixes (at least) the lmsw and smsw instructions on an AMD64 CPU,
which are needed for FreeBSD.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
patch a012e65aee48379a7a87eadafa74f878b61522b9 in mainline.
Implement emulation of instruction:
movnti m32/m64, r32/r64
opcode: 0x0f 0xc3
Needed to support Linux 2.6.16 as guest (used for mmio).
Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
What guest drivers?
Cc: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
A guest context switch to an uncached cr3 can require allocation of
shadow pages, but we only recycle shadow pages in kvm_mmu_page_fault().
Move shadow page recycling to mmu_topup_memory_caches(), which is called
from both the page fault handler and from guest cr3 reload.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
When taking a cpu down, we need to hardware_disable() it.
Unfortunately, the CPU_DYING notifier is called with interrupts
disabled, which means we can't use smp_call_function_single().
Fortunately, the CPU_DYING notifier is always called on the dying cpu,
so we don't need to use the function at all and can simply call
hardware_disable() directly.
Tested-by: Paolo Ornati <ornati@fastwebnet.it>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> (edited MACINTOSH_DRIVERS per Geert Uytterhoeven's remark)
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
More fallout from the writeback fixes: debug register transfer
instructions do their own writeback and thus need to disable the general
writeback mechanism.
This fixes oopses and some guest failures on AMD machines (the Intel
variant decodes the instruction in hardware and thus does not need
emulation).
Cc: Alistair John Strachan <alistair@devzero.co.uk>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
0x0f 0x01 instructions (ie lgdt, lidt, smsw, lmsw and invlpg) does
not use writeback. This patch set no_wb=1 when emulating those
instructions.
This fixes a regression booting the FreeBSD kernel on AMD.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
Testing the wrong bit caused kvm not to disable nx on the guest when it is
disabled on the host (an mmu optimization relies on the nx bits being the
same in the guest and host).
This allows Windows to boot when nx is disabled on te host (e.g. when
host pae is disabled).
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
This reverts commit a3c870bdce4d34332ebdba7eb9969592c4c6b243. While it
does save useless updates, it (probably) defeats the fork detector, causing
a massive performance loss.
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
We add the kvm to the vm_list before initializing the vcpu mutexes,
which can be mutex_trylock()'ed by decache_vcpus_on_cpu().
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
Writes that are contiguous in virtual memory may not be contiguous in
physical memory; so split writes that straddle a page boundary.
Thanks to Aurelien for reporting the bug, patient testing, and a fix
to this very patch.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
Found by Sebastian Siewior and randconfig.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
__free_page() wants a struct page, not a virtual address.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The kvm mmu uses page->private on shadow page tables; so does slub, and
an oops result. Fix by allocating regular pages for shadows instead of
using slub.
Tested-by: S.Çağlar Onur <caglar@pardus.org.tr>
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
Allow real-mode emulation of rdmsr and wrmsr. This allows smp Windows to
boot, presumably for its sipi trampoline.
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
The memory slot management functions were oriented against vcpu 0, where
they should be kvm-wide. This causes hangs starting X on guest smp.
Fix by making the functions (and resultant tail in the mmu) non-vcpu-specific.
Unfortunately this reduces the efficiency of the mmu object cache a bit. We
may have to revisit this later.
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
We need to distinguish between large page shadows which have the nx bit set
and those which don't. The problem shows up when booting a newer smp Linux
kernel, where the trampoline page (which is in real mode, which uses the
same shadow pages as large pages) is using the same mapping as a kernel data
page, which is mapped using nx, causing kvm to spin on that page.
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
Slab destructors were no longer supported after Christoph's
c59def9f222d44bb7e2f0a559f2906191a0862d7 change. They've been
BUGs for both slab and slub, and slob never supported them
either.
This rips out support for the dtor pointer from kmem_cache_create()
completely and fixes up every single callsite in the kernel (there were
about 224, not including the slab allocator definitions themselves,
or the documentation references).
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
|
|
Currently, CONFIG_X86_CMPXCHG64 both enables boot-time checking of
the cmpxchg64b feature and enables compilation of the set_64bit() family.
Since the option is dependent on PAE, and since KVM depends on set_64bit(),
this effectively disables KVM on i386 nopae.
Simplify by removing the config option altogether: the boot check is made
dependent on CONFIG_X86_PAE directly, and the set_64bit() family is exposed
without constraints. It is up to users to check for the feature flag (KVM
does not as virtualiation extensions imply its existence).
Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Only at the CPU_DYING stage can we be sure that no user process will
be scheduled onto the cpu and oops when trying to use virtualization
extensions.
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
The hotplug IPIs can be called from the cpu on which we are currently
running on, so use on_cpu(). Similarly, drop on_each_cpu() for the
suspend/resume callbacks, as we're in atomic context here and only one
cpu is up anyway.
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
By keeping track of which cpus have virtualization enabled, we
prevent double-enable or double-disable during hotplug, which is a
very fatal oops.
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
Remove unnecessary ones, and rearange the remaining in the standard order.
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
kvm uses a pseudo filesystem, kvmfs, to generate inodes, a job that the
new anonymous inodes source does much better.
Cc: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
This patch adds an implementation to the svm is_disabled function to
detect reliably if the BIOS disabled the SVM feature in the CPU. This
fixes the issues with kernel panics when loading the kvm-amd module on
machines where SVM is available but disabled.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
A vmexit implicitly flushes the tlb; the code is bogus.
Noted by Shaohua Li.
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
Need to flush the tlb after updating a pte, not before.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
Protected mode code may have corrupted the real-mode tss, so re-initialize
it when switching to real mode.
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
When writing to normal memory and the memory area is unchanged the write
can be safely skipped, avoiding the costly kvm_mmu_pte_write.
Signed-Off-By: Luca Tettamanti <kronos.it@gmail.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
When the old value and new one are the same the emulator skips the
write; this is undesirable when the destination is a MMIO area and the
write shall be performed regardless of the previous value. This
optimization breaks e.g. a Linux guest APIC compiled without
X86_GOOD_APIC.
Remove the check and perform the writeback stage in the emulation unless
it's explicitly disabled (currently push and some 2 bytes instructions
may disable the writeback).
Signed-Off-By: Luca Tettamanti <kronos.it@gmail.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
Useful for the PIC and PIT.
Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
With kernel-injected interrupts, we need to check for interrupts on
lightweight exits too.
Signed-off-by: Gregory Haskins <ghaskins@novell.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
Signed-off-by: Gregory Haskins <ghaskins@novell.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
Signed-off-by: Nitin A Kamble <nitin.a.kamble@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
For use in real mode.
Signed-off-by: Nitin A Kamble <nitin.a.kamble@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
If the time stamp counter goes backwards, a guest delay loop can become
infinite. This can happen if a vcpu is migrated to another cpu, where
the counter has a lower value than the first cpu.
Since we're doing an IPI to the first cpu anyway, we can use that to pick
up the old tsc, and use that to calculate the adjustment we need to make
to the tsc offset.
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
Needs to be set on vcpu 0 only.
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
Signed-off-by: Shani Moideen <shani.moideen@wipro.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
Signed-off-by: Shani Moideen <shani.moideen@wipro.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
When a vcpu causes a shadow tlb entry to have reduced permissions, it
must also clear the tlb on remote vcpus. We do that by:
- setting a bit on the vcpu that requests a tlb flush before the next entry
- if the vcpu is currently executing, we send an ipi to make sure it
exits before we continue
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
That way, we don't need to loop for KVM_MAX_VCPUS for a single vcpu
vm.
Signed-off-by: Avi Kivity <avi@qumranet.com>
|