Age | Commit message | Author |
|
[ backport to 2.6.27 by Chuck Ebbert <cebbert@redhat.com> ]
commit 07708c4af1346ab1521b26a202f438366b7bcffd upstream.
So far unprivileged guest callers running in ring 3 can issue, e.g., MMU
hypercalls. Normally, such callers cannot provide any hand-crafted MMU
command structure as it has to be passed by its physical address, but
they can still crash the guest kernel by passing random addresses.
To close the hole, this patch considers hypercalls valid only if issued
from guest ring 0. This may still be relaxed on a per-hypercall basis in
the future if required.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Cc: Chuck Ebbert <cebbert@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
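A minimal sketch of the check described above, assuming it sits in
kvm_emulate_hypercall() and returns a -KVM_EPERM-style error (this is an
illustration, not the verbatim upstream diff):

    /* in kvm_emulate_hypercall(), before dispatching on the hypercall nr: */
    if (kvm_x86_ops->get_cpl(vcpu) != 0) {
        ret = -KVM_EPERM;   /* refuse hypercalls issued above ring 0 */
        goto out;
    }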
|
|
(cherry picked from commit 7c8a83b75a38a807d37f5a4398eca2a42c8cf513)
kvm_handle_hva, called by MMU notifiers, manipulates mmu data only with
the protection of mmu_lock.
Update kvm_mmu_change_mmu_pages callers to take mmu_lock, thus protecting
against kvm_handle_hva.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
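A sketch of the resulting locking pattern at one such call site; the
surrounding function and the use of slots_lock here are illustrative
assumptions:

    static int kvm_vm_ioctl_set_nr_mmu_pages(struct kvm *kvm,
                                             u32 kvm_nr_mmu_pages)
    {
        down_write(&kvm->slots_lock);
        spin_lock(&kvm->mmu_lock);      /* serialize against kvm_handle_hva() */

        kvm_mmu_change_mmu_pages(kvm, kvm_nr_mmu_pages);

        spin_unlock(&kvm->mmu_lock);
        up_write(&kvm->slots_lock);
        return 0;
    }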
|
|
(cherry picked from commit 8986ecc0ef58c96eec48d8502c048f3ab67fd8e2)
Verify the cr3 address stored in vcpu->arch.cr3 points to an existent
memslot. If not, inject a triple fault.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
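Roughly, the check looks like the following; the helper name and exact
placement are assumptions rather than the verbatim patch:

    static int mmu_check_root(struct kvm_vcpu *vcpu, gfn_t root_gfn)
    {
        int ret = 0;

        if (!kvm_is_visible_gfn(vcpu->kvm, root_gfn)) {
            /* cr3 points outside every memslot: force a triple fault */
            set_bit(KVM_REQ_TRIPLE_FAULT, &vcpu->requests);
            ret = 1;
        }
        return ret;
    }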
|
|
(cherry picked from commit a2edf57f510cce6a389cc14e58c6ad0a4296d6f9)
The processor is documented to reload the PDPTRs while in PAE mode if any
of the CR4 bits PSE, PGE, or PAE change. Linux relies on this
behaviour when zapping the low mappings of PAE kernels during boot.
The code already handled changes to CR4.PAE; augment it to also notice changes
to PSE and PGE.
This triggered while booting an F11 PAE kernel; the futex initialization code
runs before any CR3 reloads and writes to a NULL pointer; the futex subsystem
ended up uninitialized, killing PI futexes and pulseaudio, which uses them.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
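A sketch of the augmented test in kvm_set_cr4(); variable names are
illustrative:

    unsigned long old_cr4 = vcpu->arch.cr4;
    unsigned long pdptr_bits = X86_CR4_PGE | X86_CR4_PSE | X86_CR4_PAE;

    /* reload the PDPTRs, as the CPU would, if any of PGE/PSE/PAE changed
     * while the guest is using PAE paging */
    if (is_paging(vcpu) && (cr4 & X86_CR4_PAE)
        && ((cr4 ^ old_cr4) & pdptr_bits)
        && !load_pdptrs(vcpu, vcpu->arch.cr3)) {
        kvm_inject_gp(vcpu, 0);
        return;
    }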
|
|
(cherry picked from commit e286e86e6d2042d67d09244aa0e05ffef75c9d54)
Some processors don't have EFER; don't oops if userspace wants us to
read EFER when we check NX.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
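The fix amounts to using a fault-safe MSR read; a sketch, assuming the check
lives in an is_efer_nx()-style helper:

    static int is_efer_nx(void)
    {
        unsigned long long efer = 0;

        /* rdmsrl_safe() returns an error instead of oopsing when the MSR
         * does not exist, leaving efer at 0 (i.e. NX not advertised). */
        rdmsrl_safe(MSR_EFER, &efer);
        return efer & EFER_NX;
    }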
|
|
(cherry picked from commit 29415c37f043d1d54dcf356601d738ff6633b72b)
The vcpu thread can be preempted after the guest_debug_pre() callback,
resulting in invalid debug registers on the new vcpu.
Move it inside the non-preemptible section.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
(cherry picked from commit a89c1ad270ca7ad0eec2667bc754362ce7b142be)
Currently KVM implements MC0-MC4_MISC read support. When booting Linux this
results in KVM warnings in the kernel log when the guest tries to read
MC5_MISC. Fix these warnings with this patch.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
(cherry picked from commit b772ff362ec6b821c8a5227a3355e263f917bfad)
[sheng: fix KVM_GET_LAPIC using wrong size]
Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
(cherry picked from commit f0d662759a2465babdba1160749c446648c9d159)
On my machine with gcc 3.4, kvm uses ~2k of stack in a few
select functions. This is mostly because gcc fails to
notice that the different case: statements could have their
stack usage combined. It overflows very nicely if interrupts
happen during one of these large uses.
This patch uses two methods for reducing stack usage.
1. Dynamically allocate large objects instead of putting them
on the stack.
2. Use a union{} member for all of the case variables. This
tricks gcc into combining them all into a single stack
allocation. (There's also a comment on this)
Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
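A sketch of both techniques in an ioctl-style dispatcher (the ioctl names and
struct choices here are illustrative):

    union {
        /* gcc reserves stack for the largest member once, not per case */
        struct kvm_lapic_state *lapic;  /* large object: heap, not stack */
        struct kvm_interrupt irq;       /* small objects share one slot  */
    } u;

    switch (ioctl) {
    case KVM_GET_LAPIC:
        u.lapic = kzalloc(sizeof(*u.lapic), GFP_KERNEL);
        if (!u.lapic)
            return -ENOMEM;
        /* ... fill it, copy_to_user(), ... */
        kfree(u.lapic);
        break;
    case KVM_INTERRUPT:
        if (copy_from_user(&u.irq, argp, sizeof(u.irq)))
            return -EFAULT;
        /* ... */
        break;
    }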
|
|
(cherry picked from commit acee3c04e8208c17aad1baff99baa68d71640a19)
There is no reason to share internal memory slots with fork()ed instances.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
(cherry picked from commit f4bbd9aaaae23007e4d79536d35a30cbbb11d407)
Real mode segments do not reference the GDT or LDT; they simply compute
base = selector * 16.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
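In other words (illustrative helper):

    /* real mode: the segment base is derived from the selector alone */
    static inline unsigned long rmode_segment_base(u16 selector)
    {
        return (unsigned long)selector << 4;    /* base = selector * 16 */
    }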
|
|
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
This allows the memslots to be read with only the mmu_lock held, which is
needed for mmu notifiers that run in atomic context holding the mmu_lock.
Signed-off-by: Andrea Arcangeli <andrea@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
This allows the mmu notifier code to run unalias_gfn with only the
mmu_lock held. Only alias writes need the mmu_lock held. Readers will
either take the slots_lock in read mode or the mmu_lock.
Signed-off-by: Andrea Arcangeli <andrea@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
If the granularity field 'g' is one, the limit is 4KB-granular.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
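Illustrative helper for the scaling rule:

    /* a 20-bit descriptor limit counts 4KB pages when 'g' is set */
    static u32 scaled_limit(u32 raw_limit, unsigned int g)
    {
        return g ? (raw_limit << 12) | 0xfff : raw_limit;
    }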
|
|
There is no guarantee that the old TSS descriptor in the GDT contains
the proper base address. This is the case for the Windows installer's
reboot-via-triple-fault.
Use guest registers instead. Also translate the address properly.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
The segment base is always a linear address, so translate before
accessing guest memory.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
Flush the shadow mmu before removing regions to avoid stale entries.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
Fixes compilation with CONFIG_VMI enabled.
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
There is no need to grab slots_lock if the vapic_page will not
be touched.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
Instead of prefetching all segment bases before emulation, read them at the
last moment. Since most of them are unneeded, we save some cycles on
Intel machines where this is a bit expensive.
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
Emulation failure reports are useful, so allow more than one over the lifetime
of the module.
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
If we're not going to do anything (the case in which the failure has already
been reported), we do not even need to bother calculating the linear rip.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
This patch enables coalesced MMIO for the x86 architecture.
It defines KVM_MMIO_PAGE_OFFSET and KVM_CAP_COALESCED_MMIO.
It enables the compilation of coalesced_mmio.c.
Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
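From userspace the feature is used roughly as follows (the zone address and
size are placeholders):

    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    static int register_coalesced_zone(int kvm_fd, int vm_fd)
    {
        struct kvm_coalesced_mmio_zone zone = {
            .addr = 0xfebf0000,     /* example MMIO range */
            .size = 0x1000,
        };

        /* returns > 0 when the kernel supports coalesced MMIO */
        if (ioctl(kvm_fd, KVM_CHECK_EXTENSION, KVM_CAP_COALESCED_MMIO) <= 0)
            return -1;

        return ioctl(vm_fd, KVM_REGISTER_COALESCED_MMIO, &zone);
    }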
|
|
Modify the in_range() member of struct kvm_io_device to pass the length and
the type of the I/O (write or read).
This modification allows kvm_io_device to be used with coalesced MMIO.
Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
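The resulting callback shape, approximately (the exact parameter order is an
assumption):

    struct kvm_io_device {
        /* ... read/write/destructor callbacks ... */
        int (*in_range)(struct kvm_io_device *this, gpa_t addr,
                        int len, int is_write);
        void *private;
    };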
|
|
Prefix the functions that will be exported with kvm_.
We also prefixed set_segment(), even though it is still static,
for consistency.
signed-off-by: Guillaume Thouvenin <guillaume.thouvenin@ext.bull.net>
Signed-off-by: Laurent Vivier <laurent.vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
Add emulation for the memory type range registers, needed by VMware ESX 3.5
and by PCI device assignment.
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
[avi: fix ia64 build breakage]
Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
Since we aren't modifying any register, there's no need to decache
the register state.
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
Obsoleted by the vmx-specific per-cpu list.
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
VMX hardware can cache the contents of a vcpu's vmcs. This cache needs
to be flushed when migrating a vcpu to another cpu, or (which is the case
that interests us here) when disabling hardware virtualization on a cpu.
The current implementation of decaching iterates over the list of all vcpus,
picks the ones that are potentially cached on the cpu that is being offlined,
and flushes the cache. The problem is that it uses mutex_trylock() to gain
exclusive access to the vcpu, which fires off a (benign) warning about using
the mutex in an interrupt context.
To avoid this, and to make things generally nicer, add a new per-cpu list
of potentially cached vcpus. This makes the decaching code much simpler. The
list is vmx-specific since other hardware doesn't have this issue.
[andrea: fix crash on suspend/resume]
Signed-off-by: Andrea Arcangeli <andrea@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
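A sketch of the per-cpu list; the list and field names follow the description
above and should be treated as assumptions, not the verbatim patch:

    static DEFINE_PER_CPU(struct list_head, vcpus_on_cpu);

    static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
    {
        struct vcpu_vmx *vmx = to_vmx(vcpu);

        /* remember that this vcpu's vmcs may be cached on this cpu */
        local_irq_disable();
        list_add(&vmx->local_vcpus_link, &per_cpu(vcpus_on_cpu, cpu));
        local_irq_enable();
        /* ... load the vmcs as before ... */
    }

    static void vmclear_local_vcpus(void)
    {
        int cpu = raw_smp_processor_id();
        struct vcpu_vmx *vmx, *n;

        /* called when disabling hardware virtualization on this cpu */
        list_for_each_entry_safe(vmx, n, &per_cpu(vcpus_on_cpu, cpu),
                                 local_vcpus_link)
            __vcpu_clear(vmx);      /* flush the cached vmcs and unlink */
    }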
|
|
This patch adds some kvmtrace bits to the generic x86 code
where it is instrumented from SVM.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
Noticed by sparse:
arch/x86/kvm/vmx.c:1583:6: warning: symbol 'vmx_disable_intercept_for_msr' was not declared. Should it be static?
arch/x86/kvm/x86.c:3406:5: warning: symbol 'kvm_task_switch_16' was not declared. Should it be static?
arch/x86/kvm/x86.c:3429:5: warning: symbol 'kvm_task_switch_32' was not declared. Should it be static?
arch/x86/kvm/mmu.c:1968:6: warning: symbol 'kvm_mmu_remove_one_alloc_mmu_page' was not declared. Should it be static?
arch/x86/kvm/mmu.c:2014:6: warning: symbol 'mmu_destroy_caches' was not declared. Should it be static?
arch/x86/kvm/lapic.c:862:5: warning: symbol 'kvm_lapic_get_base' was not declared. Should it be static?
arch/x86/kvm/i8254.c:94:5: warning: symbol 'pit_get_gate' was not declared. Should it be static?
arch/x86/kvm/i8254.c:196:5: warning: symbol '__pit_timer_fn' was not declared. Should it be static?
arch/x86/kvm/i8254.c:561:6: warning: symbol '__inject_pit_timer_intr' was not declared. Should it be static?
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
It's never used, and the comments refer to nonatomic and retry
interchangeably. So get rid of it.
Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|
|
This patch updates the kvm host code to use the pvclock structs.
It also makes the paravirt clock compatible with Xen.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
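The shared per-vcpu time structure looks roughly like this (field layout as
best recalled from the pvclock ABI; treat it as an approximation):

    struct pvclock_vcpu_time_info {
        u32 version;            /* even: stable; odd: being updated      */
        u32 pad0;
        u64 tsc_timestamp;      /* host TSC value at the last update     */
        u64 system_time;        /* guest time in ns at tsc_timestamp     */
        u32 tsc_to_system_mul;  /* scale from (shifted) TSC ticks to ns  */
        s8  tsc_shift;          /* binary shift applied before the mul   */
        u8  pad[3];
    } __attribute__((__packed__));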
|
|
If a timer fires after kvm_inject_pending_timer_irqs() but before
local_irq_disable(), the code will enter guest mode and only inject such a
timer interrupt the next time an unrelated event causes an exit.
It would be simpler if the timer->pending irq conversion could be done
with IRQs disabled, so that the above problem cannot happen.
For now introduce a new vcpu requests bit to cancel guest entry.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
A guest vcpu instance can be scheduled to a different physical CPU
between the test for KVM_REQ_MIGRATE_TIMER and local_irq_disable().
If that happens, the timer will only be migrated to the current pCPU on
the next exit, meaning that the guest LAPIC timer event can be delayed until
a host interrupt is triggered.
Fix it by cancelling guest entry if any vcpu request is pending. This
has the side effect of nicely consolidating vcpu->requests checks.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
Migrate the PIT timer to the physical CPU which vcpu0 is scheduled on,
similarly to what is done for the LAPIC timers, otherwise PIT interrupts
will be delayed until an unrelated event causes an exit.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
This makes sure we do not schedule in atomic context during fx_init. I also
changed the name of fpu_init to fx_finit to avoid duplicating the name of
fpu_init, which is already used in the kernel; this makes grep simpler,
if nothing else.
Signed-off-by: Andrea Arcangeli <andrea@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
Clear pending exceptions when setting new register values. This avoids
spurious exceptions after restoring a vcpu state or after
reset-on-triple-fault.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
The busy bit is bit 1 of the type field, not bit 8.
Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
[aliguory: plug leak]
Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
Enable kvm_set_spte() to generate EPT entries.
Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
There is a window open between the testing of pending IRQs
and the assignment of guest_mode in __vcpu_run.
Injection of IRQs can race with __vcpu_run as follows:
CPU0                                    CPU1

kvm_x86_ops->run()
vcpu->guest_mode = 0                    SET_IRQ_LINE ioctl
                                        ..
kvm_x86_ops->inject_pending_irq
  kvm_cpu_has_interrupt()
                                        apic_test_and_set_irr()
                                        kvm_vcpu_kick
                                        if (vcpu->guest_mode)
                                                send_ipi()
vcpu->guest_mode = 1
So move the guest_mode = 1 assignment before ->inject_pending_irq, and make
sure that it cannot be reordered after it.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
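A sketch of the reordering fix on the entry path (the barrier choice here is
an assumption):

    /* publish guest_mode before the final interrupt scan, so a concurrent
     * kvm_vcpu_kick() either sees guest_mode == 1 and sends the IPI, or
     * its IRR update is observed by the scan below */
    vcpu->guest_mode = 1;
    smp_wmb();

    if (irqchip_in_kernel(vcpu->kvm))
        kvm_x86_ops->inject_pending_irq(vcpu);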
|
|
So userspace can save/restore the mpstate during migration.
[avi: export the #define constants describing the value]
[christian: add s390 stubs]
[avi: ditto for ia64]
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
We wish to export it to userspace, so move it into the kvm namespace.
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
Trace markers allow userspace to trace execution of a virtual machine
in order to monitor its performance.
Signed-off-by: Feng (Eric) Liu <eric.e.liu@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
Noticed by Marcelo Tosatti.
Signed-off-by: Avi Kivity <avi@qumranet.com>
|