aboutsummaryrefslogtreecommitdiff
path: root/virt
AgeCommit message (Collapse)Author
2013-03-19KVM: Fix bounds checking in ioapic indirect register reads (CVE-2013-1798)Andy Honig
If the guest specifies a IOAPIC_REG_SELECT with an invalid value and follows that with a read of the IOAPIC_REG_WINDOW KVM does not properly validate that request. ioapic_read_indirect contains an ASSERT(redir_index < IOAPIC_NUM_PINS), but the ASSERT has no effect in non-debug builds. In recent kernels this allows a guest to cause a kernel oops by reading invalid memory. In older kernels (pre-3.3) this allows a guest to read from large ranges of host memory. Tested: tested against apic unit tests. Signed-off-by: Andrew Honig <ahonig@google.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-02-27hlist: drop the node parameter from iteratorsSasha Levin
I'm not sure why, but the hlist for each entry iterators were conceived list_for_each_entry(pos, head, member) The hlist ones were greedy and wanted an extra parameter: hlist_for_each_entry(tpos, pos, head, member) Why did they need an extra pos parameter? I'm not quite sure. Not only they don't really need it, it also prevents the iterator from looking exactly like the list iterator, which is unfortunate. Besides the semantic patch, there was some manual work required: - Fix up the actual hlist iterators in linux/list.h - Fix up the declaration of other iterators based on the hlist ones. - A very small amount of places were using the 'node' parameter, this was modified to use 'obj->member' instead. - Coccinelle didn't handle the hlist_for_each_entry_safe iterator properly, so those had to be fixed up manually. The semantic patch which is mostly the work of Peter Senna Tschudin is here: @@ iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host; type T; expression a,c,d,e; identifier b; statement S; @@ -T b; <+... when != b ( hlist_for_each_entry(a, - b, c, d) S | hlist_for_each_entry_continue(a, - b, c) S | hlist_for_each_entry_from(a, - b, c) S | hlist_for_each_entry_rcu(a, - b, c, d) S | hlist_for_each_entry_rcu_bh(a, - b, c, d) S | hlist_for_each_entry_continue_rcu_bh(a, - b, c) S | for_each_busy_worker(a, c, - b, d) S | ax25_uid_for_each(a, - b, c) S | ax25_for_each(a, - b, c) S | inet_bind_bucket_for_each(a, - b, c) S | sctp_for_each_hentry(a, - b, c) S | sk_for_each(a, - b, c) S | sk_for_each_rcu(a, - b, c) S | sk_for_each_from -(a, b) +(a) S + sk_for_each_from(a) S | sk_for_each_safe(a, - b, c, d) S | sk_for_each_bound(a, - b, c) S | hlist_for_each_entry_safe(a, - b, c, d, e) S | hlist_for_each_entry_continue_rcu(a, - b, c) S | nr_neigh_for_each(a, - b, c) S | nr_neigh_for_each_safe(a, - b, c, d) S | nr_node_for_each(a, - b, c) S | nr_node_for_each_safe(a, - b, c, d) S | - for_each_gfn_sp(a, c, d, b) S + for_each_gfn_sp(a, c, d) S | - for_each_gfn_indirect_valid_sp(a, c, d, b) S + for_each_gfn_indirect_valid_sp(a, c, d) S | for_each_host(a, - b, c) S | for_each_host_safe(a, - b, c, d) S | for_each_mesh_entry(a, - b, c, d) S ) ...+> [akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c] [akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c] [akpm@linux-foundation.org: checkpatch fixes] [akpm@linux-foundation.org: fix warnings] [akpm@linux-foudnation.org: redo intrusive kvm changes] Tested-by: Peter Senna Tschudin <peter.senna@gmail.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Gleb Natapov <gleb@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-11KVM: Remove user_alloc from struct kvm_memory_slotTakuya Yoshikawa
This field was needed to differentiate memory slots created by the new API, KVM_SET_USER_MEMORY_REGION, from those by the old equivalent, KVM_SET_MEMORY_REGION, whose support was dropped long before: commit b74a07beed0e64bfba413dcb70dd6749c57f43dc KVM: Remove kernel-allocated memory regions Although we also have private memory slots to which KVM allocates memory with vm_mmap(), !user_alloc slots in other words, the slot id should be enough for differentiating them. Note: corresponding function parameters will be removed later. Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-02-04KVM: set_memory_region: Disallow changing read-only attribute laterTakuya Yoshikawa
As Xiao pointed out, there are a few problems with it: - kvm_arch_commit_memory_region() write protects the memory slot only for GET_DIRTY_LOG when modifying the flags. - FNAME(sync_page) uses the old spte value to set a new one without checking KVM_MEM_READONLY flag. Since we flush all shadow pages when creating a new slot, the simplest fix is to disallow such problematic flag changes: this is safe because no one is doing such things. Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Cc: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-02-04KVM: set_memory_region: Identify the requested change explicitlyTakuya Yoshikawa
KVM_SET_USER_MEMORY_REGION forces __kvm_set_memory_region() to identify what kind of change is being requested by checking the arguments. The current code does this checking at various points in code and each condition being used there is not easy to understand at first glance. This patch consolidates these checks and introduces an enum to name the possible changes to clean up the code. Although this does not introduce any functional changes, there is one change which optimizes the code a bit: if we have nothing to change, the new code returns 0 immediately. Note that the return value for this case cannot be changed since QEMU relies on it: we noticed this when we changed it to -EINVAL and got a section mismatch error at the final stage of live migration. Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2013-01-29kvm: Handle yield_to failure return code for potential undercommit caseRaghavendra K T
yield_to returns -ESRCH, When source and target of yield_to run queue length is one. When we see three successive failures of yield_to we assume we are in potential undercommit case and abort from PLE handler. The assumption is backed by low probability of wrong decision for even worst case scenarios such as average runqueue length between 1 and 2. More detail on rationale behind using three tries: if p is the probability of finding rq length one on a particular cpu, and if we do n tries, then probability of exiting ple handler is: p^(n+1) [ because we would have come across one source with rq length 1 and n target cpu rqs with length 1 ] so num tries: probability of aborting ple handler (1.5x overcommit) 1 1/4 2 1/8 3 1/16 We can increase this probability with more tries, but the problem is the overhead. Also, If we have tried three times that means we would have iterated over 3 good eligible vcpus along with many non-eligible candidates. In worst case if we iterate all the vcpus, we reduce 1x performance and overcommit performance get hit. note that we do not update last boosted vcpu in failure cases. Thank Avi for raising question on aborting after first fail from yield_to. Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Tested-by: Chegu Vinod <chegu_vinod@hp.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-01-29x86, apicv: add virtual interrupt delivery supportYang Zhang
Virtual interrupt delivery avoids KVM to inject vAPIC interrupts manually, which is fully taken care of by the hardware. This needs some special awareness into existing interrupr injection path: - for pending interrupt, instead of direct injection, we may need update architecture specific indicators before resuming to guest. - A pending interrupt, which is masked by ISR, should be also considered in above update action, since hardware will decide when to inject it at right time. Current has_interrupt and get_interrupt only returns a valid vector from injection p.o.v. Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-01-27kvm: Obey read-only mappings in iommuAlex Williamson
We've been ignoring read-only mappings and programming everything into the iommu as read-write. Fix this to only include the write access flag when read-only is not set. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-01-27kvm: Force IOMMU remapping on memory slot read-only flag changesAlex Williamson
Memory slot flags can be altered without changing other parameters of the slot. The read-only attribute is the only one the IOMMU cares about, so generate an un-map, re-map when this occurs. This also avoid unnecessarily re-mapping the slot when no IOMMU visible changes are made. Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-01-17KVM: set_memory_region: Remove unnecessary variable memslotTakuya Yoshikawa
One such variable, slot, is enough for holding a pointer temporarily. We also remove another local variable named slot, which is limited in a block, since it is confusing to have the same name in this function. Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-01-17KVM: set_memory_region: Don't check for overlaps unless we create or move a slotTakuya Yoshikawa
Don't need the check for deleting an existing slot or just modifiying the flags. Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-01-17KVM: set_memory_region: Don't jump to out_free unnecessarilyTakuya Yoshikawa
This makes the separation between the sanity checks and the rest of the code a bit clearer. Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-01-14KVM: Write protect the updated slot only when dirty logging is enabledTakuya Yoshikawa
Calling kvm_mmu_slot_remove_write_access() for a deleted slot does nothing but search for non-existent mmu pages which have mappings to that deleted memory; this is safe but a waste of time. Since we want to make the function rmap based in a later patch, in a manner which makes it unsafe to be called for a deleted slot, we makes the caller see if the slot is non-zero and being dirty logged. Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2012-12-24KVM: move the code that installs new slots array to a separate function.Gleb Natapov
Move repetitive code sequence to a separate function. Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2012-12-23kvm: Fix memory slot generation updatesAlex Williamson
Previous patch "kvm: Minor memory slot optimization" (b7f69c555ca43) overlooked the generation field of the memory slots. Re-using the original memory slots left us with with two slightly different memory slots with the same generation. To fix this, make update_memslots() take a new parameter to specify the last generation. This also makes generation management more explicit to avoid such problems in the future. Reported-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2012-12-23KVM: remove a wrong hack of delivery PIT intr to vcpu0Yang Zhang
This hack is wrong. The pin number of PIT is connected to 2 not 0. This means this hack never takes effect. So it is ok to remove it. Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2012-12-13KVM: struct kvm_memory_slot.id -> shortAlex Williamson
We're currently offering a whopping 32 memory slots to user space, an int is a bit excessive for storing this. We would like to increase our memslots, but SHRT_MAX should be more than enough. Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-12-13KVM: struct kvm_memory_slot.user_alloc -> boolAlex Williamson
There's no need for this to be an int, it holds a boolean. Move to the end of the struct for alignment. Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-12-13KVM: Rename KVM_MEMORY_SLOTS -> KVM_USER_MEM_SLOTSAlex Williamson
It's easy to confuse KVM_MEMORY_SLOTS and KVM_MEM_SLOTS_NUM. One is the user accessible slots and the other is user + private. Make this more obvious. Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-12-13KVM: Minor memory slot optimizationAlex Williamson
If a slot is removed or moved in the guest physical address space, we first allocate and install a new slot array with the invalidated entry. The old array is then freed. We then proceed to allocate yet another slot array to install the permanent replacement. Re-use the original array when this occurs and avoid the extra kfree/kmalloc. Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-12-13KVM: Fix iommu map/unmap to handle memory slot movesAlex Williamson
The iommu integration into memory slots expects memory slots to be added or removed and doesn't handle the move case. We can unmap slots from the iommu after we mark them invalid and map them before installing the final memslot array. Also re-order the kmemdup vs map so we don't leave iommu mappings if we get ENOMEM. Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-12-13KVM: Check userspace_addr when modifying a memory slotAlex Williamson
The API documents that only flags and guest physical memory space can be modified on an existing slot, but we don't enforce that the userspace address cannot be modified. Instead we just ignore it. This means that a user may think they've successfully moved both the guest and user addresses, when in fact only the guest address changed. Check and error instead. Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-12-13KVM: Restrict non-existing slot state transitionsAlex Williamson
The API documentation states: When changing an existing slot, it may be moved in the guest physical memory space, or its flags may be modified. An "existing slot" requires a non-zero npages (memory_size). The only transition we should therefore allow for a non-existing slot should be to create the slot, which includes setting a non-zero memory_size. We currently allow calls to modify non-existing slots, which is pointless, confusing, and possibly wrong. With this we know that the invalidation path of __kvm_set_memory_region is always for a delete or move and never for adding a zero size slot. Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-12-10kvm: Fix irqfd resampler list walkAlex Williamson
Typo for the next pointer means we're walking random data here. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-12-09Merge branch 'for-upstream' of https://github.com/agraf/linux-2.6 into queueMarcelo Tosatti
* 'for-upstream' of https://github.com/agraf/linux-2.6: (28 commits) KVM: PPC: booke: Get/set guest EPCR register using ONE_REG interface KVM: PPC: bookehv: Add EPCR support in mtspr/mfspr emulation KVM: PPC: bookehv: Add guest computation mode for irq delivery KVM: PPC: Make EPCR a valid field for booke64 and bookehv KVM: PPC: booke: Extend MAS2 EPN mask for 64-bit KVM: PPC: e500: Mask MAS2 EPN high 32-bits in 32/64 tlbwe emulation KVM: PPC: Mask ea's high 32-bits in 32/64 instr emulation KVM: PPC: e500: Add emulation helper for getting instruction ea KVM: PPC: bookehv64: Add support for interrupt handling KVM: PPC: bookehv: Remove GET_VCPU macro from exception handler KVM: PPC: booke: Fix get_tb() compile error on 64-bit KVM: PPC: e500: Silence bogus GCC warning in tlb code KVM: PPC: Book3S HV: Handle guest-caused machine checks on POWER7 without panicking KVM: PPC: Book3S HV: Improve handling of local vs. global TLB invalidations MAINTAINERS: Add git tree link for PPC KVM KVM: PPC: Book3S PR: MSR_DE doesn't exist on Book 3S KVM: PPC: Book3S PR: Fix VSX handling KVM: PPC: Book3S PR: Emulate PURR, SPURR and DSCR registers KVM: PPC: Book3S HV: Don't give the guest RW access to RO pages KVM: PPC: Book3S HV: Report correct HPT entry index when reading HPT ...
2012-12-06KVM: Distangle eventfd code from irqchipAlexander Graf
The current eventfd code assumes that when we have eventfd, we also have irqfd for in-kernel interrupt delivery. This is not necessarily true. On PPC we don't have an in-kernel irqchip yet, but we can still support easily support eventfd. Signed-off-by: Alexander Graf <agraf@suse.de>
2012-12-05kvm: deliver msi interrupts from irq handlerMichael S. Tsirkin
We can deliver certain interrupts, notably MSI, from atomic context. Use kvm_set_irq_inatomic, to implement an irq handler for msi. This reduces the pressure on scheduler in case where host and guest irq share a host cpu. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2012-12-05kvm: add kvm_set_irq_inatomicMichael S. Tsirkin
Add an API to inject IRQ from atomic context. Return EWOULDBLOCK if impossible (e.g. for multicast). Only MSI is supported ATM. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2012-11-29KVM: Fix user memslot overlap checkAlex Williamson
Prior to memory slot sorting this loop compared all of the user memory slots for overlap with new entries. With memory slot sorting, we're just checking some number of entries in the array that may or may not be user slots. Instead, walk all the slots with kvm_for_each_memslot, which has the added benefit of terminating early when we hit the first empty slot, and skip comparison to private slots. Cc: stable@vger.kernel.org Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-11-27KVM: x86: add kvm_arch_vcpu_postcreate callback, move TSC initializationMarcelo Tosatti
TSC initialization will soon make use of online_vcpus. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-11-27KVM: x86: implement PVCLOCK_TSC_STABLE_BIT pvclock flagMarcelo Tosatti
KVM added a global variable to guarantee monotonicity in the guest. One of the reasons for that is that the time between 1. ktime_get_ts(&timespec); 2. rdtscll(tsc); Is variable. That is, given a host with stable TSC, suppose that two VCPUs read the same time via ktime_get_ts() above. The time required to execute 2. is not the same on those two instances executing in different VCPUS (cache misses, interrupts...). If the TSC value that is used by the host to interpolate when calculating the monotonic time is the same value used to calculate the tsc_timestamp value stored in the pvclock data structure, and a single <system_timestamp, tsc_timestamp> tuple is visible to all vcpus simultaneously, this problem disappears. See comment on top of pvclock_update_vm_gtod_copy for details. Monotonicity is then guaranteed by synchronicity of the host TSCs and guest TSCs. Set TSC stable pvclock flag in that case, allowing the guest to read clock from userspace. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-11-13KVM: remove unnecessary return value checkGuo Chao
No need to check return value before breaking switch. Signed-off-by: Guo Chao <yan@linux.vnet.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-11-13KVM: do not kfree error pointerGuo Chao
We should avoid kfree()ing error pointer in kvm_vcpu_ioctl() and kvm_arch_vcpu_ioctl(). Signed-off-by: Guo Chao <yan@linux.vnet.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-10-29KVM: do not treat noslot pfn as a error pfnXiao Guangrong
This patch filters noslot pfn out from error pfns based on Marcelo comment: noslot pfn is not a error pfn After this patch, - is_noslot_pfn indicates that the gfn is not in slot - is_error_pfn indicates that the gfn is in slot but the error is occurred when translate the gfn to pfn - is_error_noslot_pfn indicates that the pfn either it is error pfns or it is noslot pfn And is_invalid_pfn can be removed, it makes the code more clean Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-10-29Merge remote-tracking branch 'master' into queueMarcelo Tosatti
Merge reason: development work has dependency on kvm patches merged upstream. Conflicts: arch/powerpc/include/asm/Kbuild arch/powerpc/include/asm/kvm_para.h Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-10-24Merge tag 'kvm-3.7-2' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Avi Kivity: "KVM updates for 3.7-rc2" * tag 'kvm-3.7-2' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM guest: exit idleness when handling KVM_PV_REASON_PAGE_NOT_PRESENT KVM: apic: fix LDR calculation in x2apic mode KVM: MMU: fix release noslot pfn
2012-10-22KVM: MMU: fix release noslot pfnXiao Guangrong
We can not directly call kvm_release_pfn_clean to release the pfn since we can meet noslot pfn which is used to cache mmio info into spte Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Cc: stable@vger.kernel.org Signed-off-by: Avi Kivity <avi@redhat.com>
2012-10-10KVM: change kvm_assign_device() to print return value when ↵Shuah Khan
iommu_attach_device() fails Change existing kernel error message to include return value from iommu_attach_device() when it fails. This will help debug device assignment failures more effectively. Signed-off-by: Shuah Khan <shuah.khan@hp.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-10-06kvm: replace test_and_set_bit_le() in mark_page_dirty_in_slot() with ↵Takuya Yoshikawa
set_bit_le() Now that we have defined generic set_bit_le() we do not need to use test_and_set_bit_le() for atomically setting a bit. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Cc: Avi Kivity <avi@redhat.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-04Merge tag 'kvm-3.7-1' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM updates from Avi Kivity: "Highlights of the changes for this release include support for vfio level triggered interrupts, improved big real mode support on older Intels, a streamlines guest page table walker, guest APIC speedups, PIO optimizations, better overcommit handling, and read-only memory." * tag 'kvm-3.7-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (138 commits) KVM: s390: Fix vcpu_load handling in interrupt code KVM: x86: Fix guest debug across vcpu INIT reset KVM: Add resampling irqfds for level triggered interrupts KVM: optimize apic interrupt delivery KVM: MMU: Eliminate pointless temporary 'ac' KVM: MMU: Avoid access/dirty update loop if all is well KVM: MMU: Eliminate eperm temporary KVM: MMU: Optimize is_last_gpte() KVM: MMU: Simplify walk_addr_generic() loop KVM: MMU: Optimize pte permission checks KVM: MMU: Update accessed and dirty bits after guest pagetable walk KVM: MMU: Move gpte_access() out of paging_tmpl.h KVM: MMU: Optimize gpte_access() slightly KVM: MMU: Push clean gpte write protection out of gpte_access() KVM: clarify kvmclock documentation KVM: make processes waiting on vcpu mutex killable KVM: SVM: Make use of asm.h KVM: VMX: Make use of asm.h KVM: VMX: Make lto-friendly KVM: x86: lapic: Clean up find_highest_vector() and count_vectors() ... Conflicts: arch/s390/include/asm/processor.h arch/x86/kvm/i8259.c
2012-10-02Merge branch 'for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wqLinus Torvalds
Pull workqueue changes from Tejun Heo: "This is workqueue updates for v3.7-rc1. A lot of activities this round including considerable API and behavior cleanups. * delayed_work combines a timer and a work item. The handling of the timer part has always been a bit clunky leading to confusing cancelation API with weird corner-case behaviors. delayed_work is updated to use new IRQ safe timer and cancelation now works as expected. * Another deficiency of delayed_work was lack of the counterpart of mod_timer() which led to cancel+queue combinations or open-coded timer+work usages. mod_delayed_work[_on]() are added. These two delayed_work changes make delayed_work provide interface and behave like timer which is executed with process context. * A work item could be executed concurrently on multiple CPUs, which is rather unintuitive and made flush_work() behavior confusing and half-broken under certain circumstances. This problem doesn't exist for non-reentrant workqueues. While non-reentrancy check isn't free, the overhead is incurred only when a work item bounces across different CPUs and even in simulated pathological scenario the overhead isn't too high. All workqueues are made non-reentrant. This removes the distinction between flush_[delayed_]work() and flush_[delayed_]_work_sync(). The former is now as strong as the latter and the specified work item is guaranteed to have finished execution of any previous queueing on return. * In addition to the various bug fixes, Lai redid and simplified CPU hotplug handling significantly. * Joonsoo introduced system_highpri_wq and used it during CPU hotplug. There are two merge commits - one to pull in IRQ safe timer from tip/timers/core and the other to pull in CPU hotplug fixes from wq/for-3.6-fixes as Lai's hotplug restructuring depended on them." Fixed a number of trivial conflicts, but the more interesting conflicts were silent ones where the deprecated interfaces had been used by new code in the merge window, and thus didn't cause any real data conflicts. Tejun pointed out a few of them, I fixed a couple more. * 'for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (46 commits) workqueue: remove spurious WARN_ON_ONCE(in_irq()) from try_to_grab_pending() workqueue: use cwq_set_max_active() helper for workqueue_set_max_active() workqueue: introduce cwq_set_max_active() helper for thaw_workqueues() workqueue: remove @delayed from cwq_dec_nr_in_flight() workqueue: fix possible stall on try_to_grab_pending() of a delayed work item workqueue: use hotcpu_notifier() for workqueue_cpu_down_callback() workqueue: use __cpuinit instead of __devinit for cpu callbacks workqueue: rename manager_mutex to assoc_mutex workqueue: WORKER_REBIND is no longer necessary for idle rebinding workqueue: WORKER_REBIND is no longer necessary for busy rebinding workqueue: reimplement idle worker rebinding workqueue: deprecate __cancel_delayed_work() workqueue: reimplement cancel_delayed_work() using try_to_grab_pending() workqueue: use mod_delayed_work() instead of __cancel + queue workqueue: use irqsafe timer for delayed_work workqueue: clean up delayed_work initializers and add missing one workqueue: make deferrable delayed_work initializer names consistent workqueue: cosmetic whitespace updates for macro definitions workqueue: deprecate system_nrt[_freezable]_wq workqueue: deprecate flush[_delayed]_work_sync() ...
2012-09-23KVM: Add resampling irqfds for level triggered interruptsAlex Williamson
To emulate level triggered interrupts, add a resample option to KVM_IRQFD. When specified, a new resamplefd is provided that notifies the user when the irqchip has been resampled by the VM. This may, for instance, indicate an EOI. Also in this mode, posting of an interrupt through an irqfd only asserts the interrupt. On resampling, the interrupt is automatically de-asserted prior to user notification. This enables level triggered interrupts to be posted and re-enabled from vfio with no userspace intervention. All resampling irqfds can make use of a single irq source ID, so we reserve a new one for this interface. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-09-20KVM: optimize apic interrupt deliveryGleb Natapov
Most interrupt are delivered to only one vcpu. Use pre-build tables to find interrupt destination instead of looping through all vcpus. In case of logical mode loop only through vcpus in a logical cluster irq is sent to. Signed-off-by: Gleb Natapov <gleb@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-09-17KVM: make processes waiting on vcpu mutex killableMichael S. Tsirkin
vcpu mutex can be held for unlimited time so taking it with mutex_lock on an ioctl is wrong: one process could be passed a vcpu fd and call this ioctl on the vcpu used by another process, it will then be unkillable until the owner exits. Call mutex_lock_killable instead and return status. Note: mutex_lock_interruptible would be even nicer, but I am not sure all users are prepared to handle EINTR from these ioctls. They might misinterpret it as an error. Cleanup paths expect a vcpu that can't be used by any userspace so this will always succeed - catch bugs by calling BUG_ON. Catch callers that don't check return state by adding __must_check. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-09-06KVM: move postcommit flush to x86, as mmio sptes are x86 specificMarcelo Tosatti
Other arches do not need this. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> v2: fix incorrect deletion of mmio sptes on gpa move (noticed by Takuya) Signed-off-by: Avi Kivity <avi@redhat.com>
2012-09-06KVM: perform an invalid memslot step for gpa base changeMarcelo Tosatti
PPC must flush all translations before the new memory slot is visible. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-09-06KVM: split kvm_arch_flush_shadowMarcelo Tosatti
Introducing kvm_arch_flush_shadow_memslot, to invalidate the translations of a single memory slot. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-08-27KVM: PPC: book3s: fix build error caused by gfn_to_hva_memslot()Gavin Shan
The build error was caused by that builtin functions are calling the functions implemented in modules. This error was introduced by commit 4d8b81abc4 ("KVM: introduce readonly memslot"). The patch fixes the build error by moving function __gfn_to_hva_memslot() from kvm_main.c to kvm_host.h and making that "inline" so that the builtin function (kvmppc_h_enter) can use that. Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-08-26kvm: Fix nonsense handling of compat ioctlAlan Cox
KVM_SET_SIGNAL_MASK passed a NULL argument leaves the on stack signal sets uninitialized. It then passes them through to kvm_vcpu_ioctl_set_sigmask. We should be passing a NULL in this case not translated garbage. Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-08-22KVM: introduce readonly memslotXiao Guangrong
In current code, if we map a readonly memory space from host to guest and the page is not currently mapped in the host, we will get a fault pfn and async is not allowed, then the vm will crash We introduce readonly memory region to map ROM/ROMD to the guest, read access is happy for readonly memslot, write access on readonly memslot will cause KVM_EXIT_MMIO exit Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>