path: root/arch
Age  Commit message  Author
2009-10-05  powerpc: Fix incorrect setting of __HAVE_ARCH_PTE_SPECIAL  (Weirich, Bernhard)
[I'm going to fix upstream differently, by having all CPU types actually support _PAGE_SPECIAL, but I prefer the simple and obvious fix for -stable. -- Ben] The test that decides whether to define __HAVE_ARCH_PTE_SPECIAL on powerpc is bogus and will end up always defining it, even when _PAGE_SPECIAL is not supported (in which case it's 0) such as on 8xx or 40x processors. Signed-off-by: Bernhard Weirich <bernhard.weirich@riedel.net> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
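A sketch of the failure mode (illustrative, not the exact header text): on 8xx and 40x, _PAGE_SPECIAL is defined as 0, so a mere defined-ness test is the wrong question to ask.

    /* Bogus: the #ifdef succeeds even when the bit is 0 (8xx, 40x). */
    #ifdef _PAGE_SPECIAL
    #define __HAVE_ARCH_PTE_SPECIAL
    #endif

    /* The -stable fix in spirit: only advertise support when the bit is real. */
    #if defined(_PAGE_SPECIAL) && (_PAGE_SPECIAL != 0)
    #define __HAVE_ARCH_PTE_SPECIAL
    #endif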
2009-10-05  powerpc/8xx: Fix regression introduced by cache coherency rewrite  (Rex Feany)
commit e0908085fc2391c85b85fb814ae1df377c8e0dcb upstream. After upgrading to the latest kernel on my mpc875 userspace started running incredibly slow (hours to get to a shell, even!). I tracked it down to commit 8d30c14cab30d405a05f2aaceda1e9ad57800f36, that patch removed a work-around for the 8xx. Adding it back makes my problem go away. Signed-off-by: Rex Feany <rfeany@mrv.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-10-05  Fix NULL ptr regression in powernow-k8  (Kurt Roeckx)
commit f0adb134d8dc9993a9998dc50845ec4f6ff4fadc upstream. Fixes bugzilla #13780 From: Kurt Roeckx <kurt@roeckx.be> Signed-off-by: Dave Jones <davej@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-10-05  Revert "KVM: x86: check for cr3 validity in ioctl_set_sregs"  (Marcelo Tosatti)
(cherry picked from commit dc7e795e3dd2a763e5ceaa1615f307e808cf3932) This reverts commit 6c20e1442bb1c62914bb85b7f4a38973d2a423ba. To my understanding, it became obsolete with the advent of the more robust check in mmu_alloc_roots (89da4ff17f). Moreover, it prevents the conceptually safe pattern of 1. set sregs, 2. register mem-slots, 3. run vcpu, by setting a sticky triple fault during step 1. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-10-05  KVM: fix cpuid E2BIG handling for extended request types  (Mark McLoughlin)
(cherry picked from commit cb007648de83cf226d69ec76e1c01848b4e8e49f) If we run out of cpuid entries for extended request types we should return -E2BIG, just like we do for the standard request types. Signed-off-by: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-10-05  KVM guest: fix bogus wallclock physical address calculation  (Glauber Costa)
(cherry picked from commit a20316d2aa41a8f4fd171648bad8f044f6060826) The use of __pa() to calculate the address of a C-visible symbol is wrong, and can lead to unpredictable results. See arch/x86/include/asm/page.h for details. It should be replaced with __pa_symbol(), that does the correct math here, by taking relocations into account. This ensures the correct wallclock data structure physical address is passed to the hypervisor. Signed-off-by: Glauber Costa <glommer@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
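A sketch of the corrected call site (structure and MSR name follow the kvmclock code; treat the details as illustrative):

    static struct pvclock_wall_clock wall_clock;

    static void kvm_report_wallclock_sketch(void)
    {
            /* __pa(&wall_clock) ignores kernel relocation; __pa_symbol()
               takes it into account, so the hypervisor is handed the real
               physical address of the structure. */
            u32 low  = (u32)__pa_symbol(&wall_clock);
            u32 high = (u32)((u64)__pa_symbol(&wall_clock) >> 32);

            native_write_msr(MSR_KVM_WALL_CLOCK, low, high);
    }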
2009-10-05  KVM: limit lapic periodic timer frequency  (Marcelo Tosatti)
(cherry picked from commit 1444885a045fe3b1905a14ea1b52540bf556578b) Otherwise it's possible to starve the host by programming the lapic timer with a very high frequency. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-10-05  KVM: MMU: fix bogus alloc_mmu_pages assignment  (Marcelo Tosatti)
(cherry picked from commit b90c062c65cc8839edfac39778a37a55ca9bda36) Remove the bogus n_free_mmu_pages assignment from alloc_mmu_pages. It breaks accounting of mmu pages, since n_free_mmu_pages is modified but the real number of pages remains the same. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-10-05  KVM: MMU: fix missing locking in alloc_mmu_pages  (Marcelo Tosatti)
(cherry picked from commit 6a1ac77110ee3e8d8dfdef8442f3b30b3d83e6a2) n_requested_mmu_pages/n_free_mmu_pages are used by kvm_mmu_change_mmu_pages to calculate the number of pages to zap. alloc_mmu_pages, called from the vcpu initialization path, modifies these variables without proper locking, which can result in a negative value in kvm_mmu_change_mmu_pages (say, with cpu hotplug). Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-10-05  KVM: x86: Disallow hypercalls for guest callers in rings > 0  (Jan Kiszka)
(cherry picked from commit 07708c4af1346ab1521b26a202f438366b7bcffd) So far unprivileged guest callers running in ring 3 can issue, e.g., MMU hypercalls. Normally, such callers cannot provide any hand-crafted MMU command structure as it has to be passed by its physical address, but they can still crash the guest kernel by passing random addresses. To close the hole, this patch considers hypercalls valid only if issued from guest ring 0. This may still be relaxed on a per-hypercall basis in the future once required. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
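The shape of the check, sketched (kvm_x86_ops->get_cpl is the existing vendor hook; the KVM_EPERM error value follows the upstream patch):

    /* In kvm_emulate_hypercall(), before dispatching the hypercall nr: */
    if (kvm_x86_ops->get_cpl(vcpu) != 0) {
            ret = -KVM_EPERM;       /* refuse hypercalls from rings > 0 */
            goto out;
    }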
2009-10-05  KVM: MMU: make __kvm_mmu_free_some_pages handle empty list  (Izik Eidus)
(cherry picked from commit 3b80fffe2b31fb716d3ebe729c54464ee7856723) First check if the list is empty before attempting to look at list entries. Signed-off-by: Izik Eidus <ieidus@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
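A sketch of the fixed loop (field and helper names as in the 2.6.3x mmu code; illustrative):

    void __kvm_mmu_free_some_pages(struct kvm_vcpu *vcpu)
    {
            while (vcpu->kvm->arch.n_free_mmu_pages < KVM_REFILL_PAGES &&
                   !list_empty(&vcpu->kvm->arch.active_mmu_pages)) {
                    struct kvm_mmu_page *sp;

                    /* Only now is it safe to look at the tail entry. */
                    sp = container_of(vcpu->kvm->arch.active_mmu_pages.prev,
                                      struct kvm_mmu_page, link);
                    kvm_mmu_zap_page(vcpu->kvm, sp);
            }
    }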
2009-10-05  KVM: VMX: Fix cr8 exiting control clobbering by EPT  (Gleb Natapov)
(cherry picked from commit 5fff7d270bd6a4759b6d663741b729cdee370257) Don't call adjust_vmx_controls() two times for the same control. It restores options that were dropped earlier. This loses us the cr8 exit control, which causes a massive performance regression on Windows x64. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-10-05  KVM: VMX: Check cpl before emulating debug register access  (Avi Kivity)
(cherry picked from commit 0a79b009525b160081d75cef5dbf45817956acf2) Debug registers may only be accessed from cpl 0. Unfortunately, vmx will proceed to emulate the instruction even though it was issued from guest userspace, possibly leading to an unexpected trap later. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-24  powerpc/pseries: Fix to handle slb resize across migration  (Brian King)
commit 46db2f86a3b2a94e0b33e0b4548fb7b7b6bdff66 upstream. The SLB can change sizes across a live migration, which was not being handled, resulting in possible machine crashes during migration if migrating to a machine which has a smaller max SLB size than the source machine. Fix this by first reducing the SLB size to the minimum possible value, which is 32, prior to migration. Then during the device tree update which occurs after migration, we make the call to ensure the SLB gets updated. Also add the slb_size to the lparcfg output so that the migration tools can check to make sure the kernel has this capability before allowing migration in scenarios where the SLB size will change. BenH: Fixed #include <asm/mmu-hash64.h> -> <asm/mmu.h> to avoid breaking ppc32 build Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-24  x86, pat: Fix cacheflush address in change_page_attr_set_clr()  (Jack Steiner)
commit fa526d0d641b5365676a1fb821ce359e217c9b85 upstream. Fix address passed to cpa_flush_range() when changing page attributes from WB to UC. The address (*addr) is modified by __change_page_attr_set_clr(). The result is that the pages being flushed start at the _end_ of the changed range instead of the beginning. This should be considered for 2.6.30-stable and 2.6.31-stable. Signed-off-by: Jack Steiner <steiner@sgi.com> Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-24  x86/i386: Make sure stack-protector segment base is cache aligned  (Jeremy Fitzhardinge)
commit 1ea0d14e480c245683927eecc03a70faf06e80c8 upstream. The Intel Optimization Reference Guide says: In Intel Atom microarchitecture, the address generation unit assumes that the segment base will be 0 by default. Non-zero segment base will cause load and store operations to experience a delay. - If the segment base isn't aligned to a cache line boundary, the max throughput of memory operations is reduced to one [e]very 9 cycles. [...] Assembly/Compiler Coding Rule 15. (H impact, ML generality) For Intel Atom processors, use segments with base set to 0 whenever possible; avoid non-zero segment base address that is not aligned to cache line boundary at all cost. We can't avoid having a non-zero base for the stack-protector segment, but we can make it cache-aligned. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> LKML-Reference: <4AA01893.6000507@goop.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
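A sketch of the approach, assuming the layout the commit describes (gcc hard-codes the canary at a fixed 20-byte offset from the segment base, so the base itself is what gets cache-aligned; names illustrative):

    /* Per-cpu object whose address becomes the %gs segment base. */
    struct stack_canary {
            char __pad[20];         /* gcc addresses the canary as %gs:20 */
            unsigned long canary;
    };
    DECLARE_PER_CPU_ALIGNED(struct stack_canary, stack_canary);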
2009-09-24  x86: Fix x86_model test in es7000_apic_is_cluster()  (Roel Kluin)
commit 005155b1f626d2b2d7932e4afdf4fead168c6888 upstream. For the x86_model to be greater than 6 or less than 12 is logically always true. Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
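The logic error is easy to demonstrate standalone; presumably the intended test is the '&&' form, which matches only models 7 through 11:

    #include <stdio.h>

    int main(void)
    {
            for (int model = 0; model <= 15; model++) {
                    int buggy = (model > 6 || model < 12);  /* true for every model */
                    int fixed = (model > 6 && model < 12);  /* true only for 7..11 */
                    printf("model %2d: buggy=%d fixed=%d\n", model, buggy, fixed);
            }
            return 0;
    }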
2009-09-24  ARM: 5691/1: fix cache aliasing issues between kmap() and kmap_atomic() with highmem  (Nicolas Pitre)
commit 7929eb9cf643ae416e5081b2a6fa558d37b9854c upstream. Let's suppose a highmem page is kmap'd with kmap(). A pkmap entry is used, the page mapped to it, and the virtual cache is dirtied. Then kunmap() is used which does virtually nothing except for decrementing a usage count. Then, let's suppose the _same_ page gets mapped using kmap_atomic(). It is therefore mapped onto a fixmap entry instead, which has a different virtual address unaware of the dirty cache data for that page sitting in the pkmap mapping. Fortunately it is easy to know if a pkmap mapping still exists for that page and use it directly with kmap_atomic(), thanks to kmap_high_get(). And actual testing with a printk in the added code path shows that this condition is actually met *extremely* frequently. Seems that we've been quite lucky that things have worked so well with highmem so far. Signed-off-by: Nicolas Pitre <nico@marvell.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-15  sparc: sys32.S incorrect compat-layer splice() system call  (Mathieu Desnoyers)
[ Upstream commit e2c6cbd9ace61039d3de39e717195e38f1492aee ] I think arch/sparc/kernel/sys32.S has an incorrect splice definition: SIGN2(sys32_splice, sys_splice, %o0, %o1) The splice() prototype looks like : long splice(int fd_in, loff_t *off_in, int fd_out, loff_t *off_out, size_t len, unsigned int flags); So I think we should have : SIGN2(sys32_splice, sys_splice, %o0, %o2) Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
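Why %o2: sparc passes the first six syscall arguments in registers %o0..%o5, and SIGN2 sign-extends exactly the two registers it names. The two genuine 32-bit ints in this prototype are the file descriptors, not the pointer sitting in %o1:

    /*
     * long splice(int fd_in,        -> %o0  (int: needs sign extension)
     *             loff_t *off_in,   -> %o1  (pointer: must be left alone)
     *             int fd_out,       -> %o2  (int: needs sign extension)
     *             loff_t *off_out,  -> %o3
     *             size_t len,       -> %o4
     *             unsigned flags);  -> %o5
     */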
2009-09-15  sparc64: Fix bootup with mcount in some configs.  (David S. Miller)
[ Upstream commit bd4352cadfacb9084c97c853b025fac010266c26 ] Functions invoked early when booting up a cpu can't use tracing because mcount requires a valid 'current_thread_info()' and TLB mappings to be set up. The code path of sun4v_register_mondo_queues --> register_one_mondo is one such case. sun4v_register_mondo_queues already has the necessary 'notrace' annotation, but register_one_mondo does not. Normally register_one_mondo is inlined so the bug doesn't trigger, but with some config/compiler combinations it won't be, so we must properly mark it notrace. While we're here, add 'notrace' annotations to prom_printf and prom_halt so that early error handling won't have the same problem. Reported-by: Alexander Beregalov <a.beregalov@gmail.com> Reported-by: Leif Sawyer <lsawyer@gci.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-15  sparc64: Validate linear D-TLB misses.  (David S. Miller)
[ Upstream commit d8ed1d43e17898761c7221014a15a4c7501d2ff3 ] When page alloc debugging is not enabled, we essentially accept any virtual address for linear kernel TLB misses. But with kgdb, kernel address probing, and other facilities we can try to access arbitrary crap. So, make sure the address we miss on will translate to physical memory that actually exists. In order to make this work we have to embed the valid address bitmap into the kernel image. And in order to make that less expensive we make an adjustment, in that the max physical memory address is decreased to "1 << 41", even on the chips that support a 42-bit physical address space. We can do this because bit 41 indicates "I/O space" and thus covers non-memory ranges. The result of this is that: 1) kpte_linear_bitmap shrinks from 2K to 1K in size, and 2) we need 64K more for the valid address bitmap. We can't let the valid address bitmap be dynamically allocated once we start using it to validate TLB misses, otherwise we have crazy issues to deal with wrt. recursive TLB misses and such. If we're in a TLB miss it could be the deepest trap level that's legal inside of the cpu. So if we TLB miss referencing the bitmap, the cpu will be out of trap levels and enter RED state. To guard against out-of-range accesses to the bitmap, we have to check to make sure no bits in the physical address above bit 40 are set. We could export and use last_valid_pfn for this check, but that's just an unnecessary extra memory reference. On the plus side of all this, since we load all of these translations into the special 4MB mapping TSB, and we check the TSB first for TLB misses, there should be absolutely no real cost for these new checks in the TLB miss path. Reported-by: heyongli@gmail.com Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
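A sketch of the resulting miss-path test (constants follow the text: 41 physical address bits, one bitmap bit per 4MB chunk, hence the 64K bitmap; names illustrative):

    #define MAX_PHYS_ADDRESS_BITS   41
    #define ILOG2_4MB               22

    static int tlb_miss_pa_valid(unsigned long paddr,
                                 const unsigned long *valid_addr_bitmap)
    {
            /* Bits above bit 40 set: not RAM (I/O space or garbage), and
               the bitmap index would be out of range -- reject before
               indexing so we never overrun the 64K bitmap. */
            if (paddr >> MAX_PHYS_ADDRESS_BITS)
                    return 0;
            return test_bit(paddr >> ILOG2_4MB, valid_addr_bitmap);
    }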
2009-09-15  sparc64: Kill spurious NMI watchdog triggers by increasing limit to 30 seconds.  (David S. Miller)
[ Upstream commit e6617c6ec28a17cf2f90262b835ec05b9b861400 ] This is a compromise and a temporary workaround for bootup NMI watchdog triggers some people see with qla2xxx devices present. This happens when, for example: CPU 0 is in the driver init and looping submitting mailbox commands to load the firmware, then waiting for completion. CPU 1 is receiving the device interrupts. CPU 1 is where the NMI watchdog triggers. CPU 0 is submitting mailbox commands fast enough that by the time CPU 1 returns from the device interrupt handler, a new one is pending. This sequence runs for more than 5 seconds. The problematic case is CPU 1's timer interrupt running when the barrage of device interrupts begins. Then we have:

	timer interrupt
	return for softirq checking
	pending, thus enable interrupts

	qla2xxx interrupt
	return

	qla2xxx interrupt
	return

	... 5+ seconds pass

	final qla2xxx interrupt for fw load
	return

	run timer softirq
	return

At some point in the multi-second qla2xxx interrupt storm we trigger the NMI watchdog on CPU 1 from the NMI interrupt handler. The timer softirq, once we get back to running it, is smart enough to run the timer work enough times to make up for the missed timer interrupts. However, the NMI watchdogs (both x86 and sparc) use the timer interrupt count to notice the cpu is wedged. But in the above scenario we'll receive only one such timer interrupt even if we last all the way back to running the timer softirq. The default watchdog trigger point is only 5 seconds, which is pretty low (the softwatchdog triggers at 60 seconds). So increase it to 30 seconds for now. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-08  powerpc/ps3: Add missing check for PS3 to rtc-ps3 platform device registration  (Geert Uytterhoeven)
commit 7b6a09f3d6aedeaac923824af2a5df30300b56e9 upstream. On non-PS3, we get:

	| kernel BUG at drivers/rtc/rtc-ps3.c:36!

because the rtc-ps3 platform device is registered unconditionally in a kernel with builtin support for PS3. Reported-by: Sachin Sant <sachinp@in.ibm.com> Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com> Acked-by: Geoff Levand <geoffrey.levand@am.sony.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-08  x86, amd: Don't probe for extended APIC ID if APICs are disabled  (Jeremy Fitzhardinge)
commit 2cb078603abb612e3bcd428fb8122c3d39e08832 upstream. If we've logically disabled apics, don't probe the PCI space for the AMD extended APIC ID. [ Impact: prevent boot crash under Xen. ] Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Reported-by: Bastian Blank <bastian@waldi.eu.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: Andreas Herrmann <andreas.herrmann3@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-08  Bug Fix arch/ia64/kernel/pci-dma.c: fix recursive dma_supported() call in iommu_dma_supported()  (Fenghua Yu)
commit 51b89f7a6615eca184aa0b85db5781d931e9c8d1 upstream. In commit 160c1d8e40866edfeae7d68816b7005d70acf391, dma_ops->dma_supported = iommu_dma_supported; This dma_ops->dma_supported is first called in platform_dma_init() during kernel boot. Then dma_ops->dma_supported will be called recursively in iommu_dma_supported. The kernel cannot boot because it cannot get out of iommu_dma_supported until it runs out of stack memory. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
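The loop, sketched: with the commit's assignment in place, the ops table points the generic hook back at the very function that dereferences it.

    static int iommu_dma_supported(struct device *dev, u64 mask)
    {
            struct dma_map_ops *ops = platform_dma_get_ops(dev);

            /* ops->dma_supported was just set to iommu_dma_supported
               itself, so this call recurses until the stack is exhausted.
               The fix is to answer the capability question here directly
               instead of bouncing back through the ops table. */
            return ops->dma_supported(dev, mask);
    }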
2009-09-08  KVM: Fix KVM_GET_MSR_INDEX_LIST  (Jan Kiszka)
commit e125e7b6944898831b56739a5448e705578bf7e2 upstream. So far, KVM copied the emulated_msrs (only MSR_IA32_MISC_ENABLE) to a wrong address in user space due to broken pointer arithmetic. This caused subtle corruption up there (missing MSR_IA32_MISC_ENABLE had probably no practical relevance). Moreover, the size check for the user-provided kvm_msr_list forgot about emulated MSRs. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
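A sketch of the corrected copy-out. The crux: 'indices' is a __u32 array, so the offset for the emulated block is in elements, not bytes, and the size check must count both blocks (names illustrative):

    unsigned int n_saved = num_msrs_to_save;
    unsigned int n_emul  = ARRAY_SIZE(emulated_msrs);

    if (msr_list.nmsrs < n_saved + n_emul)
            return -E2BIG;                          /* count both MSR blocks */
    if (copy_to_user(user_msrs->indices, msrs_to_save,
                     n_saved * sizeof(u32)))
            return -EFAULT;
    if (copy_to_user(user_msrs->indices + n_saved,  /* NOT + n_saved * sizeof(u32) */
                     emulated_msrs, n_emul * sizeof(u32)))
            return -EFAULT;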
2009-09-08  KVM: MMU: limit rmap chain length  (Marcelo Tosatti)
(cherry picked from commit 53a27b39ff4d2492f84b1fdc2f0047175f0b0b93) Otherwise the host can spend too long traversing an rmap chain, which happens under a spinlock. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-08  KVM: MMU: handle n_free_mmu_pages > n_alloc_mmu_pages in kvm_mmu_change_mmu_pages  (Marcelo Tosatti)
(cherry picked from commit 025dbbf36a7680bffe54d9dcbf0a8bc01a7cbd10) kvm_mmu_change_mmu_pages mishandles the case where n_alloc_mmu_pages is smaller than n_free_mmu_pages, by not checking if the result of the subtraction is negative. It's a valid condition which can happen if a large number of pages has been recently freed. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
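The arithmetic hazard, demonstrated standalone (the upstream fix clamps with max(0, used_pages)):

    #include <stdio.h>

    #define max(a, b) ((a) > (b) ? (a) : (b))

    int main(void)
    {
            int n_alloc = 64, n_free = 300;         /* many pages recently freed */
            int used_pages = n_alloc - n_free;      /* -236: a valid condition */

            used_pages = max(0, used_pages);        /* without the clamp, later
                                                       logic zaps pages it shouldn't */
            printf("used_pages=%d\n", used_pages);
            return 0;
    }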
2009-09-08  KVM: SVM: force new asid on vcpu migration  (Marcelo Tosatti)
(cherry picked from commit 4b656b1202498184a0ecef86b3b89ff613b9c6ab) If a migrated vcpu matches the asid_generation value of the target pcpu, there will be no TLB flush via TLB_CONTROL_FLUSH_ALL_ASID. The check for vcpu.cpu in pre_svm_run is meaningless since svm_vcpu_load already updated it on schedule in. Such vcpu will VMRUN with stale TLB entries. Based on original patch from Joerg Roedel (http://patchwork.kernel.org/patch/10021/) Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Acked-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-08  KVM: x86: verify MTRR/PAT validity  (Marcelo Tosatti)
(cherry picked from commit d6289b9365c3f622a8cfe62c4fb054bb70b5061a) Do not allow invalid memory types in MTRR/PAT (generating a #GP otherwise). Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-08  KVM: Fix cpuid feature misreporting  (Avi Kivity)
(cherry picked from commit 8d753f369bd28fff1706ffe9fb9fea4fd88cf85b) MTRR, PAT, MCE, and MCA are all supported (to some extent) but not reported. Vista requires these features, so if userspace relies on kernel cpuid reporting, it loses support for Vista. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-08  KVM: Ignore reads to K7 EVNTSEL MSRs  (Amit Shah)
(cherry picked from commit 9e6996240afcbe61682eab8eeaeb65c34333164d) In commit 7fe29e0faacb650d31b9e9f538203a157bec821d we ignored the reads to the P6 EVNTSEL MSRs. That fixed crashes on Intel machines. Ignore the reads to K7 EVNTSEL MSRs as well to fix this on AMD hosts. This fixes Kaspersky antivirus crashing Windows guests on AMD hosts. Signed-off-by: Amit Shah <amit.shah@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-08  KVM: x86: Ignore reads to EVNTSEL MSRs  (Amit Shah)
(cherry picked from commit 7fe29e0faacb650d31b9e9f538203a157bec821d) We ignore writes to the performance counters and performance event selector registers already. Kaspersky antivirus reads the eventsel MSR causing it to crash with the current behaviour. Return 0 as data when the eventsel registers are read to stop the crash. Signed-off-by: Amit Shah <amit.shah@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-08  KVM: MMU: Use different shadows when EFER.NXE changes  (Avi Kivity)
(cherry picked from commit 9645bb56b31a1b70ab9e470387b5264cafc04aa9) A pte that is shadowed when the guest EFER.NXE=1 is not valid when EFER.NXE=0; if bit 63 is set, the pte should cause a fault, and since the shadow EFER always has NX enabled, this won't happen. Fix by using a different shadow page table for different EFER.NXE bits. This allows vcpus to run correctly with different values of EFER.NXE, and for transitions on this bit to be handled correctly without requiring a full flush. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-08  KVM: Deal with interrupt shadow state for emulated instructions  (Glauber Costa)
(cherry picked from commit 310b5d306c1aee7ebe32f702c0e33e7988d50646) We currently unblock shadow interrupt state when we skip an instruction, but fail to do so when we actually emulate one. This blocks interrupts in key instruction sequences, in particular sti; hlt. If the instruction emulated is an sti, we have to block shadow interrupts. The same goes for mov ss. pop ss also needs it, but we don't currently emulate it. Without this patch, I cannot boot gpxe option roms at vmx machines. This is described at https://bugzilla.redhat.com/show_bug.cgi?id=494469 Signed-off-by: Glauber Costa <glommer@redhat.com> CC: H. Peter Anvin <hpa@zytor.com> CC: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-08  KVM: Introduce {set/get}_interrupt_shadow()  (Glauber Costa)
This patch introduces set/get_interrupt_shadow(), which do exactly what the names suggest. It also replaces open-coded sequences that explicitly do this with the new functions. It differs slightly from upstream, because upstream merged it after Gleb's interrupt rework, which we don't ship. Just for reference, the upstream changelog is (2809f5d2c4cfad171167b131bb2a21ab65eba40f): This patch replaces drop_interrupt_shadow with the more general set_interrupt_shadow, that can either drop or raise it, depending on its parameter. It also adds ->get_interrupt_shadow() for future use. Signed-off-by: Glauber Costa <glommer@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-08  KVM: MMU: do not free active mmu pages in free_mmu_pages()  (Gleb Natapov)
(cherry picked from commit f00be0cae4e6ad0a8c7be381c6d9be3586800b3e) free_mmu_pages() should only undo what alloc_mmu_pages() does. Free mmu pages from the generic VM destruction function, kvm_destroy_vm(). Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-08  KVM: MMU: protect kvm_mmu_change_mmu_pages with mmu_lock  (Marcelo Tosatti)
(cherry picked from commit 7c8a83b75a38a807d37f5a4398eca2a42c8cf513) kvm_handle_hva, called by MMU notifiers, manipulates mmu data only with the protection of mmu_lock. Update kvm_mmu_change_mmu_pages callers to take mmu_lock, thus protecting against kvm_handle_hva. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-08  KVM: x86: check for cr3 validity in mmu_alloc_roots  (Marcelo Tosatti)
(cherry picked from commit 8986ecc0ef58c96eec48d8502c048f3ab67fd8e2) Verify the cr3 address stored in vcpu->arch.cr3 points to an existent memslot. If not, inject a triple fault. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-08  x86: don't call '->send_IPI_mask()' with an empty mask  (Linus Torvalds)
commit b04e6373d694e977c95ae0ae000e2c1e2cf92d73 upstream. As noted in 83d349f35e1ae72268c5104dbf9ab2ae635425d4 ("x86: don't send an IPI to the empty set of CPU's"), some APIC's will be very unhappy with an empty destination mask. That commit added a WARN_ON() for that case, and avoided the resulting problem, but didn't fix the underlying reason for why those empty mask cases happened. This fixes that, by checking whether the result of 'cpumask_andnot()' with the current CPU removed actually has any other CPU's left in the set of CPU's to be sent a TLB flush, and not calling down to the IPI code if the mask is empty. The reason this started happening at all is that we started passing just the CPU mask pointers around in commit 4595f9620 ("x86: change flush_tlb_others to take a const struct cpumask"), and when we did that, the cpumask was no longer thread-local. Before that commit, flush_tlb_mm() used to create its own copy of 'mm->cpu_vm_mask' and pass that copy down to the low-level flush routines after having tested that it was not empty. But after changing it to just pass down the CPU mask pointer, the lower level TLB flush routines would now get a pointer to that 'mm->cpu_vm_mask', and that could still change - and become empty - after the test due to other CPU's having flushed their own TLB's. See http://bugzilla.kernel.org/show_bug.cgi?id=13933 for details. Tested-by: Thomas Björnell <thomas.bjornell@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
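A sketch of the caller-side guard. Conveniently, cpumask_andnot() returns whether the destination is non-empty, which is exactly the test needed (names and private storage are illustrative):

    static void flush_tlb_others_sketch(const struct cpumask *cpumask,
                                        struct mm_struct *mm, unsigned long va)
    {
            static struct cpumask flush_mask;   /* illustrative private storage */

            /* The incoming mask is shared and may lose bits concurrently, so
               compute "everyone but me" into private storage and only send
               the IPI if someone is actually left. */
            if (cpumask_andnot(&flush_mask, cpumask,
                               cpumask_of(smp_processor_id())))
                    flush_tlb_others_ipi(&flush_mask, mm, va);
    }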
2009-09-08  x86: don't send an IPI to the empty set of CPU's  (Linus Torvalds)
commit 83d349f35e1ae72268c5104dbf9ab2ae635425d4 upstream. The default_send_IPI_mask_logical() function uses the "flat" APIC mode to send an IPI to a set of CPU's at once, but if that set happens to be empty, some older local APIC's will apparently be rather unhappy. So just warn if a caller gives us an empty mask, and ignore it. This fixes a regression in 2.6.30.x, due to commit 4595f9620 ("x86: change flush_tlb_others to take a const struct cpumask"), documented here: http://bugzilla.kernel.org/show_bug.cgi?id=13933 which causes a silent lock-up. It only seems to happen on PPro, P2, P3 and Athlon XP cores. Most developers sadly (or not so sadly, if you're a developer..) have more modern CPU's. Also, on x86-64 we don't use the flat APIC mode, so it would never trigger there even if the APIC didn't like sending an empty IPI mask. Reported-by: Pavel Vilim <wylda@volny.cz> Reported-and-tested-by: Thomas Björnell <thomas.bjornell@gmail.com> Reported-and-tested-by: Martin Rogge <marogge@onlinehome.de> Cc: Mike Travis <travis@sgi.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-08-16  x86: Fix VMI && stack protector  (Alok Kataria)
commit 7d5b005652bc5ae3e1e0efc53fd0e25a643ec506 upstream. With CONFIG_STACK_PROTECTOR turned on, VMI doesn't boot with more than one processor. The problem is with the gs value not being initialized correctly when registering the secondary processor for VMI's case. The patch below initializes the gs value for the AP to __KERNEL_STACK_CANARY. Without this the secondary processor keeps on taking a GP on every gs access. Signed-off-by: Alok N Kataria <akataria@vmware.com> LKML-Reference: <1249425262.18955.40.camel@ank32.eng.vmware.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-08-16  x86, pat: Fix set_memory_wc related corruption  (Pallipadi, Venkatesh)
commit bdc6340f4eb68295b1e7c0ade2356b56dca93d93 upstream. Changeset 3869c4aa18835c8c61b44bd0f3ace36e9d3b5bd0 that went in after 2.6.30-rc1 was a seemingly small change to _set_memory_wc() to make it compliant with SDM requirements. But it introduced a nasty bug, which can result in a crash and/or strange corruptions when set_memory_wc is used. One such crash is reported here: http://lkml.org/lkml/2009/7/30/94 Actually, that changeset introduced two bugs. * change_page_attr_set() takes &addr as its first argument, and the addr value might have changed on return, even for a single-page change_page_attr_set() call. That will make the second change_page_attr_set() in this routine operate on an unrelated addr, which can eventually cause strange corruptions and a bad page state crash. * The second change_page_attr_set() call, before setting _PAGE_CACHE_WC, should clear the earlier _PAGE_CACHE_UC_MINUS, as otherwise the cache attribute will not be WC (it will be UC instead). The patch below fixes both these problems. Sending a single patch to fix both the problems, as the change is to the same line of code. The change to have a addr_copy is not very clean. But, it is simpler than making more changes through various routines in pageattr.c. A huge thanks to Jerome for reporting this problem and providing a simple test case that helped us root cause the problem. Reported-by: Jerome Glisse <glisse@freedesktop.org> Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <20090730214319.GA1889@linux-os.sc.intel.com> Acked-by: Dave Airlie <airlied@redhat.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-08-16  x86: fix assembly constraints in native_save_fl()  (H. Peter Anvin)
commit f1f029c7bfbf4ee1918b90a431ab823bed812504 upstream. From Gabe Black in bugzilla 13888: native_save_fl is implemented as follows:

	static inline unsigned long native_save_fl(void)
	{
		unsigned long flags;

		asm volatile("# __raw_save_flags\n\t"
			     "pushf ; pop %0"
			     : "=g" (flags)
			     : /* no input */
			     : "memory");

		return flags;
	}

If gcc chooses to put flags on the stack, for instance because this is inlined into a larger function with more register pressure, the offset of the flags variable from the stack pointer will change when the pushf is performed. gcc doesn't attempt to understand that fact, and the address used for the pop will still be the same. It will write to somewhere near flags on the stack but not actually into it and overwrite some other value. I saw this happen in the ide_device_add_all function when running in a simulator I work on. I'm assuming that some quirk of how the simulated hardware is set up caused the code path this is on to be executed when it normally wouldn't. A simple fix might be to change "=g" to "=r". Reported-by: Gabe Black <spamforgabe@umich.edu> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
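The committed fix, per the upstream diff, tightens the constraint to "=rm" and documents why the memory form is nevertheless safe:

    static inline unsigned long native_save_fl(void)
    {
            unsigned long flags;

            /*
             * "=rm" is safe here, because "pop" adjusts the stack before
             * it evaluates its effective address -- this is part of the
             * documented behavior of the "pop" instruction.
             */
            asm volatile("# __raw_save_flags\n\t"
                         "pushf ; pop %0"
                         : "=rm" (flags)
                         : /* no input */
                         : "memory");

            return flags;
    }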
2009-08-16  x86: Fix CPA memtype reserving in the set_pages_array*() cases  (Thomas Hellstrom)
commit 8523acfe40efc1a8d3da8f473ca67cb195b06f0c upstream. The code was incorrectly reserving memtypes using the page virtual address instead of the physical address. Furthermore, the code was not ignoring highmem pages as it ought to. (Upstream does not pass in highmem pages yet - but upcoming graphics code will do it and there's no reason not to handle this properly in the CPA APIs.) Fixes: http://bugzilla.kernel.org/show_bug.cgi?id=13884 Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: dri-devel@lists.sourceforge.net Cc: venkatesh.pallipadi@intel.com LKML-Reference: <1249284345-7654-1-git-send-email-thellstrom@vmware.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-08-16  powerpc/mpc83xx: Fix usb mux setup for mpc834x  (Peter Korsgaard)
commit b7d66c88c968379ebe683a28c4005895497ebbad upstream. usb0 and usb1 mux settings in the sicrl register were swapped (twice!) in mpc834x_usb_cfg(), leading to various strange issues with fsl-ehci and full speed devices. The USB port config on mpc834x is done using 2 muxes: Port 0 is always used for MPH port 0, and port 1 can either be used for MPH port 1 or DR (unless DR uses UTMI phy or OTG, then it uses both ports) - see 8349 RM figure 1-4. mpc8349_usb_cfg() had this inverted for the DR, and it also had the bit positions of the usb0/usb1 mux settings swapped. It would basically work if you specified port1 instead of port0 for the MPH controller (and happened to use ULPI phys), which is what all the 834x dts have done, even though that configuration is physically invalid. Instead fix mpc8349_usb_cfg() and adjust the dts files to match reality. Signed-off-by: Peter Korsgaard <jacmet@sunsite.dk> Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-07-30  powerpc/mpic: Fix mapping of "DCR" based MPIC variants  (Benjamin Herrenschmidt)
commit 5a2642f620eb6e40792822fa0eafe23046fbb55e upstream. Commit 31207dab7d2e63795eb15823947bd2f7025b08e2 "Fix incorrect allocation of interrupt rev-map" introduced a regression crashing on boot on machines using a "DCR" based MPIC, such as the Cell blades. The reason is that the irq host data structure is initialized much later as a result of that patch, causing our calls to mpic_map() to be done before we have a host setup. Unfortunately, this breaks _mpic_map_dcr() which uses the mpic->irqhost to get to the device node. This fixes it by, instead, passing the device node explicitly to mpic_map(). Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Akira Tsukamoto <akirat@rd.scei.sony.co.jp> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-07-30  x86: don't use 'access_ok()' as a range check in get_user_pages_fast()  (Linus Torvalds)
[ Upstream commit 7f8189068726492950bf1a2dcfd9b51314560abf - modified for stable to not use the sloppy __VIRTUAL_MASK_SHIFT ] It's really not right to use 'access_ok()', since that is meant for the normal "get_user()" and "copy_from/to_user()" accesses, which are done through the TLB, rather than through the page tables. Why? access_ok() does both too few and too many checks. Too many, because it is meant for regular kernel accesses that will not honor the 'user' bit in the page tables, and because it honors the USER_DS vs KERNEL_DS distinction that we shouldn't care about in GUP. And too few, because it doesn't do the 'canonical' check on the address on x86-64, since the TLB will do that for us. So instead of using a function that isn't meant for this, and does something else and much more complicated, just do the real rules: we don't want the range to overflow, and on x86-64, we want it to be a canonical low address (on 32-bit, all addresses are canonical). Acked-by: Ingo Molnar <mingo@elte.hu> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
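The honest range check, sketched for the stable backport (no __VIRTUAL_MASK_SHIFT here; on 32-bit every address is canonical, so wrap-around is the only concern):

    /* In get_user_pages_fast(), before walking the page tables: */
    unsigned long start = addr & PAGE_MASK;
    unsigned long len = (unsigned long)nr_pages << PAGE_SHIFT;
    unsigned long end = start + len;

    if (end < start)        /* range overflowed: fall back to the slow path */
            goto slow_irqon;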
2009-07-30  x86, setup (2.6.30-stable) fix 80x34 and 80x60 console modes  (Marc Aurele La France)
Note: this is not in upstream since upstream is not affected due to the new "BIOS glovebox" subsystem. As coded, most INT10 calls in video-vga.c allow the compiler to assume EAX remains unchanged across them, which is not always the case. This triggers an optimisation issue that causes vga_set_vertical_end() to be called with an incorrect number of scanlines. Fix this by beefing up the asm constraints on these calls. Reported-by: Marc Aurele La France <tsi@xfree86.org> Signed-off-by: Marc Aurele La France <tsi@xfree86.org> Acked-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-07-30  parisc: fix ldcw inline assembler  (Helge Deller)
commit 7d17e2763129ea307702fcdc91f6e9d114b65c2d upstream. There are two reasons to expose the memory *a in the asm: 1) To prevent the compiler from discarding a preceding write to *a, and 2) to prevent it from caching *a in a register over the asm. The change has had a few days' testing with a SMP build of 2.6.22.19 running on a rp3440. This patch is about the correctness of the __ldcw() macro itself. The use of the macro should be confined to small inline functions to try to limit the effect of clobbering memory on GCC's optimization of loads and stores. Signed-off-by: Dave Anglin <dave.anglin@nrc-cnrc.gc.ca> Signed-off-by: Helge Deller <deller@gmx.de> Signed-off-by: Kyle McMartin <kyle@mcmartin.ca> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
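The fix in essence: expose *a as a read-write memory operand ("+m") so the compiler can neither discard a preceding store to *a nor cache *a in a register across the asm. A sketch of the fixed macro (operand numbering shifts because of the new output):

    #define __ldcw(a) ({                                        \
            unsigned __ret;                                     \
            __asm__ __volatile__("ldcw 0(%2),%0"                \
                    : "=r" (__ret), "+m" (*(a)) : "r" (a));     \
            __ret;                                              \
    })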