Age | Commit message | Author |
|
commit 441a179dafc0f99fc8b3a8268eef66958621082e upstream.
int sys32_rt_sigprocmask(int how, compat_sigset_t __user *set,
                         compat_sigset_t __user *oset, unsigned int sigsetsize)
{
        sigset_t old_set, new_set;
        int ret;

        if (set && get_sigset32(set, &new_set, sigsetsize))
...
static int
get_sigset32(compat_sigset_t __user *up, sigset_t *set, size_t sz)
{
        compat_sigset_t s;
        int r;

        if (sz != sizeof *set) panic("put_sigset32()");
In other words, rt_sigprocmask(69, (void *)69, 69) done by a 32bit process
will promptly panic the box.
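A sketch of the fix described above (the compat-to-native conversion is elided
and the exact upstream diff may differ): reject a mismatched sigsetsize with
-EINVAL instead of panicking.

static int
get_sigset32(compat_sigset_t __user *up, sigset_t *set, size_t sz)
{
        compat_sigset_t s;

        if (sz != sizeof(*set))
                return -EINVAL;         /* reject bad sizes instead of panicking */
        if (copy_from_user(&s, up, sizeof(s)))
                return -EFAULT;
        /* ... translate the compat sigset 's' into '*set' here ... */
        return 0;
}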
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 1dc831bf53fddcc6443f74a39e72db5bcea4f15d upstream.
- The code relies on rc_pci_fixup being called, which only happens
when CONFIG_PCI_QUIRKS is enabled, so add that to Kconfig. Omitting
this causes a booting failure with a non-obvious cause.
- Update rc_pci_fixup to set the class properly, copying the
more modern style from other places
- Correct the rc_pci_fixup comment
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Jason Cooper <jason@lakedaemon.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit d356cf5a74afa32b40decca3c9dd88bc3cd63eb5 upstream.
PMU interrupts start at IRQ_DOVE_PMU_START, not IRQ_DOVE_PMU_START + 1.
Fix the condition. (It may have been less likely to occur had the code
been written "if (irq >= IRQ_DOVE_PMU_START" which imho is the easier
to understand notation, and matches the normal way of thinking about
these things.)
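For illustration only (the helper name below is made up), the off-by-one
amounts to:

static bool dove_irq_is_pmu(int irq)
{
        /* PMU interrupts start at IRQ_DOVE_PMU_START itself; the old test
         * used '>' and therefore missed the first PMU interrupt. */
        return irq >= IRQ_DOVE_PMU_START;
}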
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Jason Cooper <jason@lakedaemon.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 5d3df935426271016b895aecaa247101b4bfa35e upstream.
Fix the acknowledgement of PMU interrupts on Dove: some Dove hardware
has not been sensibly designed so that interrupts can be handled in a
race free manner. The PMU is one such instance.
The pending (aka 'cause') register is a bunch of RW bits, meaning that
these bits can be both cleared and set by software (confirmed on the
Armada-510 on the cubox.)
Hardware sets the appropriate bit when an interrupt is asserted, and
software is required to clear the bits which are to be processed. If
we write ~(1 << bit), then we end up asserting every other interrupt
except the one we're processing. So, we need to do a read-modify-write
cycle to clear the asserted bit.
However, any interrupts which occur in the middle of this cycle will
also be written back as zero, which will also clear the new interrupts.
The upshot of this is: there is _no_ way to safely clear down interrupts
in this register (and other similarly behaving interrupt pending
registers on this device.) The patch below at least stops us creating
new interrupts.
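A sketch of the read-modify-write acknowledge described above; the register
and pin-mapping names here are illustrative, not the actual Dove definitions:

static void pmu_irq_ack(struct irq_data *d)
{
        int pin = d->irq - IRQ_DOVE_PMU_START;
        u32 cause;

        /*
         * The cause bits are RW, so writing ~(1 << pin) would *set* every
         * other bit.  Read the register, clear only our bit, write it back.
         * Interrupts raised between the read and the write are still lost,
         * as explained above.
         */
        cause = readl(PMU_CAUSE_REG);           /* illustrative register name */
        cause &= ~(1u << pin);
        writel(cause, PMU_CAUSE_REG);
}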
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Jason Cooper <jason@lakedaemon.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 36c46ca4f322a7bf89aad5462a3a1f61713edce7 upstream.
Add valid patch size for family 16h processors.
[ hpa: promoting to urgent/stable since it is hw enabling and trivial ]
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@amd.com>
Acked-by: Andreas Herrmann <herrmann.der.user@googlemail.com>
Link: http://lkml.kernel.org/r/1353004910-2204-1-git-send-email-boris.ostrovsky@amd.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit cb57a2b4cff7edf2a4e32c0163200e9434807e0a upstream.
Modules, in particular oprofile (and possibly other similar tools)
need kernel_stack_pointer(), so export it using EXPORT_SYMBOL_GPL().
Cc: Yang Wei <wei.yang@windriver.com>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Jun Zhang <jun.zhang@intel.com>
Link: http://lkml.kernel.org/r/20120912135059.GZ8285@erda.amd.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 1022623842cb72ee4d0dbf02f6937f38c92c3f41 upstream.
In 32 bit the stack address provided by kernel_stack_pointer() may
point to an invalid range causing NULL pointer access or page faults
while in NMI (see trace below). This happens if called in softirq
context and if the stack is empty. The address at &regs->sp is then
out of range.
Fix this by checking whether regs and &regs->sp are in the same stack
context. Otherwise return the previous stack pointer stored in struct
thread_info. If that address is invalid too, return address of regs.
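A sketch of the check described above, assuming THREAD_SIZE-aligned kernel
stacks; 'previous_esp' is the 32-bit thread_info field holding the previous
stack pointer (details may differ from the actual patch):

unsigned long kernel_stack_pointer(struct pt_regs *regs)
{
        unsigned long context = (unsigned long)regs & ~(THREAD_SIZE - 1);
        unsigned long sp = (unsigned long)&regs->sp;
        struct thread_info *ti;

        /* &regs->sp is only usable if it lies on the same kernel stack. */
        if (context == (sp & ~(THREAD_SIZE - 1)))
                return sp;

        /* Fall back to the stack pointer saved in thread_info ... */
        ti = (struct thread_info *)context;
        sp = ti->previous_esp;
        if (sp)
                return sp;

        /* ... and finally to the address of regs itself, which is valid. */
        return (unsigned long)regs;
}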
BUG: unable to handle kernel NULL pointer dereference at 0000000a
IP: [<c1004237>] print_context_stack+0x6e/0x8d
*pde = 00000000
Oops: 0000 [#1] SMP
Modules linked in:
Pid: 4434, comm: perl Not tainted 3.6.0-rc3-oprofile-i386-standard-g4411a05 #4 Hewlett-Packard HP xw9400 Workstation/0A1Ch
EIP: 0060:[<c1004237>] EFLAGS: 00010093 CPU: 0
EIP is at print_context_stack+0x6e/0x8d
EAX: ffffe000 EBX: 0000000a ECX: f4435f94 EDX: 0000000a
ESI: f4435f94 EDI: f4435f94 EBP: f5409ec0 ESP: f5409ea0
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
CR0: 8005003b CR2: 0000000a CR3: 34ac9000 CR4: 000007d0
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
Process perl (pid: 4434, ti=f5408000 task=f5637850 task.ti=f4434000)
Stack:
000003e8 ffffe000 00001ffc f4e39b00 00000000 0000000a f4435f94 c155198c
f5409ef0 c1003723 c155198c f5409f04 00000000 f5409edc 00000000 00000000
f5409ee8 f4435f94 f5409fc4 00000001 f5409f1c c12dce1c 00000000 c155198c
Call Trace:
[<c1003723>] dump_trace+0x7b/0xa1
[<c12dce1c>] x86_backtrace+0x40/0x88
[<c12db712>] ? oprofile_add_sample+0x56/0x84
[<c12db731>] oprofile_add_sample+0x75/0x84
[<c12ddb5b>] op_amd_check_ctrs+0x46/0x260
[<c12dd40d>] profile_exceptions_notify+0x23/0x4c
[<c1395034>] nmi_handle+0x31/0x4a
[<c1029dc5>] ? ftrace_define_fields_irq_handler_entry+0x45/0x45
[<c13950ed>] do_nmi+0xa0/0x2ff
[<c1029dc5>] ? ftrace_define_fields_irq_handler_entry+0x45/0x45
[<c13949e5>] nmi_stack_correct+0x28/0x2d
[<c1029dc5>] ? ftrace_define_fields_irq_handler_entry+0x45/0x45
[<c1003603>] ? do_softirq+0x4b/0x7f
<IRQ>
[<c102a06f>] irq_exit+0x35/0x5b
[<c1018f56>] smp_apic_timer_interrupt+0x6c/0x7a
[<c1394746>] apic_timer_interrupt+0x2a/0x30
Code: 89 fe eb 08 31 c9 8b 45 0c ff 55 ec 83 c3 04 83 7d 10 00 74 0c 3b 5d 10 73 26 3b 5d e4 73 0c eb 1f 3b 5d f0 76 1a 3b 5d e8 73 15 <8b> 13 89 d0 89 55 e0 e8 ad 42 03 00 85 c0 8b 55 e0 75 a6 eb cc
EIP: [<c1004237>] print_context_stack+0x6e/0x8d SS:ESP 0068:f5409ea0
CR2: 000000000000000a
---[ end trace 62afee3481b00012 ]---
Kernel panic - not syncing: Fatal exception in interrupt
V2:
* add comments to kernel_stack_pointer()
* always return a valid stack address by falling back to the address
of regs
Reported-by: Yang Wei <wei.yang@windriver.com>
Signed-off-by: Robert Richter <robert.richter@amd.com>
Link: http://lkml.kernel.org/r/20120912135059.GZ8285@erda.amd.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Jun Zhang <jun.zhang@intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit fae2ae2a900a5c7bb385fe4075f343e7e2d5daa2 upstream.
If a signal handler is executed on altstack and another signal comes,
we will end up with rt_sigreturn() on return from the second handler
getting -EPERM from do_sigaltstack(). It's perfectly OK, since we
are not asking to change the settings; in fact, they couldn't have been
changed during the second handler execution exactly because we'd been
on altstack all along. 64bit sigreturn on sparc treats any error from
do_sigaltstack() as "SIGSEGV now"; we need to switch to the same semantics
we are using on other architectures.
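A sketch of the semantics change (sf stands for the user-space rt signal
frame pointer; names abbreviated): only an unreadable frame, i.e. -EFAULT,
should be fatal.

        /* Only an unreadable frame is a real problem; -EPERM for an
         * unchanged altstack setting (we are still running on it) is not. */
        if (do_sigaltstack(&sf->stack, NULL, (unsigned long)sf) == -EFAULT)
                goto segv;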
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 949a05d03490e39e773e8652ccab9157e6f595b4 upstream.
On Thu, 2012-11-01 at 16:45 -0700, Michel Lespinasse wrote:
> Looking at the arch/parisc/kernel/sys_parisc.c implementation of
> get_shared_area(), I do have a concern though. The function basically
> ignores the pgoff argument, so that if one creates a shared mapping of
> pages 0-N of a file, and then a separate shared mapping of pages 1-N
> of that same file, both will have the same cache offset for their
> starting address.
>
> This looks like this would create obvious aliasing issues. Am I
> misreading this ? I can't understand how this could work good enough
> to be undetected, so there must be something I'm missing here ???
This turns out to be correct and we need to pay attention to the pgoff as
well as the address when creating the virtual address for the area.
Fortunately, the bug is rarely triggered as most applications which use pgoff
tend to use large values (git being the primary one, and it uses pgoff in
multiples of 16MB) which are larger than our cache coherency modulus, so the
problem isn't often seen in practice.
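A sketch of the idea only (the helper names below are made up; SHMLBA is the
parisc cache-coherency modulus): the file offset must be folded into the
colour of the returned address so that two mappings of the same file page
share a colour.

/* Offset, within the cache-colour modulus, that this mapping must keep. */
static unsigned long shared_align_offset(unsigned long pgoff)
{
        return (pgoff << PAGE_SHIFT) & (SHMLBA - 1);
}

static unsigned long colour_align(unsigned long addr, unsigned long pgoff)
{
        /* Round up to a colour boundary, then add the pgoff-derived offset. */
        return ALIGN(addr, SHMLBA) + shared_align_offset(pgoff);
}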
Reported-by: Michel Lespinasse <walken@google.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 2bbf0a1427c377350f001fbc6260995334739ad7 upstream.
The Way Access Filter in recent AMD CPUs may hurt the performance of
some workloads, caused by aliasing issues in the L1 cache.
This patch disables it on the affected CPUs.
The issue is similar to that one of last year:
http://lkml.indiana.edu/hypermail/linux/kernel/1107.3/00041.html
This new patch does not replace the old one, we just need another
quirk for newer CPUs.
The performance penalty without the patch depends on the
circumstances, but is a bit less than the last year's 3%.
The workloads affected would be those that access code from the same
physical page under different virtual addresses, so different
processes using the same libraries with ASLR or multiple instances of
PIE-binaries. The code needs to be accessed simultaneously from both
cores of the same compute unit.
More details can be found here:
http://developer.amd.com/Assets/SharedL1InstructionCacheonAMD15hCPU.pdf
CPUs affected are anything with the core known as Piledriver.
That includes the new parts of the AMD A-Series (aka Trinity) and the
just released new CPUs of the FX-Series (aka Vishera).
The model numbering is a bit odd here: FX CPUs have model 2,
A-Series has model 10h, with possible extensions to 1Fh. Hence the
range of model ids.
Signed-off-by: Andre Przywara <osp@andrep.de>
Link: http://lkml.kernel.org/r/1351700450-9277-1-git-send-email-osp@andrep.de
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
[bwh: Backported to 3.2: wrmsrl_safe() is called checking_wrmsrl()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
throttle events in mcelog
commit 29e9bf1841e4f9df13b4992a716fece7087dd237 upstream.
Thermal throttle and power limit events are not defined as MCE errors in x86
architecture and should not generate MCE errors in mcelog.
Current kernel generates fake software defined MCE errors for these events.
This may confuse users because they may think the machine has real MCE errors
while actually only thermal throttle or power limit events happen.
To make it worse, buggy firmware on some platforms may falsely generate
the events. Therefore, the kernel reports MCE errors which users take to be
real hardware errors. Although the firmware bugs should be fixed, the kernel,
on the other hand, should not report MCE errors either.
So mcelog is not a good mechanism to report these events. To report the events, we count them in respective counters (core_power_limit_count,
package_power_limit_count, core_throttle_count, and package_throttle_count) in
/sys/devices/system/cpu/cpu#/thermal_throttle/. Users can check the counters
for each event on each CPU. Please note that all CPUs on one package report
duplicate counters. It's the user application's responsibility to retrieve a
package-level counter for one package.
This patch doesn't report package level power limit, core level power limit, and
package level thermal throttle events in mcelog. When the events happen, only
report them in respective counters in sysfs.
Since core level thermal throttle has been legacy code in kernel for a while and
users accepted it as MCE error in mcelog, core level thermal throttle is still
reported in mcelog. In the mean time, the event is counted in a counter in sysfs
as well.
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Acked-by: Borislav Petkov <bp@amd64.org>
Acked-by: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/20111215001945.GA21009@linux-os.sc.intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 34fa78b59c52d1db3513db4c1a999db26b2e9ac2 upstream.
The sigaddset/sigdelset/sigismember functions that are implemented with
bitfield insn cannot allow the sigset argument to be placed in a data
register since the sigset is wider than 32 bits. Remove the "d"
constraint from the asm statements.
The effect of the bug is that sending RT signals does not work, the signal
number is truncated modulo 32.
Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit f82f64dd9f485e13f29f369772d4a0e868e5633a upstream.
Commit 844ab6f9 ("x86, mm: Find_early_table_space based on ranges that are
actually being mapped") wrongly added back some lines that had been removed
in commit 7b16bbf97 (Revert "x86/mm: Fix the size calculation of mapping
tables"). Remove them again.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/CAE9FiQW_vuaYQbmagVnxT2DGsYc=9tNeAbdBq53sYkitPOwxSQ@mail.gmail.com
Acked-by: Jacob Shin <jacob.shin@amd.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 844ab6f993b1d32eb40512503d35ff6ad0c57030 upstream.
Current logic finds enough space for direct mapping page tables from 0
to end. Instead, we only need to find enough space to cover mr[0].start
to mr[nr_range].end -- the range that is actually being mapped by
init_memory_mapping()
This is needed after 1bbbbe779aabe1f0768c2bf8f8c0a5583679b54a, to address
the panic reported here:
https://lkml.org/lkml/2012/10/20/160
https://lkml.org/lkml/2012/10/21/157
Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Link: http://lkml.kernel.org/r/20121024195311.GB11779@jshin-Toonie
Tested-by: Tom Rini <trini@ti.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
[bwh: Backported to 3.2:
- Adjust context
- The log message format is a bit different]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
mapping.
commit 1bbbbe779aabe1f0768c2bf8f8c0a5583679b54a upstream.
On systems with very large memory (1 TB in our case), BIOS may report a
reserved region or a hole in the E820 map, even above the 4 GB range. Exclude
these from the direct mapping.
[ hpa: this should be done not just for > 4 GB but for everything above the legacy
region (1 MB), at the very least. That, however, turns out to require significant
restructuring. That work is well underway, but is not suitable for rc/stable. ]
Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Link: http://lkml.kernel.org/r/1319145326-13902-1-git-send-email-jacob.shin@amd.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit d55c4c613fc4d4ad2ba0fc6fa2b57176d420f7e4 upstream.
When walking page tables we need to make sure that everything
is within bounds of the ASCE limit of the task's address space.
Otherwise we might calculate e.g. a pud pointer which is not
within a pud and dereference it.
So check against TASK_SIZE (which is the ASCE limit) before
walking page tables.
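The guard is conceptually this simple (a sketch; the real check sits inside
the s390 page-table walker):

static int addr_within_asce_limit(unsigned long addr)
{
        /*
         * TASK_SIZE reflects the ASCE limit of the address space.  An
         * address beyond it has no page tables to walk, and computing
         * e.g. a pud pointer from it would dereference random memory.
         */
        if (addr >= TASK_SIZE)
                return -EFAULT;
        return 0;
}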
Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit fa968ee215c0ca91e4a9c3a69ac2405aae6e5d2f upstream.
If user space is running in primary mode it can switch to secondary
or access register mode, this is used e.g. in the clock_gettime code
of the vdso. If a signal is delivered to the user space process while
it has been running in access register mode the signal handler is
executed in access register mode as well which will result in a crash
most of the time.
Set the address space control bits in the PSW to the default for the
execution of the signal handler and make sure that the previous
address space control is restored on signal return. Take care
that user space can not switch to the kernel address space by
modifying the registers in the signal frame.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
[bwh: Backported to 3.2:
- Adjust filename
- The RI bit is not included in PSW_MASK_USER]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 95a7d76897c1e7243d4137037c66d15cbf2cce76 upstream.
As Mukesh explained it, the MMUEXT_TLB_FLUSH_ALL allows the
hypervisor to do a TLB flush on all active vCPUs. If instead
we were using the generic one (which ends up being xen_flush_tlb)
we end up making the MMUEXT_TLB_FLUSH_LOCAL hypercall. But
before we make that hypercall the kernel will IPI all of the
vCPUs (even those that were asleep from the hypervisor
perspective). The end result is that we needlessly wake them
up and do a TLB flush when we can just let the hypervisor
do it correctly.
This patch gives around 50% speed improvement when migrating
idle guests from one host to another.
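For reference, one way to issue that flush with a single hypercall (a sketch
using the standard Xen interface; the real code may batch it via multicalls):

static void xen_flush_tlb_all(void)
{
        struct mmuext_op op;

        /* Let the hypervisor flush every active vCPU's TLB in one go,
         * instead of IPI'ing each vCPU for a local flush. */
        op.cmd = MMUEXT_TLB_FLUSH_ALL;

        if (HYPERVISOR_mmuext_op(&op, 1, NULL, DOMID_SELF))
                BUG();
}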
Oracle-bug: 14630170
Tested-by: Jingjie Jiang <jingjie.jiang@oracle.com>
Suggested-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 7840487cd6298f9f931103b558290d8d98d41c49 upstream.
The i2c core driver turns the platform device ID into the bus number.
Using -1 as the platform device ID means the bus number is assigned
dynamically, but when writing code we need to know the bus number in order
to call i2c_register_board_info(int busnum, ...) to register devices;
with -1 we do not know its value.
In order to solve this issue, set the platform device ID to a fixed number,
here 0, to match the bus number used in i2c_register_board_info().
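A sketch of the pairing being described, with made-up board names and an
example EEPROM device:

static struct i2c_board_info __initdata board_i2c_devices[] = {
        { I2C_BOARD_INFO("24c512", 0x50), },    /* example EEPROM */
};

static struct platform_device board_i2c_device = {
        .name   = "i2c-gpio",   /* driver name is illustrative */
        .id     = 0,            /* fixed ID: bus number 0, not -1 (dynamic) */
};

static void __init board_add_devices(void)
{
        /* The bus number here must match the platform device ID above. */
        i2c_register_board_info(0, board_i2c_devices,
                                ARRAY_SIZE(board_i2c_devices));
        platform_device_register(&board_i2c_device);
}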
Signed-off-by: Bo Shen <voice.shen@atmel.com>
Acked-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Acked-by: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com>
Acked-by: Ludovic Desroches <ludovic.desroches@atmel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 3d9a0183dd3423353e9e363bcc261c1220d05f9f upstream.
The newer at91sam9g10 SoC revision can't be detected, so the kernel fails to
boot with this kind of kernel panic:
"AT91: Impossible to detect the SOC type"
CPU: ARM926EJ-S [41069265] revision 5 (ARMv5TEJ), cr=00053177
CPU: VIVT data cache, VIVT instruction cache
Machine: Atmel AT91SAM9G10-EK
Ignoring tag cmdline (using the default kernel command line)
bootconsole [earlycon0] enabled
Memory policy: ECC disabled, Data cache writeback
Kernel panic - not syncing: AT91: Impossible to detect the SOC type
[<c00133d4>] (unwind_backtrace+0x0/0xe0) from [<c02366dc>] (panic+0x78/0x1cc)
[<c02366dc>] (panic+0x78/0x1cc) from [<c02fa35c>] (at91_map_io+0x90/0xc8)
[<c02fa35c>] (at91_map_io+0x90/0xc8) from [<c02f9860>] (paging_init+0x564/0x6d0)
[<c02f9860>] (paging_init+0x564/0x6d0) from [<c02f7914>] (setup_arch+0x464/0x704)
[<c02f7914>] (setup_arch+0x464/0x704) from [<c02f44f8>] (start_kernel+0x6c/0x2d4)
[<c02f44f8>] (start_kernel+0x6c/0x2d4) from [<20008040>] (0x20008040)
The reason for this is that the Debug Unit Chip ID Register has changed between
the Engineering Sample and the definitive revision of the SoC. Changing the
check from cidr to socid addresses the problem. We do not integrate this check
into the list just above because we also have to make sure that the extended ID
is disregarded.
Signed-off-by: Ivan Shugov <ivan.shugov@gmail.com>
[nicolas.ferre@atmel.com: change commit message]
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Acked-by: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit f6365201d8a21fb347260f89d6e9b3e718d63c70 upstream.
The X86_32-only disable_hlt/enable_hlt mechanism was used by the
32-bit floppy driver. Its effect was to replace the use of the
HLT instruction inside default_idle() with cpu_relax() - essentially
it turned off the use of HLT.
This workaround was commented in the code as:
"disable hlt during certain critical i/o operations"
"This halt magic was a workaround for ancient floppy DMA
wreckage. It should be safe to remove."
H. Peter Anvin additionally adds:
"To the best of my knowledge, no-hlt only existed because of
flaky power distributions on 386/486 systems which were sold to
run DOS. Since DOS did no power management of any kind,
including HLT, the power draw was fairly uniform; when exposed
to the much higher noise levels you got when Linux used HLT
caused some of these systems to fail.
They were by far in the minority even back then."
Alan Cox further says:
"Also for the Cyrix 5510 which tended to go castors up if a HLT
occurred during a DMA cycle and on a few other boxes HLT during
DMA tended to go astray.
Do we care ? I doubt it. The 5510 was pretty obscure, the 5520
fixed it, the 5530 is probably the oldest still in any kind of
use."
So, let's finally drop this.
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Josh Boyer <jwboyer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: "H. Peter Anvin" <hpa@zytor.com>
Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Stephen Hemminger <shemminger@vyatta.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/n/tip-3rhk9bzf0x9rljkv488tloib@git.kernel.org
[ If anyone cares then alternative instruction patching could be
used to replace HLT with a one-byte NOP instruction. Much simpler. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[bwh: Backported to 3.2: adjust filename, context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 627072b06c362bbe7dc256f618aaa63351f0cfe6 upstream.
The tile tool chain uses the .eh_frame information for backtracing.
The vmlinux build drops any .eh_frame sections at link time, but when
present in kernel modules, it causes a module load failure due to the
presence of unsupported pc-relative relocations. When compiling to
use compiler feedback support, the compiler by default omits .eh_frame
information, so we don't see this problem. But when not using feedback,
we need to explicitly suppress the .eh_frame.
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 5f40b909728ad784eb43aa309d3c4e9bdf050781 upstream.
When booting a secondary CPU, the primary CPU hands two sets of page
tables via the secondary_data struct:
(1) swapper_pg_dir: a normal, cacheable, shared (if SMP) mapping
of the kernel image (i.e. the tables used by init_mm).
(2) idmap_pgd: an uncached mapping of the .idmap.text ELF
section.
The idmap is generally used when enabling and disabling the MMU, which
includes early CPU boot. In this case, the secondary CPU switches to
swapper as soon as it enters C code:
        struct mm_struct *mm = &init_mm;
        unsigned int cpu = smp_processor_id();

        /*
         * All kernel threads share the same mm context; grab a
         * reference and switch to it.
         */
        atomic_inc(&mm->mm_count);
        current->active_mm = mm;
        cpumask_set_cpu(cpu, mm_cpumask(mm));
        cpu_switch_mm(mm->pgd, mm);
This causes a problem on ARMv7, where the identity mapping is treated as
strongly-ordered leading to architecturally UNPREDICTABLE behaviour of
exclusive accesses, such as those used by atomic_inc.
This patch re-orders the secondary_start_kernel function so that we
switch to swapper before performing any exclusive accesses.
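In outline, the reordering looks roughly like this (a sketch of
secondary_start_kernel(); unrelated setup omitted):

asmlinkage void __cpuinit secondary_start_kernel(void)
{
        struct mm_struct *mm = &init_mm;
        unsigned int cpu;

        /*
         * Switch away from the strongly-ordered identity mapping before
         * doing anything that uses exclusive accesses (e.g. atomic_inc),
         * whose behaviour on such a mapping is architecturally
         * unpredictable on ARMv7.
         */
        cpu_switch_mm(mm->pgd, mm);
        local_flush_tlb_all();

        /* Now it is safe to take the mm reference and mark ourselves in. */
        cpu = smp_processor_id();
        atomic_inc(&mm->mm_count);
        current->active_mm = mm;
        cpumask_set_cpu(cpu, mm_cpumask(mm));
}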
Cc: David McKay <david.mckay@st.com>
Reported-by: Gilles Chanteperdrix <gilles.chanteperdrix@xenomai.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit a349e23d1cf746f8bdc603dcc61fae9ee4a695f6 upstream.
In 32 bit guests, if a userspace process has %eax == -ERESTARTSYS
(-512) or -ERESTARTNOINTR (-513) when it is interrupted by an event
/and/ the process has a pending signal then %eip (and %eax) are
corrupted when returning to the main process after handling the
signal. The application may then crash with SIGSEGV or a SIGILL or it
may have subtly incorrect behaviour (depending on what instruction it
returned to).
This occurs because handle_signal() incorrectly thinks that there
is a system call that needs to be restarted, so it adjusts %eip and %eax
to re-execute the system call instruction (even though user space had
not done a system call).
If %eax == -ERESTARTNOHAND (-514) or -ERESTART_RESTARTBLOCK
(-516), then handle_signal() only corrupted %eax (by setting it to
-EINTR). This may cause the application to crash or have incorrect
behaviour.
handle_signal() assumes that regs->orig_ax >= 0 means a system call so
any kernel entry point that is not for a system call must push a
negative value for orig_ax. For example, for physical interrupts on
bare metal the inverse of the vector is pushed and page_fault() sets
regs->orig_ax to -1, overwriting the hardware provided error code.
xen_hypervisor_callback() was incorrectly pushing 0 for orig_ax
instead of -1.
Classic Xen kernels pushed %eax which works as %eax cannot be both
non-negative and -RESTARTSYS (etc.), but using -1 is consistent with
other non-system call entry points and avoids some of the tests in
handle_signal().
There were similar bugs in xen_failsafe_callback() of both 32 and
64-bit guests. If the fault was corrected and the normal return path
was used then 0 was incorrectly pushed as the value for orig_ax.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Jan Beulich <JBeulich@suse.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit c985cb37f1b39c2c8035af741a2a0b79f1fbaca7 upstream.
Because of a change in the s390 arch backend of binutils (commit 23ecd77
"Pick the default arch depending on the target size" in binutils repo)
31 bit builds will fail since the linker would now try to create 64 bit
binary output.
Fix this by setting OUTPUT_ARCH to s390:31-bit instead of s390.
Thanks to Andreas Krebbel for figuring out the issue.
Fixes this build error:
LD init/built-in.o
s390x-4.7.2-ld: s390:31-bit architecture of input file
`arch/s390/kernel/head.o' is incompatible with s390:64-bit output
Cc: Andreas Krebbel <Andreas.Krebbel@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 44009105081b51417f311f4c3be0061870b6b8ed upstream.
The "event" variable is a u16 so the shift will always wrap to zero
making the line a no-op.
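For illustration of the bug class (the constants below are examples, not the
driver's exact line): a value narrowed to u16 has to be widened before it can
be shifted into the upper half of a 64-bit control word.

static u64 build_ctrl(u16 event)
{
        u64 val = 0;

        /* Widen before shifting: without the (u64) cast the shift is done
         * in 32-bit int and any bits above bit 31 are silently lost, which
         * is the "wrap to zero" the commit message refers to. */
        val |= (u64)(event & 0x0f00) << 24;
        return val;
}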
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 1a7bbda5b1ab0e02622761305a32dc38735b90b2 upstream.
We actually do not do anything about it. Just return a default
value of zero and if the kernel tries to write anything but 0
we BUG_ON.
This fixes the case when a user tries to suspend the machine
and it blows up in save_processor_state b/c 'read_cr8' is set
to NULL and we get:
kernel BUG at /home/konrad/ssd/linux/arch/x86/include/asm/paravirt.h:100!
invalid opcode: 0000 [#1] SMP
Pid: 2687, comm: init.late Tainted: G O 3.6.0upstream-00002-gac264ac-dirty #4 Bochs Bochs
RIP: e030:[<ffffffff814d5f42>] [<ffffffff814d5f42>] save_processor_state+0x212/0x270
.. snip..
Call Trace:
[<ffffffff810733bf>] do_suspend_lowlevel+0xf/0xac
[<ffffffff8107330c>] ? x86_acpi_suspend_lowlevel+0x10c/0x150
[<ffffffff81342ee2>] acpi_suspend_enter+0x57/0xd5
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit cd0608e71e9757f4dae35bcfb4e88f4d1a03a8ab upstream.
The hypervisor will trap it. However without this patch,
we would crash as the .read_tscp is set to NULL. This patch
fixes it and sets it to the native_read_tscp call.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit f0a996eeeda214f4293e234df33b29bec003b536 upstream.
This fault was detected using the kgdb test suite on boot and it
crashes recursively due to the fact that CONFIG_KPROBES on mips adds
an extra die notifier in the page fault handler. The crash signature
looks like this:
kgdbts:RUN bad memory access test
KGDB: re-enter exception: ALL breakpoints killed
Call Trace:
[<807b7548>] dump_stack+0x20/0x54
[<807b7548>] dump_stack+0x20/0x54
The fix for now is to have kgdb return immediately if the fault type
is DIE_PAGE_FAULT and allow the kprobe code to decide what is supposed
to happen.
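In outline, the early return sits in kgdb's die notifier, roughly like this
(a sketch with an illustrative function name; the rest of the handler is
elided):

static int kgdb_fault_notify(struct notifier_block *self, unsigned long cmd,
                             void *ptr)
{
#ifdef CONFIG_KPROBES
        /*
         * A page fault notification can be a kprobes event; bail out
         * here and let the kprobe code decide what should happen.
         */
        if (cmd == DIE_PAGE_FAULT)
                return NOTIFY_DONE;
#endif
        /* ... the usual kgdb exception handling would follow ... */
        return NOTIFY_DONE;
}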
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 846a136881b8f73c1f74250bf6acfaa309cab1f2 upstream.
Michael Olbrich reported that his test program fails when built with
-O2 -mcpu=cortex-a8 -mfpu=neon, and a kernel which supports v6 and v7
CPUs:
volatile int x = 2;
volatile int64_t y = 2;
int main() {
        volatile int a = 0;
        volatile int64_t b = 0;
        while (1) {
                a = (a + x) % (1 << 30);
                b = (b + y) % (1 << 30);
                assert(a == b);
        }
}
and two instances are run. When built for just v7 CPUs, this program
works fine. It uses the "vadd.i64 d19, d18, d16" VFP instruction.
It appears that we do not save the high-16 double VFP registers across
context switches when the kernel is built for v6 CPUs. Fix that.
Tested-By: Michael Olbrich <m.olbrich@pengutronix.de>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
[ Upstream commit 2856cc2e4d0852c3ddaae9dcb19cb9396512eb08 ]
On a 2-node machine with 256GB of ram we get 512 lines of
console output, which is just too much.
This mimics Yinghai Lu's x86 commit c2b91e2eec9678dbda274e906cc32ea8f711da3b
(x86_64/mm: check and print vmemmap allocation continuous) except that
we aren't ever going to get contiguous block pointers in between calls
so just print when the virtual address or node changes.
This decreases the output by an order of 16.
Also demote this to KERN_DEBUG.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
[ Upstream commit a27032eee8cb6e16516f13c8a9752e9d5d4cc430 ]
There are multiple errors in how sys_sparc64_personality() handles
personality flags stored in top three bytes.
- directly comparing current->personality against PER_LINUX32 doesn't work
in cases when any of the personality flags stored in the top three bytes
are used.
- directly forcefully setting personality to PER_LINUX32 or PER_LINUX
discards any flags stored in the top three bytes
Fix the first one by properly using personality() macro to compare only
PER_MASK bytes.
Fix the second one by setting only the bits that should be set, instead of
overwriting the whole value.
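A sketch of both fixes (compare via the personality() macro, and rewrite only
the PER_MASK bits; details may differ from the actual diff):

asmlinkage long sparc64_personality(unsigned long personality)
{
        long ret;

        /* Compare only PER_MASK, ignoring flags in the top three bytes. */
        if (personality(current->personality) == PER_LINUX32 &&
            personality(personality) == PER_LINUX)
                personality = (personality & ~PER_MASK) | PER_LINUX32;

        ret = sys_personality(personality);

        /* Translate back without discarding the flag bits. */
        if (personality(ret) == PER_LINUX32)
                ret = (ret & ~PER_MASK) | PER_LINUX;

        return ret;
}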
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
[ Upstream commit e793d8c6740f8fe704fa216e95685f4d92c4c4b9 ]
There was a serious disconnect in the logic happening in
sparc_pmu_disable_event() vs. sparc_pmu_enable_event().
Event disable is implemented by programming a NOP event into the PCR.
However, event enable was not reversing this operation. Instead, it
was setting the User/Priv/Hypervisor trace enable bits.
That's not sparc_pmu_enable_event()'s job, that's what
sparc_pmu_enable() and sparc_pmu_disable() do.
The intent of sparc_pmu_enable_event() is clear, since it first clears
out the event type encoding field. So fix this by OR'ing in the event
encoding rather than the trace enable bits.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
[ Upstream commit 08280e6c4c2e8049ac61d9e8e3536ec1df629c0d ]
If the MM is not active, only report the top-level PC. Do not try to
access the address space.
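A sketch of that guard in the user-callchain sampler (function name
abbreviated; the stack walk itself is elided):

static void perf_callchain_user(struct perf_callchain_entry *entry,
                                struct pt_regs *regs)
{
        /* Always record the interrupted PC. */
        perf_callchain_store(entry, regs->tpc);

        /* No active mm (e.g. a kernel thread or an exiting task): there is
         * no user address space to walk, so stop after the top-level PC. */
        if (!current->mm)
                return;

        /* ... walk the user stack frames here ... */
}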
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
[ Upstream commit 55c2770e413e96871147b9406a9c41fe9bc5209c ]
we want syscall_trace_leave() called on exit from any syscall;
skipping its call in case we'd done force_successful_syscall_return()
is broken...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
query_variable_info/update_capsule workable
commit d6cf86d8f23253225fe2a763d627ecf7dfee9dae upstream.
The value of efi.runtime_version is checked before calling
update_capsule()/query_variable_info(), as shown below,
but it is not initialized anywhere.
<snip>
static efi_status_t virt_efi_query_variable_info(u32 attr,
                                                 u64 *storage_space,
                                                 u64 *remaining_space,
                                                 u64 *max_variable_size)
{
        if (efi.runtime_version < EFI_2_00_SYSTEM_TABLE_REVISION)
                return EFI_UNSUPPORTED;
<snip>
This patch initializes efi.runtime_version at boot time.
Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com>
Acked-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 027ef6c87853b0a9df53175063028edb4950d476 upstream.
In many places !pmd_present has been converted to pmd_none. For pmds
that's equivalent and pmd_none is quicker so using pmd_none is better.
However (unless we delete pmd_present) we should provide an accurate
pmd_present too. This will avoid the risk of code thinking the pmd is non
present because it's under __split_huge_page_map, see the pmd_mknotpresent
there and the comment above it.
If the page has been mprotected as PROT_NONE, it would also lead to a
pmd_present false negative in the same way as the race with
split_huge_page.
Because the PSE bit stays on at all times (both during split_huge_page and
when the _PAGE_PROTNONE bit get set), we could only check for the PSE bit,
but checking the PROTNONE bit too is still good to remember pmd_present
must always keep PROT_NONE into account.
This explains a not reproducible BUG_ON that was seldom reported on the
lists.
The same issue is in pmd_large, it would go wrong with both PROT_NONE and
if it races with split_huge_page.
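On x86 the resulting check is essentially (a sketch mirroring the
description above):

static inline int pmd_present(pmd_t pmd)
{
        /*
         * _PAGE_PSE stays set while the pmd is under __split_huge_page_map,
         * and _PAGE_PROTNONE is what a PROT_NONE mprotect leaves behind, so
         * both must count as "present" alongside _PAGE_PRESENT.
         */
        return pmd_flags(pmd) & (_PAGE_PRESENT | _PAGE_PROTNONE | _PAGE_PSE);
}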
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 9d7d6e363b06934221b81a859d509844c97380df upstream.
read_persistent_clock uses a global variable, use a spinlock to
ensure non-atomic updates to the variable don't overlap and cause
time to move backwards.
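In outline (a sketch; the OMAP 32 kHz counter read is reduced to a
placeholder):

static struct timespec persistent_ts;
static DEFINE_SPINLOCK(read_persistent_clock_lock);

void read_persistent_clock(struct timespec *ts)
{
        unsigned long flags;

        /* Serialise the read-modify-write of the global timespec so two
         * callers cannot interleave and make time appear to go backwards. */
        spin_lock_irqsave(&read_persistent_clock_lock, flags);

        /* ... advance persistent_ts from the 32 kHz counter here ... */
        *ts = persistent_ts;

        spin_unlock_irqrestore(&read_persistent_clock_lock, flags);
}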
Signed-off-by: Colin Cross <ccross@android.com>
Signed-off-by: R Sricharan <r.sricharan@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 9957423f035c2071f6d1c5d2f095cdafbeb25ad7 upstream.
It seems the current gcc (4.6.3) no longer provides this, so make it
conditional.
As reported by Tony before, the mn10300 architecture cross-compiles with
gcc-4.6.3 if -mmem-funcs is not added to KBUILD_CFLAGS.
Reported-by: Tony Breeds <tony@bakeyournoodle.com>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit b1e0d8b70fa31821ebca3965f2ef8619d7c5e316 upstream.
The correct syntax for gcc -x is "gcc -x assembler", not
"gcc -xassembler". Even though the latter happens to work, the former
is what is documented in the manual page and thus what gcc wrappers
such as icecream do expect.
This isn't a cosmetic change. The missing space prevents icecream from
recognizing compilation tasks it can't handle, leading to silent kernel
miscompilations.
Besides me, credits go to Michael Matz and Dirk Mueller for
investigating the miscompilation issue and tracking it down to this
incorrect -x parameter syntax.
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Bernhard Walle <bernhard@bwalle.de>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Michal Marek <mmarek@suse.cz>
[bwh: Backported to 3.2: drop unneeded change to arch/x86/Makefile]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit feadf7c0a1a7c08c74bebb4a13b755f8c40e3bbc upstream.
The EEH core is talking with the PCI device driver to determine the
action (purely reset, or PCI device removal). During that period, the
driver might be unloaded, which in turn causes a kernel crash as follows:
EEH: Detected PCI bus error on PHB#4-PE#10000
EEH: This PCI device has failed 3 times in the last hour
lpfc 0004:01:00.0: 0:2710 PCI channel disable preparing for reset
Unable to handle kernel paging request for data at address 0x00000490
Faulting instruction address: 0xd00000000e682c90
cpu 0x1: Vector: 300 (Data Access) at [c000000fc75ffa20]
pc: d00000000e682c90: .lpfc_io_error_detected+0x30/0x240 [lpfc]
lr: d00000000e682c8c: .lpfc_io_error_detected+0x2c/0x240 [lpfc]
sp: c000000fc75ffca0
msr: 8000000000009032
dar: 490
dsisr: 40000000
current = 0xc000000fc79b88b0
paca = 0xc00000000edb0380 softe: 0 irq_happened: 0x00
pid = 3386, comm = eehd
enter ? for help
[c000000fc75ffca0] c000000fc75ffd30 (unreliable)
[c000000fc75ffd30] c00000000004fd3c .eeh_report_error+0x7c/0xf0
[c000000fc75ffdc0] c00000000004ee00 .eeh_pe_dev_traverse+0xa0/0x180
[c000000fc75ffe70] c00000000004ffd8 .eeh_handle_event+0x68/0x300
[c000000fc75fff00] c0000000000503a0 .eeh_event_handler+0x130/0x1a0
[c000000fc75fff90] c000000000020138 .kernel_thread+0x54/0x70
1:mon>
The patch takes a reference on the corresponding driver module while the
EEH core negotiates with the PCI device driver, so that the module can't
be unloaded during that period and it is safe to refer to its callbacks.
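A sketch of the pattern (the helper name is illustrative; the real code
threads this through the EEH report callbacks):

static int eeh_report_error_one(struct pci_dev *dev)
{
        struct pci_driver *driver = dev->driver;

        /* Pin the driver module for as long as we call into it. */
        if (!driver || !try_module_get(driver->driver.owner))
                return 0;

        if (driver->err_handler && driver->err_handler->error_detected)
                driver->err_handler->error_detected(dev,
                                                    pci_channel_io_frozen);

        module_put(driver->driver.owner);
        return 0;
}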
Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[bwh: Backported to 3.2:
- Adjust context
- Reporting functions return int (success = 0), not void * (success = NULL)
- Assume that the 'dev' arguments are non-null]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit cb09cad44f07044d9810f18f6f9a6a6f3771f979 upstream.
Probably a leftover from the early days of self-patching, p6nops
are marked __initconst_or_module, which causes them to be
discarded in a non-modular kernel. If something later triggers
patching, it will overwrite kernel code with garbage.
Reported-by: Tomas Racek <tracek@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Cc: Michael Tokarev <mjt@tls.msk.ru>
Cc: Borislav Petkov <borislav.petkov@amd.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: qemu-devel@nongnu.org
Cc: Anthony Liguori <anthony@codemonkey.ws>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Alan Cox <alan@linux.intel.com>
Link: http://lkml.kernel.org/r/5034AE84.90708@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 8d54db795dfb1049d45dc34f0dddbc5347ec5642 upstream.
The hypervisor is in charge of allocating the proper "NUMA" memory
and dealing with the CPU scheduler to keep them bound to the proper
NUMA node. The PV guests (and PVHVM) have no inkling of where they
run and do not need to know that right now. In the future we will
need to inject NUMA configuration data (if a guest spans two or more
NUMA nodes) so that the kernel can make the right choices. But those
patches are not yet present.
In the meantime, disable the NUMA capability in the PV guest, which
also fixes a bootup issue. Andre says:
"we see Dom0 crashes due to the kernel detecting the NUMA topology not
by ACPI, but directly from the northbridge (CONFIG_AMD_NUMA).
This will detect the actual NUMA config of the physical machine, but
will crash about the mismatch with Dom0's virtual memory. Variation of
the theme: Dom0 sees what it's not supposed to see.
This happens with the said config option enabled and on a machine where
this scanning is still enabled (K8 and Fam10h, not Bulldozer class)
We have this dump then:
NUMA: Warning: node ids are out of bound, from=-1 to=-1 distance=10
Scanning NUMA topology in Northbridge 24
Number of physical nodes 4
Node 0 MemBase 0000000000000000 Limit 0000000040000000
Node 1 MemBase 0000000040000000 Limit 0000000138000000
Node 2 MemBase 0000000138000000 Limit 00000001f8000000
Node 3 MemBase 00000001f8000000 Limit 0000000238000000
Initmem setup node 0 0000000000000000-0000000040000000
NODE_DATA [000000003ffd9000 - 000000003fffffff]
Initmem setup node 1 0000000040000000-0000000138000000
NODE_DATA [0000000137fd9000 - 0000000137ffffff]
Initmem setup node 2 0000000138000000-00000001f8000000
NODE_DATA [00000001f095e000 - 00000001f0984fff]
Initmem setup node 3 00000001f8000000-0000000238000000
Cannot find 159744 bytes in node 3
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffff81d220e6>] __alloc_bootmem_node+0x43/0x96
Pid: 0, comm: swapper Not tainted 3.3.6 #1 AMD Dinar/Dinar
RIP: e030:[<ffffffff81d220e6>] [<ffffffff81d220e6>] __alloc_bootmem_node+0x43/0x96
.. snip..
[<ffffffff81d23024>] sparse_early_usemaps_alloc_node+0x64/0x178
[<ffffffff81d23348>] sparse_init+0xe4/0x25a
[<ffffffff81d16840>] paging_init+0x13/0x22
[<ffffffff81d07fbb>] setup_arch+0x9c6/0xa9b
[<ffffffff81683954>] ? printk+0x3c/0x3e
[<ffffffff81d01a38>] start_kernel+0xe5/0x468
[<ffffffff81d012cf>] x86_64_start_reservations+0xba/0xc1
[<ffffffff81007153>] ? xen_setup_runstate_info+0x2c/0x36
[<ffffffff81d050ee>] xen_start_kernel+0x565/0x56c
"
so we just disable NUMA scanning by setting numa_off=1.
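The change itself is essentially a one-liner in the PV guest's early setup
(a sketch, assuming the usual CONFIG_NUMA guard in xen_start_kernel()):

#ifdef CONFIG_NUMA
        /*
         * The guest's pseudo-physical pages bear no relation to machine
         * NUMA topology, so don't let the kernel scan the northbridge
         * (or ACPI) and build a bogus NUMA layout.
         */
        numa_off = 1;
#endif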
Reported-and-Tested-by: Andre Przywara <andre.przywara@amd.com>
Acked-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit bd49940a35ec7d488ae63bd625639893b3385b97 upstream.
As the initial domain we are able to search/map certain regions
of memory to harvest configuration data. For all low-level we
use ACPI tables - for interrupts we use exclusively ACPI _PRT
(so DSDT) and MADT for INT_SRC_OVR.
The SMP MP table is not used at all. As a matter of fact we do
not even support machines that only have SMP MP but no ACPI tables.
Let's follow how Moorestown does it and just disable searching
for BIOS SMP tables.
This also fixes an issue on HP Proliant BL680c G5 and DL380 G6:
9f->100 for 1:1 PTE
Freeing 9f-100 pfn range: 97 pages freed
1-1 mapping on 9f->100
.. snip..
e820: BIOS-provided physical RAM map:
Xen: [mem 0x0000000000000000-0x000000000009efff] usable
Xen: [mem 0x000000000009f400-0x00000000000fffff] reserved
Xen: [mem 0x0000000000100000-0x00000000cfd1dfff] usable
.. snip..
Scan for SMP in [mem 0x00000000-0x000003ff]
Scan for SMP in [mem 0x0009fc00-0x0009ffff]
Scan for SMP in [mem 0x000f0000-0x000fffff]
found SMP MP-tabl |