linux - Linux kernel source tree

Age	Commit message (Collapse)	Author
2011-11-08	Revert "powerpc/mpic: Fix problem that affinity is not updated"	Greg Kroah-Hartman
	This reverts commit 1badd98ea79b7b20fb4ddfea110d1bb99c33a55f. It breaks the build on powerpc systems: arch/powerpc/sysdev/mpic.c: In function 'irq_choose_cpu': arch/powerpc/sysdev/mpic.c:574: error: passing argument 1 of '__cpus_equal' from incompatible pointer type Reported-by: Jiri Slaby <jslaby@suse.cz> Cc: Jiajun Wu <b06378@freescale.com> Cc: Li Yang <leoli@freescale.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-07	Revert "MIPS: MTX-1: Make au1000_eth probe all PHY	Florian Fainelli
	Commit ec3eb823 was not applicable in 2.6.32 and introduces a build breakage. Revert that commit since it is irrelevant for this kernel version. Acked-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Florian Fainelli <florian@openwrt.org>
2011-11-07	KVM: x86: Reset tsc_timestamp on TSC writes	Philipp Hahn
	There is no upstream commit ID for this patch since it is not a straight backport from upstream. It is a fix only relevant to 2.6.32.y. Since 1d5f066e0b63271b67eac6d3752f8aa96adcbddb from 2.6.37 was back-ported to 2.6.32.40 as ad2088cabe0fd7f633f38ba106025d33ed9a2105, the following patch is needed to add the needed reset logic to 2.6.32 as well. Bug #23257: Reset tsc_timestamp on TSC writes vcpu->last_guest_tsc is updated in vcpu_enter_guest() and kvm_arch_vcpu_put() by getting the last value of the TSC from the guest. On reset, the SeaBIOS resets the TSC to 0, which triggers a bug on the next call to kvm_write_guest_time(): Since vcpu->hw_clock.tsc_timestamp still contains the old value before the reset, "max_kernel_ns = vcpu->last_guest_tsc - vcpu->hw_clock.tsc_timestamp" gets negative. Since the variable is u64, it gets translated to a large positive value. [9333.197080] vcpu->last_guest_tsc =209_328_760_015 ← vcpu->hv_clock.tsc_timestamp=209_328_708_109 vcpu->last_kernel_ns =9_333_179_830_643 kernel_ns =9_333_197_073_429 max_kernel_ns =9_333_179_847_943 ← [9336.910995] vcpu->last_guest_tsc =9_438_510_584 ← vcpu->hv_clock.tsc_timestamp=211_080_593_143 vcpu->last_kernel_ns =9_333_763_732_907 kernel_ns =9_336_910_990_771 max_kernel_ns =6_148_296_831_006_663_830 ← For completeness, here are the values for my 3 GHz CPU: vcpu->hv_clock.tsc_shift =-1 vcpu->hv_clock.tsc_to_system_mul =2_863_019_502 This makes the guest kernel crawl very slowly when clocksource=kvmclock is used: sleeps take way longer than expected and don't match wall clock any more. The times printed with printk() don't match real time and the reboot often stalls for long times. In linux-git this isn't a problem, since on every MSR_IA32_TSC write vcpu->arch.hv_clock.tsc_timestamp is reset to 0, which disables above logic. The code there is only in arch/x86/kvm/x86.c, since much of the kvm-clock related code has been refactured for 2.6.37: 99e3e30a arch/x86/kvm/x86.c (Zachary Amsden 2010-08-19 22:07:17 -1000 1084) vcpu->arch.hv_clock.tsc_timestamp = 0; Signed-off-by: Philipp Hahn <hahn@univention.de> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-07	xen/timer: Missing IRQF_NO_SUSPEND in timer code broke suspend.	Ian Campbell
	commit f611f2da99420abc973c32cdbddbf5c365d0a20c upstream. The patches missed an indirect use of IRQF_NO_SUSPEND pulled in via IRQF_TIMER. The following patch fixes the issue. With this fixlet PV guest migration works just fine. I also booted the entire series as a dom0 kernel and it appeared fine. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-07	um: fix ubd cow size	Richard Weinberger
	commit 8535639810e578960233ad39def3ac2157b0c3ec upstream. ubd_file_size() cannot use ubd_dev->cow.file because at this time ubd_dev->cow.file is not initialized. Therefore, ubd_file_size() will always report a wrong disk size when COW files are used. Reading from /dev/ubd* would crash the kernel. We have to read the correct disk size from the COW file's backing file. Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-07	plat-mxc: iomux-v3.h: implicitly enable pull-up/down when that's desired	Paul Fertser
	commit 6571534b600b8ca1936ff5630b9e0947f21faf16 upstream. To configure pads during the initialisation a set of special constants is used, e.g. #define MX25_PAD_FEC_MDIO__FEC_MDIO IOMUX_PAD(0x3c4, 0x1cc, 0x10, 0, 0, PAD_CTL_HYS \| PAD_CTL_PUS_22K_UP) The problem is that no pull-up/down is getting activated unless both PAD_CTL_PUE (pull-up enable) and PAD_CTL_PKE (pull/keeper module enable) set. This is clearly stated in the i.MX25 datasheet and is confirmed by the measurements on hardware. This leads to some rather hard to understand bugs such as misdetecting an absent ethernet PHY (a real bug i had), unstable data transfer etc. This might affect mx25, mx35, mx50, mx51 and mx53 SoCs. It's reasonable to expect that if the pullup value is specified, the intention was to have it actually active, so we implicitly add the needed bits. Signed-off-by: Paul Fertser <fercerpav@gmail.com> Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-07	iommu/amd: Fix wrong shift direction	Joerg Roedel
	commit fcd0861db1cf4e6ed99f60a815b7b72c2ed36ea4 upstream. The shift direction was wrong because the function takes a page number and i is the address is the loop. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-07	KVM: s390: check cpu_id prior to using it	Carsten Otte
	commit 4d47555a80495657161a7e71ec3014ff2021e450 upstream. We use the cpu id provided by userspace as array index here. Thus we clearly need to check it first. Ooops. Signed-off-by: Carsten Otte <cotte@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-07	x86: Fix compilation bug in kprobes' twobyte_is_boostable	Josh Stone
	commit 315eb8a2a1b7f335d40ceeeb11b9e067475eb881 upstream. When compiling an i386_defconfig kernel with gcc-4.6.1-9.fc15.i686, I noticed a warning about the asm operand for test_bit in kprobes' can_boost. I discovered that this caused only the first long of twobyte_is_boostable[] to be output. Jakub filed and fixed gcc PR50571 to correct the warning and this output issue. But to solve it for less current gcc, we can make kprobes' twobyte_is_boostable[] non-const, and it won't be optimized out. Before: CC arch/x86/kernel/kprobes.o In file included from include/linux/bitops.h:22:0, from include/linux/kernel.h:17, from [...]/arch/x86/include/asm/percpu.h:44, from [...]/arch/x86/include/asm/current.h:5, from [...]/arch/x86/include/asm/processor.h:15, from [...]/arch/x86/include/asm/atomic.h:6, from include/linux/atomic.h:4, from include/linux/mutex.h:18, from include/linux/notifier.h:13, from include/linux/kprobes.h:34, from arch/x86/kernel/kprobes.c:43: [...]/arch/x86/include/asm/bitops.h: In function ‘can_boost.part.1’: [...]/arch/x86/include/asm/bitops.h:319:2: warning: use of memory input without lvalue in asm operand 1 is deprecated [enabled by default] $ objdump -rd arch/x86/kernel/kprobes.o \| grep -A1 -w bt 551: 0f a3 05 00 00 00 00 bt %eax,0x0 554: R_386_32 .rodata.cst4 $ objdump -s -j .rodata.cst4 -j .data arch/x86/kernel/kprobes.o arch/x86/kernel/kprobes.o: file format elf32-i386 Contents of section .data: 0000 48000000 00000000 00000000 00000000 H............... Contents of section .rodata.cst4: 0000 4c030000 L... Only a single long of twobyte_is_boostable[] is in the object file. After, without the const on twobyte_is_boostable: $ objdump -rd arch/x86/kernel/kprobes.o \| grep -A1 -w bt 551: 0f a3 05 20 00 00 00 bt %eax,0x20 554: R_386_32 .data $ objdump -s -j .rodata.cst4 -j .data arch/x86/kernel/kprobes.o arch/x86/kernel/kprobes.o: file format elf32-i386 Contents of section .data: 0000 48000000 00000000 00000000 00000000 H............... 0010 00000000 00000000 00000000 00000000 ................ 0020 4c030000 0f000200 ffff0000 ffcff0c0 L............... 0030 0000ffff 3bbbfff8 03ff2ebb 26bb2e77 ....;.......&..w Now all 32 bytes are output into .data instead. Signed-off-by: Josh Stone <jistone@redhat.com> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Jakub Jelinek <jakub@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-07	ARM: davinci: da850 EVM: read mac address from SPI flash	Sudhakar Rajashekhara
	commit 810198bc9c109489dfadc57131c5183ce6ad2d7d upstream. DA850/OMAP-L138 EMAC driver uses random mac address instead of a fixed one because the mac address is not stuffed into EMAC platform data. This patch provides a function which reads the mac address stored in SPI flash (registered as MTD device) and populates the EMAC platform data. The function which reads the mac address is registered as a callback which gets called upon addition of MTD device. NOTE: In case the MAC address stored in SPI flash is erased, follow the instructions at [1] to restore it. [1] http://processors.wiki.ti.com/index.php/GSG:_OMAP-L138_DVEVM_Additional_Procedures#Restoring_MAC_address_on_SPI_Flash Modifications in v2: Guarded registering the mtd_notifier only when MTD is enabled. Earlier this was handled using mtd_has_partitions() call, but this has been removed in Linux v3.0. Modifications in v3: a. Guarded da850_evm_m25p80_notify_add() function and da850evm_spi_notifier structure with CONFIG_MTD macros. b. Renamed da850_evm_register_mtd_user() function to da850_evm_setup_mac_addr() and removed the struct mtd_notifier argument to this function. c. Passed the da850evm_spi_notifier structure to register_mtd_user() function. Modifications in v4: Moved the da850_evm_setup_mac_addr() function within the first CONFIG_MTD ifdef construct. Signed-off-by: Sudhakar Rajashekhara <sudhakar.raj@ti.com> Signed-off-by: Sekhar Nori <nsekhar@ti.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-07	xen/smp: Warn user why they keel over - nosmp or noapic and what to use instead.	Konrad Rzeszutek Wilk
	commit ed467e69f16e6b480e2face7bc5963834d025f91 upstream. We have hit a couple of customer bugs where they would like to use those parameters to run an UP kernel - but both of those options turn of important sources of interrupt information so we end up not being able to boot. The correct way is to pass in 'dom0_max_vcpus=1' on the Xen hypervisor line and the kernel will patch itself to be a UP kernel. Fixes bug: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=637308 Acked-by: Ian Campbell <Ian.Campbell@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-07	xen: x86_32: do not enable iterrupts when returning from exception in ↵	Igor Mammedov
	interrupt context commit d198d499148a0c64a41b3aba9e7dd43772832b91 upstream. If vmalloc page_fault happens inside of interrupt handler with interrupts disabled then on exit path from exception handler when there is no pending interrupts, the following code (arch/x86/xen/xen-asm_32.S:112): cmpw $0x0001, XEN_vcpu_info_pending(%eax) sete XEN_vcpu_info_mask(%eax) will enable interrupts even if they has been previously disabled according to eflags from the bounce frame (arch/x86/xen/xen-asm_32.S:99) testb $X86_EFLAGS_IF>>8, 8+1+ESP_OFFSET(%esp) setz XEN_vcpu_info_mask(%eax) Solution is in setting XEN_vcpu_info_mask only when it should be set according to cmpw $0x0001, XEN_vcpu_info_pending(%eax) but not clearing it if there isn't any pending events. Reproducer for bug is attached to RHBZ 707552 Signed-off-by: Igor Mammedov <imammedo@redhat.com> Acked-by: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-07	powerpc/pci: Check devices status property when scanning OF tree	Sonny Rao
	commit 5b339bdf164d8aee394609768f7e2e4415b0252a upstream. We ran into an issue where it looks like we're not properly ignoring a pci device with a non-good status property when we walk the device tree and instanciate the Linux side PCI devices. However, the EEH init code does look for the property and disables EEH on these devices. This leaves us in an inconsistent where we are poking at a supposedly bad piece of hardware and RTAS will block our config cycles because EEH isn't enabled anyway. Signed-of-by: Sonny Rao <sonnyrao@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-07	powerpc/mpic: Fix problem that affinity is not updated	Yang Li
	commit 38e1313fc753482b93aa6c6f11cfbd43a5bcd963 upstream. Since commit 57b150cce8e004ddd36330490a68bfb59b7271e9, desc->affinity of an irq is changed after calling desc->chip->set_affinity. Therefore we need to fix the irq_choose_cpu() not to depend on the desc->affinity for new mask. Signed-off-by: Jiajun Wu <b06378@freescale.com> Signed-off-by: Li Yang <leoli@freescale.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-07	Revert "x86, hotplug: Use mwait to offline a processor, fix the legacy case"	Greg Kroah-Hartman
	This reverts commit 226917b0735f31cf5c704e07fdd590d99bbfae58 (upstream ea53069231f9317062910d6e772cca4ce93de8c8 and a68e5c94f7d3dd64fef34dd5d97e365cae4bb42a and ce5f68246bf2385d6174856708d0b746dc378f20 all mushed together) as Jonathan Nieder reports that this causes a regression on some hardware. More details can be found at http://bugs.debian.org/622259 Cc: Jonathan Nieder <jrnieder@gmail.com> Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: Len Brown <len.brown@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-07	sparc: fix array bounds error setting up PCIC NMI trap	Ian Campbell
	commit 4a0342ca8e8150bd47e7118a76e300692a1b6b7b upstream. CC arch/sparc/kernel/pcic.o arch/sparc/kernel/pcic.c: In function 'pcic_probe': arch/sparc/kernel/pcic.c:359:33: error: array subscript is above array bounds [-Werror=array-bounds] arch/sparc/kernel/pcic.c:359:8: error: array subscript is above array bounds [-Werror=array-bounds] arch/sparc/kernel/pcic.c:360:33: error: array subscript is above array bounds [-Werror=array-bounds] arch/sparc/kernel/pcic.c:360:8: error: array subscript is above array bounds [-Werror=array-bounds] arch/sparc/kernel/pcic.c:361:33: error: array subscript is above array bounds [-Werror=array-bounds] arch/sparc/kernel/pcic.c:361:8: error: array subscript is above array bounds [-Werror=array-bounds] cc1: all warnings being treated as errors I'm not particularly familiar with sparc but t_nmi (defined in head_32.S via the TRAP_ENTRY macro) and pcic_nmi_trap_patch (defined in entry.S) both appear to be 4 instructions long and I presume from the usage that instructions are int sized. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: sparclinux@vger.kernel.org Reviewed-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-07	sparc: Allow handling signals when stack is corrupted.	David S. Miller
	commit 5598473a5b40c47a8c5349dd2c2630797169cf1a upstream. If we can't push the pending register windows onto the user's stack, we disallow signal delivery even if the signal would be delivered on a valid seperate signal stack. Add a register window save area in the signal frame, and store any unsavable windows there. On sigreturn, if any windows are still queued up in the signal frame, try to push them back onto the stack and if that fails we kill the process immediately. This allows the debug/tst-longjmp_chk2 glibc test case to pass. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-29	x86, UV: Remove UV delay in starting slave cpus	Jack Steiner
	commit 05e33fc20ea5e493a2a1e7f1d04f43cdf89f83ed upstream. Delete the 10 msec delay between the INIT and SIPI when starting slave cpus. I can find no requirement for this delay. BIOS also has similar code sequences without the delay. Removing the delay reduces boot time by 40 sec. Every bit helps. Signed-off-by: Jack Steiner <steiner@sgi.com> Link: http://lkml.kernel.org/r/20110805140900.GA6774@sgi.com Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-29	x86-32, vdso: On system call restart after SYSENTER, use int $0x80	H. Peter Anvin
	commit 7ca0758cdb7c241cb4e0490a8d95f0eb5b861daf upstream. When we enter a 32-bit system call via SYSENTER or SYSCALL, we shuffle the arguments to match the int $0x80 calling convention. This was probably a design mistake, but it's what it is now. This causes errors if the system call as to be restarted. For SYSENTER, we have to invoke the instruction from the vdso as the return address is hardcoded. Accordingly, we can simply replace the jump in the vdso with an int $0x80 instruction and use the slower entry point for a post-restart. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Link: http://lkml.kernel.org/r/CA%2B55aFztZ=r5wa0x26KJQxvZOaQq8s2v3u50wCyJcA-Sc4g8gQ@mail.gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-15	powerpc: pseries: Fix kexec on machines with more than 4TB of RAM	Anton Blanchard
	commit bed9a31527af8ff3dfbad62a1a42815cef4baab7 upstream. On a box with 8TB of RAM the MMU hashtable is 64GB in size. That means we have 4G PTEs. pSeries_lpar_hptab_clear was using a signed int to store the index which will overflow at 2G. Signed-off-by: Anton Blanchard <anton@samba.org> Acked-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-15	powerpc: Fix device tree claim code	Anton Blanchard
	commit 966728dd88b4026ec58fee169ccceaeaf56ef120 upstream. I have a box that fails in OF during boot with: DEFAULT CATCH!, exception-handler=fff00400 at %SRR0: 49424d2c4c6f6768 %SRR1: 800000004000b002 ie "IBM,Logh". OF got corrupted with a device tree string. Looking at make_room and alloc_up, we claim the first chunk (1 MB) but we never claim any more. mem_end is always set to alloc_top which is the top of our available address space, guaranteeing we will never call alloc_up and claim more memory. Also alloc_up wasn't setting alloc_bottom to the bottom of the available address space. This doesn't help the box to boot, but we at least fail with an obvious error. We could relocate the device tree in a future patch. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-08	alpha: fix several security issues	Dan Rosenberg
	commit 21c5977a836e399fc710ff2c5367845ed5c2527f upstream. Fix several security issues in Alpha-specific syscalls. Untested, but mostly trivial. 1. Signedness issue in osf_getdomainname allows copying out-of-bounds kernel memory to userland. 2. Signedness issue in osf_sysinfo allows copying large amounts of kernel memory to userland. 3. Typo (?) in osf_getsysinfo bounds minimum instead of maximum copy size, allowing copying large amounts of kernel memory to userland. 4. Usage of user pointer in osf_wait4 while under KERNEL_DS allows privilege escalation via writing return value of sys_wait4 to kernel memory. Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Matt Turner <mattst88@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-08	x86: HPET: Chose a paranoid safe value for the ETIME check	Thomas Gleixner
	(imported from commit v2.6.37-rc5-64-gf1c1807) commit 995bd3bb5 (x86: Hpet: Avoid the comparator readback penalty) chose 8 HPET cycles as a safe value for the ETIME check, as we had the confirmation that the posted write to the comparator register is delayed by two HPET clock cycles on Intel chipsets which showed readback problems. After that patch hit mainline we got reports from machines with newer AMD chipsets which seem to have an even longer delay. See http://thread.gmane.org/gmane.linux.kernel/1054283 and http://thread.gmane.org/gmane.linux.kernel/1069458 for further information. Boris tried to come up with an ACPI based selection of the minimum HPET cycles, but this failed on a couple of test machines. And of course we did not get any useful information from the hardware folks. For now our only option is to chose a paranoid high and safe value for the minimum HPET cycles used by the ETIME check. Adjust the minimum ns value for the HPET clockevent accordingly. Reported-Bistected-and-Tested-by: Markus Trippelsdorf <markus@trippelsdorf.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> LKML-Reference: <alpine.LFD.2.00.1012131222420.2653@localhost6.localdomain6> Cc: Simon Kirby <sim@hostway.ca> Cc: Borislav Petkov <bp@alien8.de> Cc: Andreas Herrmann <Andreas.Herrmann3@amd.com> Cc: John Stultz <johnstul@us.ibm.com> Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-08	x86: Hpet: Avoid the comparator readback penalty	Thomas Gleixner
	(imported from commit v2.6.36-rc4-167-g995bd3b) Due to the overly intelligent design of HPETs, we need to workaround the problem that the compare value which we write is already behind the actual counter value at the point where the value hits the real compare register. This happens for two reasons: 1) We read out the counter, add the delta and write the result to the compare register. When a NMI or SMI hits between the read out and the write then the counter can be ahead of the event already 2) The write to the compare register is delayed by up to two HPET cycles in certain chipsets. We worked around this by reading back the compare register to make sure that the written value has hit the hardware. For certain ICH9+ chipsets this can require two readouts, as the first one can return the previous compare register value. That's bad performance wise for the normal case where the event is far enough in the future. As we already know that the write can be delayed by up to two cycles we can avoid the read back of the compare register completely if we make the decision whether the delta has elapsed already or not based on the following calculation: cmp = event - actual_count; If cmp is less than 8 HPET clock cycles, then we decide that the event has happened already and return -ETIME. That covers the above #1 and #2 problems which would cause a wait for HPET wraparound (~306 seconds). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Nix <nix@esperi.org.uk> Tested-by: Artur Skawina <art.08.09@gmail.com> Cc: Damien Wyart <damien.wyart@free.fr> Tested-by: John Drescher <drescherjm@gmail.com> Cc: Venkatesh Pallipadi <venki@google.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Andreas Herrmann <andreas.herrmann3@amd.com> Tested-by: Borislav Petkov <borislav.petkov@amd.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <alpine.LFD.2.00.1009151500060.2416@localhost6.localdomain6> Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-08	powerpc/pseries/hvconsole: Fix dropped console output	Anton Blanchard
	commit 51d33021425e1f905beb4208823146f2fb6517da upstream. Return -EAGAIN when we get H_BUSY back from the hypervisor. This makes the hvc console driver retry, avoiding dropped printks. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-08	xtensa: prevent arbitrary read in ptrace	Dan Rosenberg
	commit 0d0138ebe24b94065580bd2601f8bb7eb6152f56 upstream. Prevent an arbitrary kernel read. Check the user pointer with access_ok() before copying data in. [akpm@linux-foundation.org: s/EIO/EFAULT/] Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com> Cc: Christian Zankel <chris@zankel.net> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-08	powerpc/kdump: Fix timeout in crash_kexec_wait_realmode	Michael Neuling
	commit 63f21a56f1cc0b800a4c00349c59448f82473d19 upstream. The existing code it pretty ugly. How about we clean it up even more like this? From: Anton Blanchard <anton@samba.org> We check for timeout expiry in the outer loop, but we also need to check it in the inner loop or we can lock up forever waiting for a CPU to hit real mode. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-08	kexec, x86: Fix incorrect jump back address if not preserving context	Huang Ying
	commit 050438ed5a05b25cdf287f5691e56a58c2606997 upstream. In kexec jump support, jump back address passed to the kexeced kernel via function calling ABI, that is, the function call return address is the jump back entry. Furthermore, jump back entry == 0 should be used to signal that the jump back or preserve context is not enabled in the original kernel. But in the current implementation the stack position used for function call return address is not cleared context preservation is disabled. The patch fixes this bug. Reported-and-tested-by: Yin Kangkai <kangkai.yin@intel.com> Signed-off-by: Huang Ying <ying.huang@intel.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Vivek Goyal <vgoyal@redhat.com> Link: http://lkml.kernel.org/r/1310607277-25029-1-git-send-email-ying.huang@intel.com Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-08	ARM: pxa/cm-x300: fix V3020 RTC functionality	Igor Grinberg
	commit 6c7b3ea52e345ab614edb91d3f0e9f3bb3713871 upstream. While in sleep mode the CS# and other V3020 RTC GPIOs must be driven high, otherwise V3020 RTC fails to keep the right time in sleep mode. Signed-off-by: Igor Grinberg <grinberg@compulab.co.il> Signed-off-by: Eric Miao <eric.y.miao@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-08	x86: Make Dell Latitude E5420 use reboot=pci	Daniel J Blueman
	commit b7798d28ec15d20fd34b70fa57eb13f0cf6d1ecd upstream. Rebooting on the Dell E5420 often hangs with the keyboard or ACPI methods, but is reliable via the PCI method. [ hpa: this was deferred because we believed for a long time that the recent reshuffling of the boot priorities in commit 660e34cebf0a11d54f2d5dd8838607452355f321 fixed this platform. Unfortunately that turned out to be incorrect. ] Signed-off-by: Daniel J Blueman <daniel.blueman@gmail.com> Link: http://lkml.kernel.org/r/1305248699-2347-1-git-send-email-daniel.blueman@gmail.com Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-08-08	davinci: DM365 EVM: fix video input mux bits	Jon Povey
	commit 9daedd833a38edd90cf7baa1b1fcf61c3a0721e3 upstream. Video input mux settings for tvp7002 and imager inputs were swapped. Comment was correct. Tested on EVM with tvp7002 input. Signed-off-by: Jon Povey <jon.povey@racelogic.co.uk> Acked-by: Manjunath Hadli <manjunath.hadli@ti.com> Signed-off-by: Sekhar Nori <nsekhar@ti.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-07-13	um: os-linux/mem.c needs sys/stat.h	Liu Aleaxander
	commit fb967ecc584c20c74a007de749ca597068b0fcac upstream. The os-linux/mem.c file calls fchmod function, which is declared in sys/stat.h header file, so include it. Fixes build breakage under FC13. Signed-off-by: Liu Aleaxander <Aleaxander@gmail.com> Acked-by: Boaz Harrosh <bharrosh@panasas.com> Cc: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-07-13	uml: fix CONFIG_STATIC_LINK=y build failure with newer glibc	Roland McGrath
	commit aa5fb4dbfd121296ca97c68cf90043a7ea97579d upstream. With glibc 2.11 or later that was built with --enable-multi-arch, the UML link fails with undefined references to __rel_iplt_start and similar symbols. In recent binutils, the default linker script defines these symbols (see ld --verbose). Fix the UML linker scripts to match the new defaults for these sections. Signed-off-by: Roland McGrath <roland@redhat.com> Cc: Jeff Dike <jdike@addtoit.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-07-13	xen: partially revert "xen: set max_pfn_mapped to the last pfn mapped"	Stefano Stabellini
	commit a91d92875ee94e4703fd017ccaadb48cfb344994 upstream. We only need to set max_pfn_mapped to the last pfn mapped on x86_64 to make sure that cleanup_highmap doesn't remove important mappings at _end. We don't need to do this on x86_32 because cleanup_highmap is not called on x86_32. Besides lowering max_pfn_mapped on x86_32 has the unwanted side effect of limiting the amount of memory available for the 1:1 kernel pagetable allocation. This patch reverts the x86_32 part of the original patch. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-06-23	exec: delay address limit change until point of no return	Mathias Krause
	commit dac853ae89043f1b7752875300faf614de43c74b upstream. Unconditionally changing the address limit to USER_DS and not restoring it to its old value in the error path is wrong because it prevents us using kernel memory on repeated calls to this function. This, in fact, breaks the fallback of hard coded paths to the init program from being ever successful if the first candidate fails to load. With this patch applied switching to USER_DS is delayed until the point of no return is reached which makes it possible to have a multi-arch rootfs with one arch specific init binary for each of the (hard coded) probed paths. Since the address limit is already set to USER_DS when start_thread() will be invoked, this redundancy can be safely removed. Signed-off-by: Mathias Krause <minipli@googlemail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-06-23	x86/amd-iommu: Fix 3 possible endless loops	Joerg Roedel
	commit 0de66d5b35ee148455e268b2782873204ffdef4b upstream. The driver contains several loops counting on an u16 value where the exit-condition is checked against variables that can have values up to 0xffff. In this case the loops will never exit. This patch fixed 3 such loops. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-06-23	xen: off by one errors in multicalls.c	Dan Carpenter
	commit f124c6ae59e193705c9ddac57684d50006d710e6 upstream. b->args[] has MC_ARGS elements, so the comparison here should be ">=" instead of ">". Otherwise we read past the end of the array one space. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-06-23	xen mmu: fix a race window causing leave_mm BUG()	Tian, Kevin
	commit 7899891c7d161752f29abcc9bc0a9c6c3a3af26c upstream. There's a race window in xen_drop_mm_ref, where remote cpu may exit dirty bitmap between the check on this cpu and the point where remote cpu handles drop request. So in drop_other_mm_ref we need check whether TLB state is still lazy before calling into leave_mm. This bug is rarely observed in earlier kernel, but exaggerated by the commit 831d52bc153971b70e64eccfbed2b232394f22f8 ("x86, mm: avoid possible bogus tlb entries by clearing prev mm_cpumask after switching mm") which clears bitmap after changing the TLB state. the call trace is as below: --------------------------------- kernel BUG at arch/x86/mm/tlb.c:61! invalid opcode: 0000 [#1] SMP last sysfs file: /sys/devices/system/xen_memory/xen_memory0/info/current_kb CPU 1 Modules linked in: 8021q garp xen_netback xen_blkback blktap blkback_pagemap nbd bridge stp llc autofs4 ipmi_devintf ipmi_si ipmi_msghandler lockd sunrpc bonding ipv6 xenfs dm_multipath video output sbs sbshc parport_pc lp parport ses enclosure snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device serio_raw bnx2 snd_pcm_oss snd_mixer_oss snd_pcm snd_timer iTCO_wdt snd soundcore snd_page_alloc i2c_i801 iTCO_vendor_support i2c_core pcs pkr pata_acpi ata_generic ata_piix shpchp mptsas mptscsih mptbase [last unloaded: freq_table] Pid: 25581, comm: khelper Not tainted 2.6.32.36fixxen #1 Tecal RH2285 RIP: e030:[<ffffffff8103a3cb>] [<ffffffff8103a3cb>] leave_mm+0x15/0x46 RSP: e02b:ffff88002805be48 EFLAGS: 00010046 RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffff88015f8e2da0 RDX: ffff88002805be78 RSI: 0000000000000000 RDI: 0000000000000001 RBP: ffff88002805be48 R08: ffff88009d662000 R09: dead000000200200 R10: dead000000100100 R11: ffffffff814472b2 R12: ffff88009bfc1880 R13: ffff880028063020 R14: 00000000000004f6 R15: 0000000000000000 FS: 00007f62362d66e0(0000) GS:ffff880028058000(0000) knlGS:0000000000000000 CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000003aabc11909 CR3: 000000009b8ca000 CR4: 0000000000002660 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 00000000000000 00 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process khelper (pid: 25581, threadinfo ffff88007691e000, task ffff88009b92db40) Stack: ffff88002805be68 ffffffff8100e4ae 0000000000000001 ffff88009d733b88 <0> ffff88002805be98 ffffffff81087224 ffff88002805be78 ffff88002805be78 <0> ffff88015f808360 00000000000004f6 ffff88002805bea8 ffffffff81010108 Call Trace: <IRQ> [<ffffffff8100e4ae>] drop_other_mm_ref+0x2a/0x53 [<ffffffff81087224>] generic_smp_call_function_single_interrupt+0xd8/0xfc [<ffffffff81010108>] xen_call_function_single_interrupt+0x13/0x28 [<ffffffff810a936a>] handle_IRQ_event+0x66/0x120 [<ffffffff810aac5b>] handle_percpu_irq+0x41/0x6e [<ffffffff8128c1c0>] __xen_evtchn_do_upcall+0x1ab/0x27d [<ffffffff8128dd11>] xen_evtchn_do_upcall+0x33/0x46 [<ffffffff81013efe>] xen_do_hyper visor_callback+0x1e/0x30 <EOI> [<ffffffff814472b2>] ? _spin_unlock_irqrestore+0x15/0x17 [<ffffffff8100f8cf>] ? xen_restore_fl_direct_end+0x0/0x1 [<ffffffff81113f71>] ? flush_old_exec+0x3ac/0x500 [<ffffffff81150dc5>] ? load_elf_binary+0x0/0x17ef [<ffffffff81150dc5>] ? load_elf_binary+0x0/0x17ef [<ffffffff8115115d>] ? load_elf_binary+0x398/0x17ef [<ffffffff81042fcf>] ? need_resched+0x23/0x2d [<ffffffff811f4648>] ? process_measurement+0xc0/0xd7 [<ffffffff81150dc5>] ? load_elf_binary+0x0/0x17ef [<ffffffff81113094>] ? search_binary_handler+0xc8/0x255 [<ffffffff81114362>] ? do_execve+0x1c3/0x29e [<ffffffff8101155d>] ? sys_execve+0x43/0x5d [<ffffffff8106fc45>] ? __call_usermodehelper+0x0/0x6f [<ffffffff81013e28>] ? kernel_execve+0x68/0xd0 [<ffffffff 8106fc45>] ? __call_usermodehelper+0x0/0x6f [<ffffffff8100f8cf>] ? xen_restore_fl_direct_end+0x0/0x1 [<ffffffff8106fb64>] ? ____call_usermodehelper+0x113/0x11e [<ffffffff81013daa>] ? child_rip+0xa/0x20 [<ffffffff8106fc45>] ? __call_usermodehelper+0x0/0x6f [<ffffffff81012f91>] ? int_ret_from_sys_call+0x7/0x1b [<ffffffff8101371d>] ? retint_restore_args+0x5/0x6 [<ffffffff81013da0>] ? child_rip+0x0/0x20 Code: 41 5e 41 5f c9 c3 55 48 89 e5 0f 1f 44 00 00 e8 17 ff ff ff c9 c3 55 48 89 e5 0f 1f 44 00 00 65 8b 04 25 c8 55 01 00 ff c8 75 04 <0f> 0b eb fe 65 48 8b 34 25 c0 55 01 00 48 81 c6 b8 02 00 00 e8 RIP [<ffffffff8103a3cb>] leave_mm+0x15/0x46 RSP <ffff88002805be48> ---[ end trace ce9cee6832a9c503 ]--- Tested-by: Maoxiaoyun<tinnycloud@hotmail.com> Signed-off-by: Kevin Tian <kevin.tian@intel.com> [v1: Fleshed out the git description a bit] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-06-23	x86, amd: Use _safe() msr access for GartTlbWlk disable code	Roedel, Joerg
	commit d47cc0db8fd6011de2248df505fc34990b7451bf upstream. The workaround for Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=33012 introduced a read and a write to the MC4 mask msr. Unfortunatly this MSR is not emulated by the KVM hypervisor so that the kernel will get a #GP and crashes when applying this workaround when running inside KVM. This issue was reported as: https://bugzilla.kernel.org/show_bug.cgi?id=35132 and is fixed with this patch. The change just let the kernel ignore any #GP it gets while accessing this MSR by using the _safe msr access methods. Reported-by: Török Edwin <edwintorok@gmail.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Maciej Rutecki <maciej.rutecki@gmail.com> Cc: Avi Kivity <avi@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-06-23	x86, amd: Do not enable ARAT feature on AMD processors below family 0x12	Boris Ostrovsky
	commit e9cdd343a5e42c43bcda01e609fa23089e026470 upstream. Commit b87cf80af3ba4b4c008b4face3c68d604e1715c6 added support for ARAT (Always Running APIC timer) on AMD processors that are not affected by erratum 400. This erratum is present on certain processor families and prevents APIC timer from waking up the CPU when it is in a deep C state, including C1E state. Determining whether a processor is affected by this erratum may have some corner cases and handling these cases is somewhat complicated. In the interest of simplicity we won't claim ARAT support on processor families below 0x12 and will go back to broadcasting timer when going idle. Signed-off-by: Boris Ostrovsky <ostr@amd64.org> Link: http://lkml.kernel.org/r/1306423192-19774-1-git-send-email-ostr@amd64.org Tested-by: Boris Petkov <borislav.petkov@amd.com> Cc: Hans Rosenfeld <Hans.Rosenfeld@amd.com> Cc: Andreas Herrmann <Andreas.Herrmann3@amd.com> Cc: Chuck Ebbert <cebbert@redhat.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-06-23	x86, 64-bit: Fix copy_[to/from]_user() checks for the userspace address limit	Jiri Olsa
	commit 26afb7c661080ae3f1f13ddf7f0c58c4f931c22b upstream. As reported in BZ #30352: https://bugzilla.kernel.org/show_bug.cgi?id=30352 there's a kernel bug related to reading the last allowed page on x86_64. The _copy_to_user() and _copy_from_user() functions use the following check for address limit: if (buf + size >= limit) fail(); while it should be more permissive: if (buf + size > limit) fail(); That's because the size represents the number of bytes being read/write from/to buf address AND including the buf address. So the copy function will actually never touch the limit address even if "buf + size == limit". Following program fails to use the last page as buffer due to the wrong limit check: #include <sys/mman.h> #include <sys/socket.h> #include <assert.h> #define PAGE_SIZE (4096) #define LAST_PAGE ((void)(0x7fffffffe000)) int main() { int fds[2], err; void ptr = mmap(LAST_PAGE, PAGE_SIZE, PROT_READ \| PROT_WRITE, MAP_ANONYMOUS \| MAP_PRIVATE \| MAP_FIXED, -1, 0); assert(ptr == LAST_PAGE); err = socketpair(AF_LOCAL, SOCK_STREAM, 0, fds); assert(err == 0); err = send(fds[0], ptr, PAGE_SIZE, 0); perror("send"); assert(err == PAGE_SIZE); err = recv(fds[1], ptr, PAGE_SIZE, MSG_WAITALL); perror("recv"); assert(err == PAGE_SIZE); return 0; } The other place checking the addr limit is the access_ok() function, which is working properly. There's just a misleading comment for the __range_not_ok() macro - which this patch fixes as well. The last page of the user-space address range is a guard page and Brian Gerst observed that the guard page itself due to an erratum on K8 cpus (#121 Sequential Execution Across Non-Canonical Boundary Causes Processor Hang). However, the test code is using the last valid page before the guard page. The bug is that the last byte before the guard page can't be read because of the off-by-one error. The guard page is left in place. This bug would normally not show up because the last page is part of the process stack and never accessed via syscalls. Signed-off-by: Jiri Olsa <jolsa@redhat.com> Acked-by: Brian Gerst <brgerst@gmail.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1305210630-7136-1-git-send-email-jolsa@redhat.com Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-06-23	powerpc/oprofile: Handle events that raise an exception without overflowing	Eric B Munson
	commit ad5d5292f16c6c1d7d3e257c4c7407594286b97e upstream. Commit 0837e3242c73566fc1c0196b4ec61779c25ffc93 fixes a situation on POWER7 where events can roll back if a specualtive event doesn't actually complete. This can raise a performance monitor exception. We need to catch this to ensure that we reset the PMC. In all cases the PMC will be less than 256 cycles from overflow. This patch lifts Anton's fix for the problem in perf and applies it to oprofile as well. Signed-off-by: Eric B Munson <emunson@mgebm.net> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-06-23	powerpc/kexec: Fix memory corruption from unallocated slaves	Milton Miller
	commit 3d2cea732d68aa270c360f55d8669820ebce188a upstream. Commit 1fc711f7ffb01089efc58042cfdbac8573d1b59a (powerpc/kexec: Fix race in kexec shutdown) moved the write to signal the cpu had exited the kernel from before the transition to real mode in kexec_smp_wait to kexec_wait. Unfornately it missed that kexec_wait is used both by cpus leaving the kernel and by secondary slave cpus that were not allocated a paca for what ever reason -- they could be beyond nr_cpus or not described in the current device tree for whatever reason (for example, kexec-load was not refreshed after a cpu hotplug operation). Cpus coming through that path they will write to paca[NR_CPUS] which is beyond the space allocated for the paca data and overwrite memory not allocated to pacas but very likely still real mode accessable). Move the write back to kexec_smp_wait, which is used only by cpus that found their paca, but after the transition to real mode. Signed-off-by: Milton Miller <miltonm@bga.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-05-23	x86, mce, AMD: Fix leaving freed data in a list	Julia Lawall
	commit d9a5ac9ef306eb5cc874f285185a15c303c50009 upstream. b may be added to a list, but is not removed before being freed in the case of an error. This is done in the corresponding deallocation function, so the code here has been changed to follow that. The sematic match that finds this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ expression E,E1,E2; identifier l; @@ list_add(&E->l,E1); ... when != E1 when != list_del(&E->l) when != list_del_init(&E->l) when != E = E2 kfree(E);// </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Cc: Borislav Petkov <borislav.petkov@amd.com> Cc: Robert Richter <robert.richter@amd.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Andreas Herrmann <andreas.herrmann3@amd.com> Link: http://lkml.kernel.org/r/1305294731-12127-1-git-send-email-julia@diku.dk Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-05-23	x86, apic: Fix spurious error interrupts triggering on all non-boot APs	Youquan Song
	commit e503f9e4b092e2349a9477a333543de8f3c7f5d9 upstream. This patch fixes a bug reported by a customer, who found that many unreasonable error interrupts reported on all non-boot CPUs (APs) during the system boot stage. According to Chapter 10 of Intel Software Developer Manual Volume 3A, Local APIC may signal an illegal vector error when an LVT entry is set as an illegal vector value (0~15) under FIXED delivery mode (bits 8-11 is 0), regardless of whether the mask bit is set or an interrupt actually happen. These errors are seen as error interrupts. The initial value of thermal LVT entries on all APs always reads 0x10000 because APs are woken up by BSP issuing INIT-SIPI-SIPI sequence to them and LVT registers are reset to 0s except for the mask bits which are set to 1s when APs receive INIT IPI. When the BIOS takes over the thermal throttling interrupt, the LVT thermal deliver mode should be SMI and it is required from the kernel to keep AP's LVT thermal monitoring register programmed as such as well. This issue happens when BIOS does not take over thermal throttling interrupt, AP's LVT thermal monitor register will be restored to 0x10000 which means vector 0 and fixed deliver mode, so all APs will signal illegal vector error interrupts. This patch check if interrupt delivery mode is not fixed mode before restoring AP's LVT thermal monitor register. Signed-off-by: Youquan Song <youquan.song@intel.com> Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> Acked-by: Yong Wang <yong.y.wang@intel.com> Cc: hpa@linux.intel.com Cc: joe@perches.com Cc: jbaron@redhat.com Cc: trenn@suse.de Cc: kent.liu@intel.com Cc: chaohong.guo@intel.com Link: http://lkml.kernel.org/r/1303402963-17738-1-git-send-email-youquan.song@intel.com Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-05-23	x86, AMD: Fix ARAT feature setting again	Borislav Petkov
	commit 14fb57dccb6e1defe9f89a66f548fcb24c374c1d upstream. Trying to enable the local APIC timer on early K8 revisions uncovers a number of other issues with it, in conjunction with the C1E enter path on AMD. Fixing those causes much more churn and troubles than the benefit of using that timer brings so don't enable it on K8 at all, falling back to the original functionality the kernel had wrt to that. Reported-and-bisected-by: Nick Bowler <nbowler@elliptictech.com> Cc: Boris Ostrovsky <Boris.Ostrovsky@amd.com> Cc: Andreas Herrmann <andreas.herrmann3@amd.com> Cc: Greg Kroah-Hartman <greg@kroah.com> Cc: Hans Rosenfeld <hans.rosenfeld@amd.com> Cc: Nick Bowler <nbowler@elliptictech.com> Cc: Joerg-Volker-Peetz <jvpeetz@web.de> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Link: http://lkml.kernel.org/r/1305636919-31165-3-git-send-email-bp@amd64.org Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-05-23	Revert "x86, AMD: Fix APIC timer erratum 400 affecting K8 Rev.A-E processors"	Borislav Petkov
	commit 328935e6348c6a7cb34798a68c326f4b8372e68a upstream. This reverts commit e20a2d205c05cef6b5783df339a7d54adeb50962, as it crashes certain boxes with specific AMD CPU models. Moving the lower endpoint of the Erratum 400 check to accomodate earlier K8 revisions (A-E) opens a can of worms which is simply not worth to fix properly by tweaking the errata checking framework: * missing IntPenging MSR on revisions < CG cause #GP: http://marc.info/?l=linux-kernel&m=130541471818831 * makes earlier revisions use the LAPIC timer instead of the C1E idle routine which switches to HPET, thus not waking up in deeper C-states: http://lkml.org/lkml/2011/4/24/20 Therefore, leave the original boundary starting with K8-revF. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-05-09	KVM: x86: Fix kvmclock bug	Zachary Amsden
	(backported from commit 28e4639adf0c9f26f6bb56149b7ab547bf33bb95) If preempted after kvmclock values are updated, but before hardware virtualization is entered, the last tsc time as read by the guest is never set. It underflows the next time kvmclock is updated if there has not yet been a successful entry / exit into hardware virt. Fix this by simply setting last_tsc to the newly read tsc value so that any computed nsec advance of kvmclock is nulled. Signed-off-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> BugLink: http://bugs.launchpad.net/bugs/714335 Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com> Reviewed-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-05-09	KVM: x86: Fix a possible backwards warp of kvmclock	Zachary Amsden
	(backported from commit 1d5f066e0b63271b67eac6d3752f8aa96adcbddb) Kernel time, which advances in discrete steps may progress much slower than TSC. As a result, when kvmclock is adjusted to a new base, the apparent time to the guest, which runs at a much higher, nsec scaled rate based on the current TSC, may have already been observed to have a larger value (kernel_ns + scaled tsc) than the value to which we are setting it (kernel_ns + 0). We must instead compute the clock as potentially observed by the guest for kernel_ns to make sure it does not go backwards. Signed-off-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> BugLink: http://bugs.launchpad.net/bugs/714335 Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com> Reviewed-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-05-09	x86: pvclock: Move scale_delta into common header	Zachary Amsden
	(cherry-picked from commit 347bb4448c2155eb2310923ccaa4be5677649003) The scale_delta function for shift / multiply with 31-bit precision moves to a common header so it can be used by both kernel and kvm module. Signed-off-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> BugLink: http://bugs.launchpad.net/bugs/714335 Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>