aboutsummaryrefslogtreecommitdiff
path: root/arch/x86
AgeCommit message (Collapse)Author
2014-07-17x86, ioremap: Speed up check for RAM pagesRoland Dreier
commit c81c8a1eeede61e92a15103748c23d100880cc8a upstream. In __ioremap_caller() (the guts of ioremap), we loop over the range of pfns being remapped and checks each one individually with page_is_ram(). For large ioremaps, this can be very slow. For example, we have a device with a 256 GiB PCI BAR, and ioremapping this BAR can take 20+ seconds -- sometimes long enough to trigger the soft lockup detector! Internally, page_is_ram() calls walk_system_ram_range() on a single page. Instead, we can make a single call to walk_system_ram_range() from __ioremap_caller(), and do our further checks only for any RAM pages that we find. For the common case of MMIO, this saves an enormous amount of work, since the range being ioremapped doesn't intersect system RAM at all. With this change, ioremap on our 256 GiB BAR takes less than 1 second. Signed-off-by: Roland Dreier <roland@purestorage.com> Link: http://lkml.kernel.org/r/1399054721-1331-1-git-send-email-roland@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-09KVM: x86: preserve the high 32-bits of the PAT registerPaolo Bonzini
commit 7cb060a91c0efc5ff94f83c6df3ed705e143cdb9 upstream. KVM does not really do much with the PAT, so this went unnoticed for a long time. It is exposed however if you try to do rdmsr on the PAT register. Reported-by: Valentine Sinitsyn <valentine.sinitsyn@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-09KVM: x86: Increase the number of fixed MTRR regs to 10Nadav Amit
commit 682367c494869008eb89ef733f196e99415ae862 upstream. Recent Intel CPUs have 10 variable range MTRRs. Since operating systems sometime make assumptions on CPUs while they ignore capability MSRs, it is better for KVM to be consistent with recent CPUs. Reporting more MTRRs than actually supported has no functional implications. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-07-06ptrace,x86: force IRET path after a ptrace_stop()Tejun Heo
commit b9cd18de4db3c9ffa7e17b0dc0ca99ed5aa4d43a upstream. The 'sysret' fastpath does not correctly restore even all regular registers, much less any segment registers or reflags values. That is very much part of why it's faster than 'iret'. Normally that isn't a problem, because the normal ptrace() interface catches the process using the signal handler infrastructure, which always returns with an iret. However, some paths can get caught using ptrace_event() instead of the signal path, and for those we need to make sure that we aren't going to return to user space using 'sysret'. Otherwise the modifications that may have been done to the register set by the tracer wouldn't necessarily take effect. Fix it by forcing IRET path by setting TIF_NOTIFY_RESUME from arch_ptrace_stop_needed() which is invoked from ptrace_stop(). Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Andy Lutomirski <luto@amacapital.net> Acked-by: Oleg Nesterov <oleg@redhat.com> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-30x86_32, entry: Do syscall exit work on badsys (CVE-2014-4508)Andy Lutomirski
commit 554086d85e71f30abe46fc014fea31929a7c6a8a upstream. The bad syscall nr paths are their own incomprehensible route through the entry control flow. Rearrange them to work just like syscalls that return -ENOSYS. This fixes an OOPS in the audit code when fast-path auditing is enabled and sysenter gets a bad syscall nr (CVE-2014-4508). This has probably been broken since Linux 2.6.27: af0575bba0 i386 syscall audit fast-path Cc: Roland McGrath <roland@redhat.com> Reported-by: Toralf Förster <toralf.foerster@gmx.de> Signed-off-by: Andy Lutomirski <luto@amacapital.net> Link: http://lkml.kernel.org/r/e09c499eade6fc321266dd6b54da7beb28d6991c.1403558229.git.luto@amacapital.net Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-30x86, x32: Use compat shims for io_{setup,submit}Mike Frysinger
commit 7fd44dacdd803c0bbf38bf478d51d280902bb0f1 upstream. The io_setup takes a pointer to a context id of type aio_context_t. This in turn is typed to a __kernel_ulong_t. We could tweak the exported headers to define this as a 64bit quantity for specific ABIs, but since we already have a 32bit compat shim for the x86 ABI, let's just re-use that logic. The libaio package is also written to expect this as a pointer type, so a compat shim would simplify that. The io_submit func operates on an array of pointers to iocb structs. Padding out the array to be 64bit aligned is a huge pain, so convert it over to the existing compat shim too. We don't convert io_getevents to the compat func as its only purpose is to handle the timespec struct, and the x32 ABI uses 64bit times. With this change, the libaio package can now pass its testsuite when built for the x32 ABI. Signed-off-by: Mike Frysinger <vapier@gentoo.org> Link: http://lkml.kernel.org/r/1399250595-5005-1-git-send-email-vapier@gentoo.org Cc: H.J. Lu <hjl.tools@gmail.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-30x86-32, espfix: Remove filter for espfix32 due to raceH. Peter Anvin
commit 246f2d2ee1d715e1077fc47d61c394569c8ee692 upstream. It is not safe to use LAR to filter when to go down the espfix path, because the LDT is per-process (rather than per-thread) and another thread might change the descriptors behind our back. Fortunately it is always *safe* (if a bit slow) to go down the espfix path, and a 32-bit LDT stack segment is extremely rare. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Link: http://lkml.kernel.org/r/1398816946-3351-1-git-send-email-hpa@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-07x86/efi: Fix dummy variable buffer allocationBen Hutchings
commit b8cb62f82103083a6e8fa5470bfe634a2c06514d upstream. 1. Check for allocation failure 2. Clear the buffer contents, as they may actually be written to flash 3. Don't leak the buffer Compile-tested only. [ Tested successfully on my buggy ASUS machine - Matt ] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Matt Fleming <matt.fleming@intel.com> Cc: Rui Xiang <rui.xiang@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-07Modify UEFI anti-bricking codeMatthew Garrett
commit f8b8404337de4e2466e2e1139ea68b1f8295974f upstream. This patch reworks the UEFI anti-bricking code, including an effective reversion of cc5a080c and 31ff2f20. It turns out that calling QueryVariableInfo() from boot services results in some firmware implementations jumping to physical addresses even after entering virtual mode, so until we have 1:1 mappings for UEFI runtime space this isn't going to work so well. Reverting these gets us back to the situation where we'd refuse to create variables on some systems because they classify deleted variables as "used" until the firmware triggers a garbage collection run, which they won't do until they reach a lower threshold. This results in it being impossible to install a bootloader, which is unhelpful. Feedback from Samsung indicates that the firmware doesn't need more than 5KB of storage space for its own purposes, so that seems like a reasonable threshold. However, there's still no guarantee that a platform will attempt garbage collection merely because it drops below this threshold. It seems that this is often only triggered if an attempt to write generates a genuine EFI_OUT_OF_RESOURCES error. We can force that by attempting to create a variable larger than the remaining space. This should fail, but if it somehow succeeds we can then immediately delete it. I've tested this on the UEFI machines I have available, but I don't have a Samsung and so can't verify that it avoids the bricking problem. Signed-off-by: Matthew Garrett <matthew.garrett@nebula.com> Signed-off-by: Lee, Chun-Y <jlee@suse.com> [ dummy variable cleanup ] Signed-off-by: Matt Fleming <matt.fleming@intel.com> [bwh: Backported to 3.2: the reverted changes were never applied here] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Rui Xiang <rui.xiang@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-07x86,efi: Implement efi_no_storage_paranoia parameterRichard Weinberger
commit 8c58bf3eec3b8fc8162fe557e9361891c20758f2 upstream. Using this parameter one can disable the storage_size/2 check if he is really sure that the UEFI does sane gc and fulfills the spec. This parameter is useful if a devices uses more than 50% of the storage by default. The Intel DQSW67 desktop board is such a sucker for exmaple. Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Matt Fleming <matt.fleming@intel.com> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Rui Xiang <rui.xiang@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-07efi: Export efi_query_variable_store() for efivars.koSergey Vlasov
commit 3668011d4ad556224f7c012c1e870a6eaa0e59da upstream. Fixes build with CONFIG_EFI_VARS=m which was broken after the commit "x86, efivars: firmware bug workarounds should be in platform code". Signed-off-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Matt Fleming <matt.fleming@intel.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Rui Xiang <rui.xiang@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-07x86,efi: Check max_size only if it is non-zero.Richard Weinberger
commit 7791c8423f1f7f4dad94e753bae67461d5b80be8 upstream. Some EFI implementations return always a MaximumVariableSize of 0, check against max_size only if it is non-zero. My Intel DQ67SW desktop board has such an implementation. Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Matt Fleming <matt.fleming@intel.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Rui Xiang <rui.xiang@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-07x86, efivars: firmware bug workarounds should be in platform codeMatt Fleming
commit a6e4d5a03e9e3587e88aba687d8f225f4f04c792 upstream. Let's not burden ia64 with checks in the common efivars code that we're not writing too much data to the variable store. That kind of thing is an x86 firmware bug, plain and simple. efi_query_variable_store() provides platforms with a wrapper in which they can perform checks and workarounds for EFI variable storage bugs. Cc: H. Peter Anvin <hpa@zytor.com> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Signed-off-by: Matt Fleming <matt.fleming@intel.com> [xr: Backported to 3.4: adjust context] Signed-off-by: Rui Xiang <rui.xiang@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-07x86/Sandy Bridge: mark arrays in __init functions as __initconstH. Peter Anvin
commit 91c90db1aa92a50fa1d7f289502b49ddb46a90d3 upstream. commit ab3cd8670e0b3fcde7f029e1503ed3c5138e9571 upstream. Mark static arrays as __initconst so they get removed when the init sections are flushed. Reported-by: Mathias Krause <minipli@googlemail.com> Link: http://lkml.kernel.org/r/75F4BEE6-CB0E-4426-B40B-697451677738@googlemail.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Rui Xiang <rui.xiang@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-07ptrace/x86: Partly fix set_task_blockstep()->update_debugctlmsr() logicOleg Nesterov
commit 95cf00fa5d5e2a200a2c044c84bde8389a237e02 upstream. Afaics the usage of update_debugctlmsr() and TIF_BLOCKSTEP in step.c was always very wrong. 1. update_debugctlmsr() was simply unneeded. The child sleeps TASK_TRACED, __switch_to_xtra(next_p => child) should notice TIF_BLOCKSTEP and set/clear DEBUGCTLMSR_BTF after resume if needed. 2. It is wrong. The state of DEBUGCTLMSR_BTF bit in CPU register should always match the state of current's TIF_BLOCKSTEP bit. 3. Even get_debugctlmsr() + update_debugctlmsr() itself does not look right. Irq can change other bits in MSR_IA32_DEBUGCTLMSR register or the caller can be preempted in between. 4. It is not safe to play with TIF_BLOCKSTEP if task != current. DEBUGCTLMSR_BTF and TIF_BLOCKSTEP should always match each other if the task is running. The tracee is stopped but it can be SIGKILL'ed right before set/clear_tsk_thread_flag(). However, now that uprobes uses user_enable_single_step(current) we can't simply remove update_debugctlmsr(). So this patch adds the additional "task == current" check and disables irqs to avoid the race with interrupts/preemption. Unfortunately this patch doesn't solve the last problem, we need another fix. Probably we should teach ptrace_stop() to set/clear single/block stepping after resume. And afaics there is yet another problem: perf can play with MSR_IA32_DEBUGCTLMSR from nmi, this obviously means that even __switch_to_xtra() has problems. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Rui Xiang <rui.xiang@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-07ptrace/x86: Introduce set_task_blockstep() helperOleg Nesterov
commit 848e8f5f0ad3169560c516fff6471be65f76e69f upstream. No functional changes, preparation for the next fix and for uprobes single-step fixes. Move the code playing with TIF_BLOCKSTEP/DEBUGCTLMSR_BTF into the new helper, set_task_blockstep(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Rui Xiang <rui.xiang@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-07x86 get_unmapped_area: Access mmap_legacy_base through mm_struct memberRadu Caragea
commit 41aacc1eea645c99edbe8fbcf78a97dc9b862adc upstream. This is the updated version of df54d6fa5427 ("x86 get_unmapped_area(): use proper mmap base for bottom-up direction") that only randomizes the mmap base address once. Signed-off-by: Radu Caragea <sinaelgl@gmail.com> Reported-and-tested-by: Jeff Shorey <shoreyjeff@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Michel Lespinasse <walken@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Adrian Sendroiu <molecula2788@gmail.com> Cc: Greg KH <greg@kroah.com> Cc: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Rui Xiang <rui.xiang@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-07x86: fix build error and kconfig for ia32_emulation and binfmtRandy Dunlap
commit d1603990ea626668c78527376d9ec084d634202d upstream. Fix kconfig warning and build errors on x86_64 by selecting BINFMT_ELF when COMPAT_BINFMT_ELF is being selected. warning: (IA32_EMULATION) selects COMPAT_BINFMT_ELF which has unmet direct dependencies (COMPAT && BINFMT_ELF) fs/built-in.o: In function `elf_core_dump': compat_binfmt_elf.c:(.text+0x3e093): undefined reference to `elf_core_extra_phdrs' compat_binfmt_elf.c:(.text+0x3ebcd): undefined reference to `elf_core_extra_data_size' compat_binfmt_elf.c:(.text+0x3eddd): undefined reference to `elf_core_write_extra_phdrs' compat_binfmt_elf.c:(.text+0x3f004): undefined reference to `elf_core_write_extra_data' [ hpa: This was sent to me for -next but it is a low risk build fix ] Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Link: http://lkml.kernel.org/r/51C0B614.5000708@infradead.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Rui Xiang <rui.xiang@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-07x86/apic: Disable I/O APIC before shutdown of the local APICFenghua Yu
commit 522e66464467543c0d88d023336eec4df03ad40b upstream. In reboot and crash path, when we shut down the local APIC, the I/O APIC is still active. This may cause issues because external interrupts can still come in and disturb the local APIC during shutdown process. To quiet external interrupts, disable I/O APIC before shutdown local APIC. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Link: http://lkml.kernel.org/r/1382578212-4677-1-git-send-email-fenghua.yu@intel.com [ I suppose the 'issue' is a hang during shutdown. It's a fine change nevertheless. ] Signed-off-by: Ingo Molnar <mingo@kernel.org> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Rui Xiang <rui.xiang@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-07x86, build: Pass in additional -mno-mmx, -mno-sse optionsH. Peter Anvin
commit 8b3b005d675726e38bc504d2e35a991e55819155 upstream. In checkin 5551a34e5aea x86-64, build: Always pass in -mno-sse we unconditionally added -mno-sse to the main build, to keep newer compilers from generating SSE instructions from autovectorization. However, this did not extend to the special environments (arch/x86/boot, arch/x86/boot/compressed, and arch/x86/realmode/rm). Add -mno-sse to the compiler command line for these environments, and add -mno-mmx to all the environments as well, as we don't want a compiler to generate MMX code either. This patch also removes a $(cc-option) call for -m32, since we have long since stopped supporting compilers too old for the -m32 option, and in fact hardcode it in other places in the Makefiles. Reported-by: Kevin B. Smith <kevin.b.smith@intel.com> Cc: Sunil K. Pandey <sunil.k.pandey@intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: H. J. Lu <hjl.tools@gmail.com> Link: http://lkml.kernel.org/n/tip-j21wzqv790q834n7yc6g80j1@git.kernel.org [bwh: Backported to 3.2: - Drop changes to arch/x86/Makefile, which sets these flags earlier - Adjust context - Drop changes to arch/x86/realmode/rm/Makefile which doesn't exist] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Rui Xiang <rui.xiang@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-07x86-64, modify_ldt: Make support for 16-bit segments a runtime optionLinus Torvalds
commit fa81511bb0bbb2b1aace3695ce869da9762624ff upstream. Checkin: b3b42ac2cbae x86-64, modify_ldt: Ban 16-bit segments on 64-bit kernels disabled 16-bit segments on 64-bit kernels due to an information leak. However, it does seem that people are genuinely using Wine to run old 16-bit Windows programs on Linux. A proper fix for this ("espfix64") is coming in the upcoming merge window, but as a temporary fix, create a sysctl to allow the administrator to re-enable support for 16-bit segments. It adds a "/proc/sys/abi/ldt16" sysctl that defaults to zero (off). If you hit this issue and care about your old Windows program more than you care about a kernel stack address information leak, you can do echo 1 > /proc/sys/abi/ldt16 as root (add it to your startup scripts), and you should be ok. The sysctl table is only added if you have COMPAT support enabled on x86-64, but I assume anybody who runs old windows binaries very much does that ;) Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Link: http://lkml.kernel.org/r/CA%2B55aFw9BPoD10U1LfHbOMpHWZkvJTkMcfCs9s3urPr1YyWBxw@mail.gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-07x86, mm, hugetlb: Add missing TLB page invalidation for hugetlb_cow()Anthony Iliopoulos
commit 9844f5462392b53824e8b86726e7c33b5ecbb676 upstream. The invalidation is required in order to maintain proper semantics under CoW conditions. In scenarios where a process clones several threads, a thread operating on a core whose DTLB entry for a particular hugepage has not been invalidated, will be reading from the hugepage that belongs to the forked child process, even after hugetlb_cow(). The thread will not see the updated page as long as the stale DTLB entry remains cached, the thread attempts to write into the page, the child process exits, or the thread gets migrated to a different processor. Signed-off-by: Anthony Iliopoulos <anthony.iliopoulos@huawei.com> Link: http://lkml.kernel.org/r/20140514092948.GA17391@server-36.huawei.corp Suggested-by: Shay Goikhman <shay.goikhman@huawei.com> Acked-by: Dave Hansen <dave.hansen@intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-05-13crypto: ghash-clmulni-intel - use C implementation for setkey()Ard Biesheuvel
commit 8ceee72808d1ae3fb191284afc2257a2be964725 upstream. The GHASH setkey() function uses SSE registers but fails to call kernel_fpu_begin()/kernel_fpu_end(). Instead of adding these calls, and then having to deal with the restriction that they cannot be called from interrupt context, move the setkey() implementation to the C domain. Note that setkey() does not use any particular SSE features and is not expected to become a performance bottleneck. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Acked-by: H. Peter Anvin <hpa@linux.intel.com> Fixes: 0e1227d356e9b (crypto: ghash - Add PCLMULQDQ accelerated implementation) Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-05-06x86-64, modify_ldt: Ban 16-bit segments on 64-bit kernelsH. Peter Anvin
commit b3b42ac2cbae1f3cecbb6229964a4d48af31d382 upstream. The IRET instruction, when returning to a 16-bit segment, only restores the bottom 16 bits of the user space stack pointer. We have a software workaround for that ("espfix") for the 32-bit kernel, but it relies on a nonzero stack segment base which is not available in 32-bit mode. Since 16-bit support is somewhat crippled anyway on a 64-bit kernel (no V86 mode), and most (if not quite all) 64-bit processors support virtualization for the users who really need it, simply reject attempts at creating a 16-bit segment when running on top of a 64-bit kernel. Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Link: http://lkml.kernel.org/n/tip-kicdm89kzw9lldryb1br9od0@git.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-04-14compiler/gcc4: Add quirk for 'asm goto' miscompilation bugIngo Molnar
commit 3f0116c3238a96bc18ad4b4acefe4e7be32fa861 upstream. Fengguang Wu, Oleg Nesterov and Peter Zijlstra tracked down a kernel crash to a GCC bug: GCC miscompiles certain 'asm goto' constructs, as outlined here: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58670 Implement a workaround suggested by Jakub Jelinek. Reported-and-tested-by: Fengguang Wu <fengguang.wu@intel.com> Reported-by: Oleg Nesterov <oleg@redhat.com> Reported-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Suggested-by: Jakub Jelinek <jakub@redhat.com> Reviewed-by: Richard Henderson <rth@twiddle.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> [hq: Backported to 3.4: Adjust context] Signed-off-by: Qiang Huang <h.huangqiang@huawei.com> Cc: Li Zefan <lizefan@huawei.com> Cc: Jianguo Wu <wujianguo@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-04-03x86: fix boot on uniprocessor systemsArtem Fetishev
commit 825600c0f20e595daaa7a6dd8970f84fa2a2ee57 upstream. On x86 uniprocessor systems topology_physical_package_id() returns -1 which causes rapl_cpu_prepare() to leave rapl_pmu variable uninitialized which leads to GPF in rapl_pmu_init(). See arch/x86/kernel/cpu/perf_event_intel_rapl.c. It turns out that physical_package_id and core_id can actually be retreived for uniprocessor systems too. Enabling them also fixes rapl_pmu code. Signed-off-by: Artem Fetishev <artem_fetishev@epam.com> Cc: Stephane Eranian <eranian@google.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-30KVM: VMX: fix use after free of vmx->loaded_vmcsMarcelo Tosatti
commit 26a865f4aa8e66a6d94958de7656f7f1b03c6c56 upstream. After free_loaded_vmcs executes, the "loaded_vmcs" structure is kfreed, and now vmx->loaded_vmcs points to a kfreed area. Subsequent free_loaded_vmcs then attempts to manipulate vmx->loaded_vmcs. Switch the order to avoid the problem. https://bugzilla.redhat.com/show_bug.cgi?id=1047892 Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Cc: Josh Boyer <jwboyer@fedoraproject.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-30KVM: MMU: handle invalid root_hpa at __direct_mapMarcelo Tosatti
commit 989c6b34f6a9480e397b170cc62237e89bf4fdb9 upstream. It is possible for __direct_map to be called on invalid root_hpa (-1), two examples: 1) try_async_pf -> can_do_async_pf -> vmx_interrupt_allowed -> nested_vmx_vmexit 2) vmx_handle_exit -> vmx_interrupt_allowed -> nested_vmx_vmexit Then to load_vmcs12_host_state and kvm_mmu_reset_context. Check for this possibility, let fault exception be regenerated. BZ: https://bugzilla.redhat.com/show_bug.cgi?id=924916 Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Cc: Josh Boyer <jwboyer@fedoraproject.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-30x86: bpf_jit: support negative offsetsAlexei Starovoitov
commit fdfaf64e75397567257e1051931f9a3377360665 upstream. Commit a998d4342337 claimed to introduce negative offset support to x86 jit, but it couldn't be working, since at the time of the execution of LD+ABS or LD+IND instructions via call into bpf_internal_load_pointer_neg_helper() the %edx (3rd argument of this func) had junk value instead of access size in bytes (1 or 2 or 4). Store size into %edx instead of %ecx (what original commit intended to do) Fixes: a998d4342337 ("bpf jit: Let the x86 jit handle negative offsets") Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Cc: Jan Seiffert <kaffeemonster@googlemail.com> Cc: Eric Dumazet <edumazet@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-23KVM: SVM: fix cr8 intercept windowRadim Krčmář
commit 596f3142d2b7be307a1652d59e7b93adab918437 upstream. We always disable cr8 intercept in its handler, but only re-enable it if handling KVM_REQ_EVENT, so there can be a window where we do not intercept cr8 writes, which allows an interrupt to disrupt a higher priority task. Fix this by disabling intercepts in the same function that re-enables them when needed. This fixes BSOD in Windows 2008. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-23x86/amd/numa: Fix northbridge quirk to assign correct NUMA nodeDaniel J Blueman
commit 847d7970defb45540735b3fb4e88471c27cacd85 upstream. For systems with multiple servers and routed fabric, all northbridges get assigned to the first server. Fix this by also using the node reported from the PCI bus. For single-fabric systems, the northbriges are on PCI bus 0 by definition, which are on NUMA node 0 by definition, so this is invarient on most systems. Tested on fam10h and fam15h single and multi-fabric systems and candidate for stable. Signed-off-by: Daniel J Blueman <daniel@numascale.com> Acked-by: Steffen Persvold <sp@numascale.com> Acked-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1394710981-3596-1-git-send-email-daniel@numascale.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-11xen/smp/spinlock: Fix leakage of the spinlock interrupt line for every CPU ↵Konrad Rzeszutek Wilk
online/offline commit 66ff0fe9e7bda8aec99985b24daad03652f7304e upstream. While we don't use the spinlock interrupt line (see for details commit f10cd522c5fbfec9ae3cc01967868c9c2401ed23 - xen: disable PV spinlocks on HVM) - we should still do the proper init / deinit sequence. We did not do that correctly and for the CPU init for PVHVM guest we would allocate an interrupt line - but failed to deallocate the old interrupt line. This resulted in leakage of an irq_desc but more importantly this splat as we online an offlined CPU: genirq: Flags mismatch irq 71. 0002cc20 (spinlock1) vs. 0002cc20 (spinlock1) Pid: 2542, comm: init.late Not tainted 3.9.0-rc6upstream #1 Call Trace: [<ffffffff811156de>] __setup_irq+0x23e/0x4a0 [<ffffffff81194191>] ? kmem_cache_alloc_trace+0x221/0x250 [<ffffffff811161bb>] request_threaded_irq+0xfb/0x160 [<ffffffff8104c6f0>] ? xen_spin_trylock+0x20/0x20 [<ffffffff813a8423>] bind_ipi_to_irqhandler+0xa3/0x160 [<ffffffff81303758>] ? kasprintf+0x38/0x40 [<ffffffff8104c6f0>] ? xen_spin_trylock+0x20/0x20 [<ffffffff810cad35>] ? update_max_interval+0x15/0x40 [<ffffffff816605db>] xen_init_lock_cpu+0x3c/0x78 [<ffffffff81660029>] xen_hvm_cpu_notify+0x29/0x33 [<ffffffff81676bdd>] notifier_call_chain+0x4d/0x70 [<ffffffff810bb2a9>] __raw_notifier_call_chain+0x9/0x10 [<ffffffff8109402b>] __cpu_notify+0x1b/0x30 [<ffffffff8166834a>] _cpu_up+0xa0/0x14b [<ffffffff816684ce>] cpu_up+0xd9/0xec [<ffffffff8165f754>] store_online+0x94/0xd0 [<ffffffff8141d15b>] dev_attr_store+0x1b/0x20 [<ffffffff81218f44>] sysfs_write_file+0xf4/0x170 [<ffffffff811a2864>] vfs_write+0xb4/0x130 [<ffffffff811a302a>] sys_write+0x5a/0xa0 [<ffffffff8167ada9>] system_call_fastpath+0x16/0x1b cpu 1 spinlock event irq -16 smpboot: Booting Node 0 Processor 1 APIC 0x2 And if one looks at the /proc/interrupts right after offlining (CPU1): 70: 0 0 xen-percpu-ipi spinlock0 71: 0 0 xen-percpu-ipi spinlock1 77: 0 0 xen-percpu-ipi spinlock2 There is the oddity of the 'spinlock1' still being present. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Yijing Wang <wangyijing@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-11xen/smp: Fix leakage of timer interrupt line for every CPU online/offline.Konrad Rzeszutek Wilk
commit 888b65b4bc5e7fcbbb967023300cd5d44dba1950 upstream. In the PVHVM path when we do CPU online/offline path we would leak the timer%d IRQ line everytime we do a offline event. The online path (xen_hvm_setup_cpu_clockevents via x86_cpuinit.setup_percpu_clockev) would allocate a new interrupt line for the timer%d. But we would still use the old interrupt line leading to: kernel BUG at /home/konrad/ssd/konrad/linux/kernel/hrtimer.c:1261! invalid opcode: 0000 [#1] SMP RIP: 0010:[<ffffffff810b9e21>] [<ffffffff810b9e21>] hrtimer_interrupt+0x261/0x270 .. snip.. <IRQ> [<ffffffff810445ef>] xen_timer_interrupt+0x2f/0x1b0 [<ffffffff81104825>] ? stop_machine_cpu_stop+0xb5/0xf0 [<ffffffff8111434c>] handle_irq_event_percpu+0x7c/0x240 [<ffffffff811175b9>] handle_percpu_irq+0x49/0x70 [<ffffffff813a74a3>] __xen_evtchn_do_upcall+0x1c3/0x2f0 [<ffffffff813a760a>] xen_evtchn_do_upcall+0x2a/0x40 [<ffffffff8167c26d>] xen_hvm_callback_vector+0x6d/0x80 <EOI> [<ffffffff81666d01>] ? start_secondary+0x193/0x1a8 [<ffffffff81666cfd>] ? start_secondary+0x18f/0x1a8 There is also the oddity (timer1) in the /proc/interrupts after offlining CPU1: 64: 1121 0 xen-percpu-virq timer0 78: 0 0 xen-percpu-virq timer1 84: 0 2483 xen-percpu-virq timer2 This patch fixes it. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Yijing Wang <wangyijing@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-11xen/boot: Disable BIOS SMP MP table search.Konrad Rzeszutek Wilk
commit bd49940a35ec7d488ae63bd625639893b3385b97 upstream. As the initial domain we are able to search/map certain regions of memory to harvest configuration data. For all low-level we use ACPI tables - for interrupts we use exclusively ACPI _PRT (so DSDT) and MADT for INT_SRC_OVR. The SMP MP table is not used at all. As a matter of fact we do not even support machines that only have SMP MP but no ACPI tables. Lets follow how Moorestown does it and just disable searching for BIOS SMP tables. This also fixes an issue on HP Proliant BL680c G5 and DL380 G6: 9f->100 for 1:1 PTE Freeing 9f-100 pfn range: 97 pages freed 1-1 mapping on 9f->100 .. snip.. e820: BIOS-provided physical RAM map: Xen: [mem 0x0000000000000000-0x000000000009efff] usable Xen: [mem 0x000000000009f400-0x00000000000fffff] reserved Xen: [mem 0x0000000000100000-0x00000000cfd1dfff] usable .. snip.. Scan for SMP in [mem 0x00000000-0x000003ff] Scan for SMP in [mem 0x0009fc00-0x0009ffff] Scan for SMP in [mem 0x000f0000-0x000fffff] found SMP MP-table at [mem 0x000f4fa0-0x000f4faf] mapped at [ffff8800000f4fa0] (XEN) mm.c:908:d0 Error getting mfn 100 (pfn 5555555555555555) from L1 entry 0000000000100461 for l1e_owner=0, pg_owner=0 (XEN) mm.c:4995:d0 ptwr_emulate: could not get_page_from_l1e() BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffff81ac07e2>] xen_set_pte_init+0x66/0x71 . snip.. Pid: 0, comm: swapper Not tainted 3.6.0-rc6upstream-00188-gb6fb969-dirty #2 HP ProLiant BL680c G5 .. snip.. Call Trace: [<ffffffff81ad31c6>] __early_ioremap+0x18a/0x248 [<ffffffff81624731>] ? printk+0x48/0x4a [<ffffffff81ad32ac>] early_ioremap+0x13/0x15 [<ffffffff81acc140>] get_mpc_size+0x2f/0x67 [<ffffffff81acc284>] smp_scan_config+0x10c/0x136 [<ffffffff81acc2e4>] default_find_smp_config+0x36/0x5a [<ffffffff81ac3085>] setup_arch+0x5b3/0xb5b [<ffffffff81624731>] ? printk+0x48/0x4a [<ffffffff81abca7f>] start_kernel+0x90/0x390 [<ffffffff81abc356>] x86_64_start_reservations+0x131/0x136 [<ffffffff81abfa83>] xen_start_kernel+0x65f/0x661 (XEN) Domain 0 crashed: 'noreboot' set - not rebooting. which is that ioremap would end up mapping 0xff using _PAGE_IOMAP (which is what early_ioremap sticks as a flag) - which meant we would get MFN 0xFF (pte ff461, which is OK), and then it would also map 0x100 (b/c ioremap tries to get page aligned request, and it was trying to map 0xf4fa0 + PAGE_SIZE - so it mapped the next page) as _PAGE_IOMAP. Since 0x100 is actually a RAM page, and the _PAGE_IOMAP bypasses the P2M lookup we would happily set the PTE to 1000461. Xen would deny the request since we do not have access to the Machine Frame Number (MFN) of 0x100. The P2M[0x100] is for example 0x80140. Fixes-Oracle-Bugzilla: https://bugzilla.oracle.com/bugzilla/show_bug.cgi?id=13665 Acked-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Yijing Wang <wangyijing@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-11perf/x86: Fix event schedulingPeter Zijlstra
commit 26e61e8939b1fe8729572dabe9a9e97d930dd4f6 upstream. Vince "Super Tester" Weaver reported a new round of syscall fuzzing (Trinity) failures, with perf WARN_ON()s triggering. He also provided traces of the failures. This is I think the relevant bit: > pec_1076_warn-2804 [000] d... 147.926153: x86_pmu_disable: x86_pmu_disable > pec_1076_warn-2804 [000] d... 147.926153: x86_pmu_state: Events: { > pec_1076_warn-2804 [000] d... 147.926156: x86_pmu_state: 0: state: .R config: ffffffffffffffff ( (null)) > pec_1076_warn-2804 [000] d... 147.926158: x86_pmu_state: 33: state: AR config: 0 (ffff88011ac99800) > pec_1076_warn-2804 [000] d... 147.926159: x86_pmu_state: } > pec_1076_warn-2804 [000] d... 147.926160: x86_pmu_state: n_events: 1, n_added: 0, n_txn: 1 > pec_1076_warn-2804 [000] d... 147.926161: x86_pmu_state: Assignment: { > pec_1076_warn-2804 [000] d... 147.926162: x86_pmu_state: 0->33 tag: 1 config: 0 (ffff88011ac99800) > pec_1076_warn-2804 [000] d... 147.926163: x86_pmu_state: } > pec_1076_warn-2804 [000] d... 147.926166: collect_events: Adding event: 1 (ffff880119ec8800) So we add the insn:p event (fd[23]). At this point we should have: n_events = 2, n_added = 1, n_txn = 1 > pec_1076_warn-2804 [000] d... 147.926170: collect_events: Adding event: 0 (ffff8800c9e01800) > pec_1076_warn-2804 [000] d... 147.926172: collect_events: Adding event: 4 (ffff8800cbab2c00) We try and add the {BP,cycles,br_insn} group (fd[3], fd[4], fd[15]). These events are 0:cycles and 4:br_insn, the BP event isn't x86_pmu so that's not visible. group_sched_in() pmu->start_txn() /* nop - BP pmu */ event_sched_in() event->pmu->add() So here we should end up with: 0: n_events = 3, n_added = 2, n_txn = 2 4: n_events = 4, n_added = 3, n_txn = 3 But seeing the below state on x86_pmu_enable(), the must have failed, because the 0 and 4 events aren't there anymore. Looking at group_sched_in(), since the BP is the leader, its event_sched_in() must have succeeded, for otherwise we would not have seen the sibling adds. But since neither 0 or 4 are in the below state; their event_sched_in() must have failed; but I don't see why, the complete state: 0,0,1:p,4 fits perfectly fine on a core2. However, since we try and schedule 4 it means the 0 event must have succeeded! Therefore the 4 event must have failed, its failure will have put group_sched_in() into the fail path, which will call: event_sched_out() event->pmu->del() on 0 and the BP event. Now x86_pmu_del() will reduce n_events; but it will not reduce n_added; giving what we see below: n_event = 2, n_added = 2, n_txn = 2 > pec_1076_warn-2804 [000] d... 147.926177: x86_pmu_enable: x86_pmu_enable > pec_1076_warn-2804 [000] d... 147.926177: x86_pmu_state: Events: { > pec_1076_warn-2804 [000] d... 147.926179: x86_pmu_state: 0: state: .R config: ffffffffffffffff ( (null)) > pec_1076_warn-2804 [000] d... 147.926181: x86_pmu_state: 33: state: AR config: 0 (ffff88011ac99800) > pec_1076_warn-2804 [000] d... 147.926182: x86_pmu_state: } > pec_1076_warn-2804 [000] d... 147.926184: x86_pmu_state: n_events: 2, n_added: 2, n_txn: 2 > pec_1076_warn-2804 [000] d... 147.926184: x86_pmu_state: Assignment: { > pec_1076_warn-2804 [000] d... 147.926186: x86_pmu_state: 0->33 tag: 1 config: 0 (ffff88011ac99800) > pec_1076_warn-2804 [000] d... 147.926188: x86_pmu_state: 1->0 tag: 1 config: 1 (ffff880119ec8800) > pec_1076_warn-2804 [000] d... 147.926188: x86_pmu_state: } > pec_1076_warn-2804 [000] d... 147.926190: x86_pmu_enable: S0: hwc->idx: 33, hwc->last_cpu: 0, hwc->last_tag: 1 hwc->state: 0 So the problem is that x86_pmu_del(), when called from a group_sched_in() that fails (for whatever reason), and without x86_pmu TXN support (because the leader is !x86_pmu), will corrupt the n_added state. Reported-and-Tested-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Stephane Eranian <eranian@google.com> Cc: Dave Jones <davej@redhat.com> Link: http://lkml.kernel.org/r/20140221150312.GF3104@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-06KVM: x86: Fix potential divide by 0 in lapic (CVE-2013-6367)Andy Honig
commit b963a22e6d1a266a67e9eecc88134713fd54775c upstream. Under guest controllable circumstances apic_get_tmcct will execute a divide by zero and cause a crash. If the guest cpuid support tsc deadline timers and performs the following sequence of requests the host will crash. - Set the mode to periodic - Set the TMICT to 0 - Set the mode bits to 11 (neither periodic, nor one shot, nor tsc deadline) - Set the TMICT to non-zero. Then the lapic_timer.period will be 0, but the TMICT will not be. If the guest then reads from the TMCCT then the host will perform a divide by 0. This patch ensures that if the lapic_timer.period is 0, then the division does not occur. Reported-by: Andrew Honig <ahonig@google.com> Signed-off-by: Andrew Honig <ahonig@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> [bwh: Backported to 3.2: s/kvm_apic_get_reg/apic_get_reg/] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Vinson Lee <vlee@twopensource.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-29perf/x86/amd/ibs: Fix waking up from S3 for AMD family 10hRobert Richter
commit bee09ed91cacdbffdbcd3b05de8409c77ec9fcd6 upstream. On AMD family 10h we see following error messages while waking up from S3 for all non-boot CPUs leading to a failed IBS initialization: Enabling non-boot CPUs ... smpboot: Booting Node 0 Processor 1 APIC 0x1 [Firmware Bug]: cpu 1, try to use APIC500 (LVT offset 0) for vector 0x400, but the register is already in use for vector 0xf9 on another cpu perf: IBS APIC setup failed on cpu #1 process: Switch to broadcast mode on CPU1 CPU1 is up ... ACPI: Waking up from system sleep state S3 Reason for this is that during suspend the LVT offset for the IBS vector gets lost and needs to be reinialized while resuming. The offset is read from the IBSCTL msr. On family 10h the offset needs to be 1 as offset 0 is used for the MCE threshold interrupt, but firmware assings it for IBS to 0 too. The kernel needs to reprogram the vector. The msr is a readonly node msr, but a new value can be written via pci config space access. The reinitialization is implemented for family 10h in setup_ibs_ctl() which is forced during IBS setup. This patch fixes IBS setup after waking up from S3 by adding resume/supend hooks for the boot cpu which does the offset reinitialization. Marking it as stable to let distros pick up this fix. Signed-off-by: Robert Richter <rric@kernel.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1389797849-5565-1-git-send-email-rric.net@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-29KVM: x86: Convert vapic synchronization to _cached functions (CVE-2013-6368)Andy Honig
commit fda4e2e85589191b123d31cdc21fd33ee70f50fd upstream. In kvm_lapic_sync_from_vapic and kvm_lapic_sync_to_vapic there is the potential to corrupt kernel memory if userspace provides an address that is at the end of a page. This patches concerts those functions to use kvm_write_guest_cached and kvm_read_guest_cached. It also checks the vapic_address specified by userspace during ioctl processing and returns an error to userspace if the address is not a valid GPA. This is generally not guest triggerable, because the required write is done by firmware that runs before the guest. Also, it only affects AMD processors and oldish Intel that do not have the FlexPriority feature (unless you disable FlexPriority, of course; then newer processors are also affected). Fixes: b93463aa59d6 ('KVM: Accelerated apic support') Reported-by: Andrew Honig <ahonig@google.com> Cc: stable@vger.kernel.org Signed-off-by: Andrew Honig <ahonig@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> [ lizf: backported to 3.4: based on Paolo's backport hints for <3.10 ] Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-15x86, fpu, amd: Clear exceptions in AMD FXSAVE workaroundLinus Torvalds
commit 26bef1318adc1b3a530ecc807ef99346db2aa8b0 upstream. Before we do an EMMS in the AMD FXSAVE information leak workaround we need to clear any pending exceptions, otherwise we trap with a floating-point exception inside this code. Reported-by: halfdog <me@halfdog.net> Tested-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/CA%2B55aFxQnY_PCG_n4=0w-VG=YLXL-yr7oMxyy0WU2gCBAf3ydg@mail.gmail.com Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-12-20x86, efi: Don't use (U)EFI time services on 32 bitMatthew Garrett
commit 04bf9ba720fcc4fa313fa122b799ae0989b6cd50 upstream. UEFI time services are often broken once we're in virtual mode. We were already refusing to use them on 64-bit systems, but it turns out that they're also broken on some 32-bit firmware, including the Dell Venue. Disable them for now, we can revisit once we have the 1:1 mappings code incorporated. Signed-off-by: Matthew Garrett <matthew.garrett@nebula.com> Link: http://lkml.kernel.org/r/1385754283-2464-1-git-send-email-matthew.garrett@nebula.com Cc: Matt Fleming <matt.fleming@intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-11-29x86/microcode/amd: Tone down printk(), don't treat a missing firmware file ↵Thomas Renninger
as an error commit 11f918d3e2d3861b6931e97b3aa778e4984935aa upstream. Do it the same way as done in microcode_intel.c: use pr_debug() for missing firmware files. There seem to be CPUs out there for which no microcode update has been submitted to kernel-firmware repo yet resulting in scary sounding error messages in dmesg: microcode: failed to load file amd-ucode/microcode_amd_fam16h.bin Signed-off-by: Thomas Renninger <trenn@suse.de> Acked-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1384274383-43510-1-git-send-email-trenn@suse.de Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-05x86, efi: Don't map Boot Services on i386Josh Boyer
commit 700870119f49084da004ab588ea2b799689efaf7 upstream. Add patch to fix 32bit EFI service mapping (rhbz 726701) Multiple people are reporting hitting the following WARNING on i386, WARNING: at arch/x86/mm/ioremap.c:102 __ioremap_caller+0x3d3/0x440() Modules linked in: Pid: 0, comm: swapper Not tainted 3.9.0-rc7+ #95 Call Trace: [<c102b6af>] warn_slowpath_common+0x5f/0x80 [<c1023fb3>] ? __ioremap_caller+0x3d3/0x440 [<c1023fb3>] ? __ioremap_caller+0x3d3/0x440 [<c102b6ed>] warn_slowpath_null+0x1d/0x20 [<c1023fb3>] __ioremap_caller+0x3d3/0x440 [<c106007b>] ? get_usage_chars+0xfb/0x110 [<c102d937>] ? vprintk_emit+0x147/0x480 [<c1418593>] ? efi_enter_virtual_mode+0x1e4/0x3de [<c102406a>] ioremap_cache+0x1a/0x20 [<c1418593>] ? efi_enter_virtual_mode+0x1e4/0x3de [<c1418593>] efi_enter_virtual_mode+0x1e4/0x3de [<c1407984>] start_kernel+0x286/0x2f4 [<c1407535>] ? repair_env_string+0x51/0x51 [<c1407362>] i386_start_kernel+0x12c/0x12f Due to the workaround described in commit 916f676f8 ("x86, efi: Retain boot service code until after switching to virtual mode") EFI Boot Service regions are mapped for a period during boot. Unfortunately, with the limited size of the i386 direct kernel map it's possible that some of the Boot Service regions will not be directly accessible, which causes them to be ioremap()'d, triggering the above warning as the regions are marked as E820_RAM in the e820 memmap. There are currently only two situations where we need to map EFI Boot Service regions, 1. To workaround the firmware bug described in 916f676f8 2. To access the ACPI BGRT image but since we haven't seen an i386 implementation that requires either, this simple fix should suffice for now. [ Added to changelog - Matt ] Reported-by: Bryan O'Donoghue <bryan