aboutsummaryrefslogtreecommitdiff
path: root/arch/x86
AgeCommit message (Collapse)Author
2011-05-23x86, mce, AMD: Fix leaving freed data in a listJulia Lawall
commit d9a5ac9ef306eb5cc874f285185a15c303c50009 upstream. b may be added to a list, but is not removed before being freed in the case of an error. This is done in the corresponding deallocation function, so the code here has been changed to follow that. The sematic match that finds this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ expression E,E1,E2; identifier l; @@ *list_add(&E->l,E1); ... when != E1 when != list_del(&E->l) when != list_del_init(&E->l) when != E = E2 *kfree(E);// </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Cc: Borislav Petkov <borislav.petkov@amd.com> Cc: Robert Richter <robert.richter@amd.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Andreas Herrmann <andreas.herrmann3@amd.com> Link: http://lkml.kernel.org/r/1305294731-12127-1-git-send-email-julia@diku.dk Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-05-23x86, apic: Fix spurious error interrupts triggering on all non-boot APsYouquan Song
commit e503f9e4b092e2349a9477a333543de8f3c7f5d9 upstream. This patch fixes a bug reported by a customer, who found that many unreasonable error interrupts reported on all non-boot CPUs (APs) during the system boot stage. According to Chapter 10 of Intel Software Developer Manual Volume 3A, Local APIC may signal an illegal vector error when an LVT entry is set as an illegal vector value (0~15) under FIXED delivery mode (bits 8-11 is 0), regardless of whether the mask bit is set or an interrupt actually happen. These errors are seen as error interrupts. The initial value of thermal LVT entries on all APs always reads 0x10000 because APs are woken up by BSP issuing INIT-SIPI-SIPI sequence to them and LVT registers are reset to 0s except for the mask bits which are set to 1s when APs receive INIT IPI. When the BIOS takes over the thermal throttling interrupt, the LVT thermal deliver mode should be SMI and it is required from the kernel to keep AP's LVT thermal monitoring register programmed as such as well. This issue happens when BIOS does not take over thermal throttling interrupt, AP's LVT thermal monitor register will be restored to 0x10000 which means vector 0 and fixed deliver mode, so all APs will signal illegal vector error interrupts. This patch check if interrupt delivery mode is not fixed mode before restoring AP's LVT thermal monitor register. Signed-off-by: Youquan Song <youquan.song@intel.com> Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> Acked-by: Yong Wang <yong.y.wang@intel.com> Cc: hpa@linux.intel.com Cc: joe@perches.com Cc: jbaron@redhat.com Cc: trenn@suse.de Cc: kent.liu@intel.com Cc: chaohong.guo@intel.com Link: http://lkml.kernel.org/r/1303402963-17738-1-git-send-email-youquan.song@intel.com Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-05-23x86, AMD: Fix ARAT feature setting againBorislav Petkov
commit 14fb57dccb6e1defe9f89a66f548fcb24c374c1d upstream. Trying to enable the local APIC timer on early K8 revisions uncovers a number of other issues with it, in conjunction with the C1E enter path on AMD. Fixing those causes much more churn and troubles than the benefit of using that timer brings so don't enable it on K8 at all, falling back to the original functionality the kernel had wrt to that. Reported-and-bisected-by: Nick Bowler <nbowler@elliptictech.com> Cc: Boris Ostrovsky <Boris.Ostrovsky@amd.com> Cc: Andreas Herrmann <andreas.herrmann3@amd.com> Cc: Greg Kroah-Hartman <greg@kroah.com> Cc: Hans Rosenfeld <hans.rosenfeld@amd.com> Cc: Nick Bowler <nbowler@elliptictech.com> Cc: Joerg-Volker-Peetz <jvpeetz@web.de> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Link: http://lkml.kernel.org/r/1305636919-31165-3-git-send-email-bp@amd64.org Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-05-23Revert "x86, AMD: Fix APIC timer erratum 400 affecting K8 Rev.A-E processors"Borislav Petkov
commit 328935e6348c6a7cb34798a68c326f4b8372e68a upstream. This reverts commit e20a2d205c05cef6b5783df339a7d54adeb50962, as it crashes certain boxes with specific AMD CPU models. Moving the lower endpoint of the Erratum 400 check to accomodate earlier K8 revisions (A-E) opens a can of worms which is simply not worth to fix properly by tweaking the errata checking framework: * missing IntPenging MSR on revisions < CG cause #GP: http://marc.info/?l=linux-kernel&m=130541471818831 * makes earlier revisions use the LAPIC timer instead of the C1E idle routine which switches to HPET, thus not waking up in deeper C-states: http://lkml.org/lkml/2011/4/24/20 Therefore, leave the original boundary starting with K8-revF. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-05-09KVM: x86: Fix kvmclock bugZachary Amsden
(backported from commit 28e4639adf0c9f26f6bb56149b7ab547bf33bb95) If preempted after kvmclock values are updated, but before hardware virtualization is entered, the last tsc time as read by the guest is never set. It underflows the next time kvmclock is updated if there has not yet been a successful entry / exit into hardware virt. Fix this by simply setting last_tsc to the newly read tsc value so that any computed nsec advance of kvmclock is nulled. Signed-off-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> BugLink: http://bugs.launchpad.net/bugs/714335 Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com> Reviewed-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-05-09KVM: x86: Fix a possible backwards warp of kvmclockZachary Amsden
(backported from commit 1d5f066e0b63271b67eac6d3752f8aa96adcbddb) Kernel time, which advances in discrete steps may progress much slower than TSC. As a result, when kvmclock is adjusted to a new base, the apparent time to the guest, which runs at a much higher, nsec scaled rate based on the current TSC, may have already been observed to have a larger value (kernel_ns + scaled tsc) than the value to which we are setting it (kernel_ns + 0). We must instead compute the clock as potentially observed by the guest for kernel_ns to make sure it does not go backwards. Signed-off-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> BugLink: http://bugs.launchpad.net/bugs/714335 Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com> Reviewed-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-05-09x86: pvclock: Move scale_delta into common headerZachary Amsden
(cherry-picked from commit 347bb4448c2155eb2310923ccaa4be5677649003) The scale_delta function for shift / multiply with 31-bit precision moves to a common header so it can be used by both kernel and kvm module. Signed-off-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> BugLink: http://bugs.launchpad.net/bugs/714335 Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-05-09x86, AMD: Fix APIC timer erratum 400 affecting K8 Rev.A-E processorsBoris Ostrovsky
commit e20a2d205c05cef6b5783df339a7d54adeb50962 upstream. Older AMD K8 processors (Revisions A-E) are affected by erratum 400 (APIC timer interrupts don't occur in C states greater than C1). This, for example, means that X86_FEATURE_ARAT flag should not be set for these parts. This addresses regression introduced by commit b87cf80af3ba4b4c008b4face3c68d604e1715c6 ("x86, AMD: Set ARAT feature on AMD processors") where the system may become unresponsive until external interrupt (such as keyboard input) occurs. This results, for example, in time not being reported correctly, lack of progress on the system and other lockups. Reported-by: Joerg-Volker Peetz <jvpeetz@web.de> Tested-by: Joerg-Volker Peetz <jvpeetz@web.de> Acked-by: Borislav Petkov <borislav.petkov@amd.com> Signed-off-by: Boris Ostrovsky <Boris.Ostrovsky@amd.com> Link: http://lkml.kernel.org/r/1304113663-6586-1-git-send-email-ostr@amd64.org Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-04-22x86, cpu: Fix regression in AMD errata checking codeHans Rosenfeld
commit 07a7795ca2e6e66d00b184efb46bd0e23d90d3fe upstream. A bug in the family-model-stepping matching code caused the presence of errata to go undetected when OSVW was not used. This causes hangs on some K8 systems because the E400 workaround is not enabled. Signed-off-by: Hans Rosenfeld <hans.rosenfeld@amd.com> LKML-Reference: <1282141190-930137-1-git-send-email-hans.rosenfeld@amd.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-04-22x86, amd: Disable GartTlbWlkErr when BIOS forgets itJoerg Roedel
commit 5bbc097d890409d8eff4e3f1d26f11a9d6b7c07e upstream. This patch disables GartTlbWlk errors on AMD Fam10h CPUs if the BIOS forgets to do is (or is just too old). Letting these errors enabled can cause a sync-flood on the CPU causing a reboot. The AMD BKDG recommends disabling GART TLB Wlk Error completely. This patch is the fix for https://bugzilla.kernel.org/show_bug.cgi?id=33012 on my machine. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Link: http://lkml.kernel.org/r/20110415131152.GJ18463@8bytes.org Tested-by: Alexandre Demers <alexandre.f.demers@gmail.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-04-22x86, AMD: Set ARAT feature on AMD processorsBoris Ostrovsky
commit b87cf80af3ba4b4c008b4face3c68d604e1715c6 upstream. Support for Always Running APIC timer (ARAT) was introduced in commit db954b5898dd3ef3ef93f4144158ea8f97deb058. This feature allows us to avoid switching timers from LAPIC to something else (e.g. HPET) and go into timer broadcasts when entering deep C-states. AMD processors don't provide a CPUID bit for that feature but they also keep APIC timers running in deep C-states (except for cases when the processor is affected by erratum 400). Therefore we should set ARAT feature bit on AMD CPUs. Tested-by: Borislav Petkov <borislav.petkov@amd.com> Acked-by: Andreas Herrmann <andreas.herrmann3@amd.com> Acked-by: Mark Langsdorf <mark.langsdorf@amd.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@amd.com> LKML-Reference: <1300205624-4813-1-git-send-email-ostr@amd64.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-04-22x86, cpu: Clean up AMD erratum 400 workaroundHans Rosenfeld
commit 9d8888c2a214aece2494a49e699a097c2ba9498b upstream. Remove check_c1e_idle() and use the new AMD errata checking framework instead. Signed-off-by: Hans Rosenfeld <hans.rosenfeld@amd.com> LKML-Reference: <1280336972-865982-2-git-send-email-hans.rosenfeld@amd.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-04-22x86, cpu: AMD errata checking frameworkHans Rosenfeld
commit d78d671db478eb8b14c78501c0cee1cc7baf6967 upstream. Errata are defined using the AMD_LEGACY_ERRATUM() or AMD_OSVW_ERRATUM() macros. The latter is intended for newer errata that have an OSVW id assigned, which it takes as first argument. Both take a variable number of family-specific model-stepping ranges created by AMD_MODEL_RANGE(). Iff an erratum has an OSVW id, OSVW is available on the CPU, and the OSVW id is known to the hardware, it is used to determine whether an erratum is present. Otherwise, the model-stepping ranges are matched against the current CPU to find out whether the erratum applies. For certain special errata, the code using this framework might have to conduct further checks to make sure an erratum is really (not) present. Signed-off-by: Hans Rosenfeld <hans.rosenfeld@amd.com> LKML-Reference: <1280336972-865982-1-git-send-email-hans.rosenfeld@amd.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-04-22x86: Fix a bogus unwind annotation in lib/semaphore_32.SJan Beulich
commit e938c287ea8d977e079f07464ac69923412663ce upstream. 'simple' would have required specifying current frame address and return address location manually, but that's obviously not the case (and not necessary) here. Signed-off-by: Jan Beulich <jbeulich@novell.com> LKML-Reference: <4D6D1082020000780003454C@vpn.id2.novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-04-14Revert "x86: Cleanup highmap after brk is concluded"Greg Kroah-Hartman
This reverts upstream commit e5f15b45ddf3afa2bbbb10c7ea34fb32b6de0a0e It caused problems in the stable tree and should not have been there. Cc: Yinghai Lu <yinghai@kernel.org> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-04-14x86, microcode, AMD: Extend ucode size verificationBorislav Petkov
Upstream commit: 44d60c0f5c58c2168f31df9a481761451840eb54 The different families have a different max size for the ucode patch, adjust size checking to the family we're running on. Also, do not vzalloc the max size of the ucode but only the actual size that is passed on from the firmware loader. Cc: <stable@kernel.org> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-04-14x86, amd-ucode: Remove needless log messagesAndreas Herrmann
Upstream commit: 6e18da75c28b592594fd632cf3e6eb09d3d078de Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> Cc: Borislav Petkov <borislav.petkov@amd.com> LKML-Reference: <20091029134742.GD30802@alberich.amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-04-14x86, mtrr, pat: Fix one cpu getting out of sync during resumeSuresh Siddha
commit 84ac7cdbdd0f04df6b96153f7a79127fd6e45467 upstream. On laptops with core i5/i7, there were reports that after resume graphics workloads were performing poorly on a specific AP, while the other cpu's were ok. This was observed on a 32bit kernel specifically. Debug showed that the PAT init was not happening on that AP during resume and hence it contributing to the poor workload performance on that cpu. On this system, resume flow looked like this: 1. BP starts the resume sequence and we reinit BP's MTRR's/PAT early on using mtrr_bp_restore() 2. Resume sequence brings all AP's online 3. Resume sequence now kicks off the MTRR reinit on all the AP's. 4. For some reason, between point 2 and 3, we moved from BP to one of the AP's. My guess is that printk() during resume sequence is contributing to this. We don't see similar behavior with the 64bit kernel but there is no guarantee that at this point the remaining resume sequence (after AP's bringup) has to happen on BP. 5. set_mtrr() was assuming that we are still on BP and skipped the MTRR/PAT init on that cpu (because of 1 above) 6. But we were on an AP and this led to not reprogramming PAT on this cpu leading to bad performance. Fix this by doing unconditional mtrr_if->set_all() in set_mtrr() during MTRR/PAT init. This might be unnecessary if we are still running on BP. But it is of no harm and will guarantee that after resume, all the cpu's will be in sync with respect to the MTRR/PAT registers. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <1301438292-28370-1-git-send-email-eric@anholt.net> Signed-off-by: Eric Anholt <eric@anholt.net> Tested-by: Keith Packard <keithp@keithp.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-03-27x86: Cleanup highmap after brk is concludedYinghai Lu
commit e5f15b45ddf3afa2bbbb10c7ea34fb32b6de0a0e upstream. Now cleanup_highmap actually is in two steps: one is early in head64.c and only clears above _end; a second one is in init_memory_mapping() and tries to clean from _brk_end to _end. It should check if those boundaries are PMD_SIZE aligned but currently does not. Also init_memory_mapping() is called several times for numa or memory hotplug, so we really should not handle initial kernel mappings there. This patch moves cleanup_highmap() down after _brk_end is settled so we can do everything in one step. Also we honor max_pfn_mapped in the implementation of cleanup_highmap. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> LKML-Reference: <alpine.DEB.2.00.1103171739050.3382@kaball-desktop> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-03-27xen: set max_pfn_mapped to the last pfn mappedStefano Stabellini
commit 14988a4d350ce3b41ecad4f63c4f44c56f5ae34d upstream. Do not set max_pfn_mapped to the end of the initial memory mappings, that also contain pages that don't belong in pfn space (like the mfn list). Set max_pfn_mapped to the last real pfn mapped in the initial memory mappings that is the pfn backing _end. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> LKML-Reference: <alpine.DEB.2.00.1103171739050.3382@kaball-desktop> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-03-23x86, binutils, xen: Fix another wrong size directiveAlexander van Heukelum
commit 371c394af27ab7d1e58a66bc19d9f1f3ac1f67b4 upstream. The latest binutils (2.21.0.20110302/Ubuntu) breaks the build yet another time, under CONFIG_XEN=y due to a .size directive that refers to a slightly differently named (hence, to the now very strict and unforgiving assembler, non-existent) symbol. [ mingo: This unnecessary build breakage caused by new binutils version 2.21 gets escallated back several kernel releases spanning several years of Linux history, affecting over 130,000 upstream kernel commits (!), on CONFIG_XEN=y 64-bit kernels (i.e. essentially affecting all major Linux distro kernel configs). Git annotate tells us that this slight debug symbol code mismatch bug has been introduced in 2008 in commit 3d75e1b8: 3d75e1b8 (Jeremy Fitzhardinge 2008-07-08 15:06:49 -0700 1231) ENTRY(xen_do_hypervisor_callback) # do_hypervisor_callback(struct *pt_regs) The 'bug' is just a slight assymetry in ENTRY()/END() debug-symbols sequences, with lots of assembly code between the ENTRY() and the END(): ENTRY(xen_do_hypervisor_callback) # do_hypervisor_callback(struct *pt_regs) ... END(do_hypervisor_callback) Human reviewers almost never catch such small mismatches, and binutils never even warned about it either. This new binutils version thus breaks the Xen build on all upstream kernels since v2.6.27, out of the blue. This makes a straightforward Git bisection of all 64-bit Xen-enabled kernels impossible on such binutils, for a bisection window of over hundred thousand historic commits. (!) This is a major fail on the side of binutils and binutils needs to turn this show-stopper build failure into a warning ASAP. ] Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Jan Beulich <jbeulich@novell.com> Cc: H.J. Lu <hjl.tools@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Kees Cook <kees.cook@canonical.com> LKML-Reference: <1299877178-26063-1-git-send-email-heukelum@fastmail.fm> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-03-23x86: Flush TLB if PGD entry is changed in i386 PAE modeShaohua Li
commit 4981d01eada5354d81c8929d5b2836829ba3df7b upstream. According to intel CPU manual, every time PGD entry is changed in i386 PAE mode, we need do a full TLB flush. Current code follows this and there is comment for this too in the code. But current code misses the multi-threaded case. A changed page table might be used by several CPUs, every such CPU should flush TLB. Usually this isn't a problem, because we prepopulate all PGD entries at process fork. But when the process does munmap and follows new mmap, this issue will be triggered. When it happens, some CPUs keep doing page faults: http://marc.info/?l=linux-kernel&m=129915020508238&w=2 Reported-by: Yasunori Goto<y-goto@jp.fujitsu.com> Tested-by: Yasunori Goto<y-goto@jp.fujitsu.com> Reviewed-by: Rik van Riel <riel@redhat.com> Signed-off-by: Shaohua Li<shaohua.li@intel.com> Cc: Mallick Asit K <asit.k.mallick@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: linux-mm <linux-mm@kvack.org> LKML-Reference: <1300246649.2337.95.camel@sli10-conroe> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-03-23x86, quirk: Fix SB600 revision checkAndreas Herrmann
commit 1d3e09a304e6c4e004ca06356578b171e8735d3c upstream. Commit 7f74f8f28a2bd9db9404f7d364e2097a0c42cc12 (x86 quirk: Fix polarity for IRQ0 pin2 override on SB800 systems) introduced a regression. It removed some SB600 specific code to determine the revision ID without adapting a corresponding revision ID check for SB600. See this mail thread: http://marc.info/?l=linux-kernel&m=129980296006380&w=2 This patch adapts the corresponding check to cover all SB600 revisions. Tested-by: Wang Lei <f3d27b@gmail.com> Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> Cc: Andrew Morton <akpm@linux-foundation.org> LKML-Reference: <20110315143137.GD29499@alberich.amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-03-23x86: Emit "mem=nopentium ignored" warning when not supportedKamal Mostafa
commit 9a6d44b9adb777ca9549e88cd55bd8f2673c52a2 upstream. Emit warning when "mem=nopentium" is specified on any arch other than x86_32 (the only that arch supports it). Signed-off-by: Kamal Mostafa <kamal@canonical.com> BugLink: http://bugs.launchpad.net/bugs/553464 Cc: Yinghai Lu <yinghai@kernel.org> Cc: Len Brown <len.brown@intel.com> Cc: Rafael J. Wysocki <rjw@sisk.pl> LKML-Reference: <1296783486-23033-2-git-send-email-kamal@canonical.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-03-23x86: Fix panic when handling "mem={invalid}" paramKamal Mostafa
commit 77eed821accf5dd962b1f13bed0680e217e49112 upstream. Avoid removing all of memory and panicing when "mem={invalid}" is specified, e.g. mem=blahblah, mem=0, or mem=nopentium (on platforms other than x86_32). Signed-off-by: Kamal Mostafa <kamal@canonical.com> BugLink: http://bugs.launchpad.net/bugs/553464 Cc: Yinghai Lu <yinghai@kernel.org> Cc: Len Brown <len.brown@intel.com> Cc: Rafael J. Wysocki <rjw@sisk.pl> LKML-Reference: <1296783486-23033-1-git-send-email-kamal@canonical.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-03-23x86/mm: Handle mm_fault_error() in kernel spaceAndrey Vagin
commit f86268549f424f83b9eb0963989270e14fbfc3de upstream. mm_fault_error() should not execute oom-killer, if page fault occurs in kernel space. E.g. in copy_from_user()/copy_to_user(). This would happen if we find ourselves in OOM on a copy_to_user(), or a copy_from_user() which faults. Without this patch, the kernels hangs up in copy_from_user(), because OOM killer sends SIG_KILL to current process, but it can't handle a signal while in syscall, then the kernel returns to copy_from_user(), reexcute current command and provokes page_fault again. With this patch the kernel return -EFAULT from copy_from_user(). The code, which checks that page fault occurred in kernel space, has been copied from do_sigbus(). This situation is handled by the same way on powerpc, xtensa, tile, ... Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> LKML-Reference: <201103092322.p29NMNPH001682@imap1.linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-03-07x86: Use u32 instead of long to set reset vector back to 0Don Zickus
commit 299c56966a72b9109d47c71a6db52097098703dd upstream. A customer of ours, complained that when setting the reset vector back to 0, it trashed other data and hung their box. They noticed when only 4 bytes were set to 0 instead of 8, everything worked correctly. Mathew pointed out: | | We're supposed to be resetting trampoline_phys_low and | trampoline_phys_high here, which are two 16-bit values. | Writing 64 bits is definitely going to overwrite space | that we're not supposed to be touching. | So limit the area modified to u32. Signed-off-by: Don Zickus <dzickus@redhat.com> Acked-by: Matthew Garrett <mjg@redhat.com> LKML-Reference: <1297139100-424-1-git-send-email-dzickus@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-03-02x86 quirk: Fix polarity for IRQ0 pin2 override on SB800 systemsAndreas Herrmann
commit 7f74f8f28a2bd9db9404f7d364e2097a0c42cc12 upstream. On some SB800 systems polarity for IOAPIC pin2 is wrongly specified as low active by BIOS. This caused system hangs after resume from S3 when HPET was used in one-shot mode on such systems because a timer interrupt was missed (HPET signal is high active). For more details see: http://marc.info/?l=linux-kernel&m=129623757413868 Tested-by: Manoj Iyer <manoj.iyer@canonical.com> Tested-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> Cc: Borislav Petkov <borislav.petkov@amd.com> LKML-Reference: <20110224145346.GD3658@alberich.amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-03-02x86/pvclock: Zero last_value on resumeJeremy Fitzhardinge
commit e7a3481c0246c8e45e79c629efd63b168e91fcda upstream. If the guest domain has been suspend/resumed or migrated, then the system clock backing the pvclock clocksource may revert to a smaller value (ie, can be non-monotonic across the migration/save-restore). Make sure we zero last_value in that case so that the domain continues to see clock updates. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-03-02x86, hpet: Disable per-cpu hpet timer if ARAT is supportedShaohua Li
commit 39fe05e58c5e448601ce46e6b03900d5bf31c4b0 upstream. If CPU support always running local APIC timer, per-cpu hpet timer could be disabled, which is useless and wasteful in such case. Let's leave the timers to others. The effect is that we reserve less timers. Signed-off-by: Shaohua Li <shaohua.li@intel.com> Cc: venkatesh.pallipadi@intel.com LKML-Reference: <20090812031612.GA10062@sli10-desk.sh.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Thomas Renninger <trenn@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-02-17x86: Add IRQ_TIME_ACCOUNTINGVenkatesh Pallipadi
Commit: e82b8e4ea4f3dffe6e7939f90e78da675fcc450e upstream This patch adds IRQ_TIME_ACCOUNTING option on x86 and runtime enables it when TSC is enabled. This change just enables fine grained irq time accounting, isn't used yet. Following patches use it for different purposes. Signed-off-by: Venkatesh Pallipadi <venki@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1286237003-12406-6-git-send-email-venki@google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Mike Galbraith <efault@gmx.de> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-02-17x86, mm: avoid possible bogus tlb entries by clearing prev mm_cpumask after ↵Suresh Siddha
switching mm commit 831d52bc153971b70e64eccfbed2b232394f22f8 upstream. Clearing the cpu in prev's mm_cpumask early will avoid the flush tlb IPI's while the cr3 is still pointing to the prev mm. And this window can lead to the possibility of bogus TLB fills resulting in strange failures. One such problematic scenario is mentioned below. T1. CPU-1 is context switching from mm1 to mm2 context and got a NMI etc between the point of clearing the cpu from the mm_cpumask(mm1) and before reloading the cr3 with the new mm2. T2. CPU-2 is tearing down a specific vma for mm1 and will proceed with flushing the TLB for mm1. It doesn't send the flush TLB to CPU-1 as it doesn't see that cpu listed in the mm_cpumask(mm1). T3. After the TLB flush is complete, CPU-2 goes ahead and frees the page-table pages associated with the removed vma mapping. T4. CPU-2 now allocates those freed page-table pages for something else. T5. As the CR3 and TLB caches for mm1 is still active on CPU-1, CPU-1 can potentially speculate and walk through the page-table caches and can insert new TLB entries. As the page-table pages are already freed and being used on CPU-2, this page walk can potentially insert a bogus global TLB entry depending on the (random) contents of the page that is being used on CPU-2. T6. This bogus TLB entry being global will be active across future CR3 changes and can result in weird memory corruption etc. To avoid this issue, for the prev mm that is handing over the cpu to another mm, clear the cpu from the mm_cpumask(prev) after the cr3 is changed. Marking it for -stable, though we haven't seen any reported failure that can be attributed to this. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-02-17x86, mtrr: Avoid MTRR reprogramming on BP during boot on UP platformsSuresh Siddha
commit f7448548a9f32db38f243ccd4271617758ddfe2c upstream. Markus Kohn ran into a hard hang regression on an acer aspire 1310, when acpi is enabled. git bisect showed the following commit as the bad one that introduced the boot regression. commit d0af9eed5aa91b6b7b5049cae69e5ea956fd85c3 Author: Suresh Siddha <suresh.b.siddha@intel.com> Date: Wed Aug 19 18:05:36 2009 -0700 x86, pat/mtrr: Rendezvous all the cpus for MTRR/PAT init Because of the UP configuration of that platform, native_smp_prepare_cpus() bailed out (in smp_sanity_check()) before doing the set_mtrr_aps_delayed_init() Further down the boot path, native_smp_cpus_done() will call the delayed MTRR initialization for the AP's (mtrr_aps_init()) with mtrr_aps_delayed_init not set. This resulted in the boot processor reprogramming its MTRR's to the values seen during the start of the OS boot. While this is not needed ideally, this shouldn't have caused any side-effects. This is because the reprogramming of MTRR's (set_mtrr_state() that gets called via set_mtrr()) will check if the live register contents are different from what is being asked to write and will do the actual write only if they are different. BP's mtrr state is read during the start of the OS boot and typically nothing would have changed when we ask to reprogram it on BP again because of the above scenario on an UP platform. So on a normal UP platform no reprogramming of BP MTRR MSR's happens and all is well. However, on this platform, bios seems to be modifying the fixed mtrr range registers between the start of OS boot and when we double check the live registers for reprogramming BP MTRR registers. And as the live registers are modified, we end up reprogramming the MTRR's to the state seen during the start of the OS boot. During ACPI initialization, something in the bios (probably smi handler?) don't like this fact and results in a hard lockup. We didn't see this boot hang issue on this platform before the commit d0af9eed5aa91b6b7b5049cae69e5ea956fd85c3, because only the AP's (if any) will program its MTRR's to the value that BP had at the start of the OS boot. Fix this issue by checking mtrr_aps_delayed_init before continuing further in the mtrr_aps_init(). Now, only AP's (if any) will program its MTRR's to the BP values during boot. Addresses https://bugzilla.novell.com/show_bug.cgi?id=623393 [ By the way, this behavior of the bios modifying MTRR's after the start of the OS boot is not common and the kernel is not prepared to handle this situation well. Irrespective of this issue, during suspend/resume, linux kernel will try to reprogram the BP's MTRR values to the values seen during the start of the OS boot. So suspend/resume might be already broken on this platform for all linux kernel versions. ] Reported-and-bisected-by: Markus Kohn <jabber@gmx.org> Tested-by: Markus Kohn <jabber@gmx.org> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Thomas Renninger <trenn@novell.com> Cc: Rafael Wysocki <rjw@novell.com> Cc: Venkatesh Pallipadi <venki@google.com> LKML-Reference: <1296694975.4418.402.camel@sbsiddha-MOBL3.sc.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-01-07x86, vt-d: Fix the vt-d fault handling irq migration in the x2apic modeKenji Kaneshige
commit 086e8ced65d9bcc4a8e8f1cd39b09640f2883f90 upstream. In x2apic mode, we need to set the upper address register of the fault handling interrupt register of the vt-d hardware. Without this irq migration of the vt-d fault handling interrupt is broken. Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com> LKML-Reference: <1291225233.2648.39.camel@sbsiddha-MOBL3> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Acked-by: Chris Wright <chrisw@sous-sol.org> Tested-by: Takao Indoh <indou.takao@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-01-07x86: Enable the intr-remap fault handling after local APIC setupKenji Kaneshige
commit 7f7fbf45c6b748074546f7f16b9488ca71de99c1 upstream. Interrupt-remapping gets enabled very early in the boot, as it determines the apic mode that the processor can use. And the current code enables the vt-d fault handling before the setup_local_APIC(). And hence the APIC LDR registers and data structure in the memory may not be initialized. So the vt-d fault handling in logical xapic/x2apic modes were broken. Fix this by enabling the vt-d fault handling in the end_local_APIC_setup() A cleaner fix of enabling fault handling while enabling intr-remapping will be addressed for v2.6.38. [ Enabling intr-remapping determines the usage of x2apic mode and the apic mode determines the fault-handling configuration. ] Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com> LKML-Reference: <20101201062244.541996375@intel.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Acked-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-01-07x86, gcc-4.6: Use gcc -m options when building vdsoH. Peter Anvin
commit de2a8cf98ecdde25231d6c5e7901e2cffaf32af9 upstream. The vdso Makefile passes linker-style -m options not to the linker but to gcc. This happens to work with earlier gcc, but fails with gcc 4.6. Pass gcc-style -m options, instead. Note: all currently supported versions of gcc supports -m32, so there is no reason to conditionalize it any more. Reported-by: H. J. Lu <hjl.tools@gmail.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> LKML-Reference: <tip-*@git.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-01-07x86, amd: Fix panic on AMD CPU family 0x15Andreas Herrmann
[The mainline kernel doesn't have this problem. Commit "(23588c3) x86, amd: Add support for CPUID topology extension of AMD CPUs" removed the family check. But 2.6.32.y needs to be fixed.] This CPU family check is not required -- existence of the NodeId MSR is indicated by a CPUID feature flag which is already checked in amd_fixup_dcm() -- and it needlessly prevents amd_fixup_dcm() to be called for newer AMD CPUs. In worst case this can lead to a panic in the scheduler code for AMD family 0x15 multi-node AMD CPUs. I just have a picture of VGA console output so I can't copy-and-paste it herein, but the call stack of such a panic looked like: do_divide_error ... find_busiest_group run_rebalance_domains ... apic_timer_interrupt ... cpu_idle The mainline kernel doesn't have this problem. Commit "(23588c3) x86, amd: Add support for CPUID topology extension of AMD CPUs" removed the family check. But 2.6.32.y needs to be fixed. Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-01-07x86, hotplug: Use mwait to offline a processor, fix the legacy caseH. Peter Anvin
upstream ea53069231f9317062910d6e772cca4ce93de8c8 x86, hotplug: Use mwait to offline a processor, fix the legacy case Here included also some small follow-on patches to the same code: upstream a68e5c94f7d3dd64fef34dd5d97e365cae4bb42a x86, hotplug: Move WBINVD back outside the play_dead loop upstream ce5f68246bf2385d6174856708d0b746dc378f20 x86, hotplug: In the MWAIT case of play_dead, CLFLUSH the cache line https://bugzilla.kernel.org/show_bug.cgi?id=5471 Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-12-09x86-32: Fix dummy trampoline-related inline stubsH. Peter Anvin
commit 8848a91068c018bc91f597038a0f41462a0f88a4 upstream. Fix dummy inline stubs for trampoline-related functions when no trampolines exist (until we get rid of the no-trampoline case entirely.) Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: Joerg Roedel <joerg.roedel@amd.com> Cc: Borislav Petkov <borislav.petkov@amd.com> LKML-Reference: <4C6C294D.3030404@zytor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-12-09x86, mm: Fix CONFIG_VMSPLIT_1G and 2G_OPT trampolineHugh Dickins
commit b7d460897739e02f186425b7276e3fdb1595cea7 upstream. rc2 kernel crashes when booting second cpu on this CONFIG_VMSPLIT_2G_OPT laptop: whereas cloning from kernel to low mappings pgd range does need to limit by both KERNEL_PGD_PTRS and KERNEL_PGD_BOUNDARY, cloning kernel pgd range itself must not be limited by the smaller KERNEL_PGD_BOUNDARY. Signed-off-by: Hugh Dickins <hughd@google.com> LKML-Reference: <alpine.LSU.2.00.1008242235120.2515@sister.anvils> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-12-09x86-32: Separate 1:1 pagetables from swapper_pg_dirJoerg Roedel
commit fd89a137924e0710078c3ae855e7cec1c43cb845 upstream. This patch fixes machine crashes which occur when heavily exercising the CPU hotplug codepaths on a 32-bit kernel. These crashes are caused by AMD Erratum 383 and result in a fatal machine check exception. Here's the scenario: 1. On 32-bit, the swapper_pg_dir page table is used as the initial page table for booting a secondary CPU. 2. To make this work, swapper_pg_dir needs a direct mapping of physical memory in it (the low mappings). By adding those low, large page (2M) mappings (PAE kernel), we create the necessary conditions for Erratum 383 to occur. 3. Other CPUs which do not participate in the off- and onlining game may use swapper_pg_dir while the low mappings are present (when leave_mm is called). For all steps below, the CPU referred to is a CPU that is using swapper_pg_dir, and not the CPU which is being onlined. 4. The presence of the low mappings in swapper_pg_dir can result in TLB entries for addresses below __PAGE_OFFSET to be established speculatively. These TLB entries are marked global and large. 5. When the CPU with such TLB entry switches to another page table, this TLB entry remains because it is global. 6. The process then generates an access to an address covered by the above TLB entry but there is a permission mismatch - the TLB entry covers a large global page not accessible to userspace. 7. Due to this permission mismatch a new 4kb, user TLB entry gets established. Further, Erratum 383 provides for a small window of time where both TLB entries are present. This results in an uncorrectable machine check exception signalling a TLB multimatch which panics the machine. There are two ways to fix this issue: 1. Always do a global TLB flush when a new cr3 is loaded and the old page table was swapper_pg_dir. I consider this a hack hard to understand and with performance implications 2. Do not use swapper_pg_dir to boot secondary CPUs like 64-bit does. This patch implements solution 2. It introduces a trampoline_pg_dir which has the same layout as swapper_pg_dir with low_mappings. This page table is used as the initial page table of the booting CPU. Later in the bringup process, it switches to swapper_pg_dir and does a global TLB flush. This fixes the crashes in our test cases. -v2: switch to swapper_pg_dir right after entering start_secondary() so that we are able to access percpu data which might not be mapped in the trampoline page table. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> LKML-Reference: <20100816123833.GB28147@aftab> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-12-09x86, UV: Fix initialization of max_pnodeJack Steiner
commit 36ac4b987bea9a95217e1af552252f275ca7fc44 upstream. Fix calculation of "max_pnode" for systems where the the highest blade has neither cpus or memory. (And, yes, although rare this does occur). Signed-off-by: Jack Steiner <steiner@sgi.com> LKML-Reference: <20100910150808.GA19802@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: maximilian attems <max@stro.at> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-12-09x86, UV: Delete unneeded boot messagesJack Steiner
commit 2acebe9ecb2b77876e87a1480729cfb2db4570dd upstream. SGI:UV: Delete extra boot messages that describe the system topology. These messages are no longer useful. Signed-off-by: Jack Steiner <steiner@sgi.com> LKML-Reference: <20100317154038.GA29346@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: maximilian attems <max@stro.at> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-12-09acpi-cpufreq: fix a memleak when unloading driverZhang Rui
commit dab5fff14df2cd16eb1ad4c02e83915e1063fece upstream. We didn't free per_cpu(acfreq_data, cpu)->freq_table when acpi_freq driver is unloaded. Resulting in the following messages in /sys/kernel/debug/kmemleak: unreferenced object 0xf6450e80 (size 64): comm "modprobe", pid 1066, jiffies 4294677317 (age 19290.453s) hex dump (first 32 bytes): 00 00 00 00 e8 a2 24 00 01 00 00 00 00 9f 24 00 ......$.......$. 02 00 00 00 00 6a 18 00 03 00 00 00 00 35 0c 00 .....j.......5.. backtrace: [<c123ba97>] kmemleak_alloc+0x27/0x50 [<c109f96f>] __kmalloc+0xcf/0x110 [<f9da97ee>] acpi_cpufreq_cpu_init+0x1ee/0x4e4 [acpi_cpufreq] [<c11cd8d2>] cpufreq_add_dev+0x142/0x3a0 [<c11920b7>] sysdev_driver_register+0x97/0x110 [<c11cce56>] cpufreq_register_driver+0x86/0x140 [<f9dad080>] 0xf9dad080 [<c1001130>] do_one_initcall+0x30/0x160 [<c10626e9>] sys_init_module+0x99/0x1e0 [<c1002d97>] sysenter_do_call+0x12/0x26 [<ffffffff>] 0xffffffff https://bugzilla.kernel.org/show_bug.cgi?id=15807#c21 Tested-by: Toralf Forster <toralf.foerster@gmx.de> Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-12-09KVM: VMX: Fix host userspace gsbase corruptionBen Hutchings
commit c8770e7ba63bb5dd8fe5f9d251275a8fa717fb78 upstream. We now use load_gs_index() to load gs safely; unfortunately this also changes MSR_KERNEL_GS_BASE, which we managed separately. This resulted in confusion and breakage running 32-bit host userspace on a 64-bit kernel. Fix by - saving guest MSR_KERNEL_GS_BASE before we we reload the host's gs - doing the host save/load unconditionally, instead of only when in guest long mode Things can be cleaned up further, but this is the minmal fix for now. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> [bwh: Backport to 2.6.32] Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-12-09KVM: x86: fix information leak to userlandVasiliy Kulikov
commit 97e69aa62f8b5d338d6cff49be09e37cc1262838 upstream. Structures kvm_vcpu_events, kvm_debugregs, kvm_pit_state2 and kvm_clock_data are copied to userland with some padding and reserved fields unitialized. It leads to leaking of contents of kernel stack memory. We have to initialize them to zero. In patch v1 Jan Kiszka suggested to fill reserved fields with zeros instead of memset'ting the whole struct. It makes sense as these fields are explicitly marked as padding. No more fields need zeroing. Signed-off-by: Vasiliy Kulikov <segooon@gmail.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-12-09KVM: VMX: fix vmx null pointer dereference on debug register accessGleb Natapov
There is a bug in KVM that can be used to crash a host on Intel machines. If emulator is tricked into emulating mov to/from DR instruction it causes NULL pointer dereference on VMX since kvm_x86_ops->(set|get)_dr are not initialized. Recently this is not exploitable from guest userspace, but malicious guest kernel can trigger it easily. CVE-2010-0435 On upstream bug was fixed differently around 2.6.34. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-12-09x86: AMD Northbridge: Verify NB's node is onlinePrarit Bhargava
commit 303fc0870f8fbfabe260c5c32b18e53458d597ea upstream. Fix panic seen on some IBM and HP systems on 2.6.32-rc6: BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffff8120bf3f>] find_next_bit+0x77/0x9c [...] [<ffffffff8120bbde>] cpumask_next_and+0x2e/0x3b [<ffffffff81225c62>] pci_device_probe+0x8e/0xf5 [<ffffffff812b9be6>] ? driver_sysfs_add+0x47/0x6c [<ffffffff812b9da5>] driver_probe_device+0xd9/0x1f9 [<ffffffff812b9f1d>] __driver_attach+0x58/0x7c [<ffffffff812b9ec5>] ? __driver_attach+0x0/0x7c [<ffffffff812b9298>] bus_for_each_dev+0x54/0x89 [<ffffffff812b9b4f>] driver_attach+0x19/0x1b [<ffffffff812b97ae>] bus_add_driver+0xd3/0x23d [<ffffffff812ba1e7>] driver_register+0x98/0x109 [<ffffffff81225ed0>] __pci_register_driver+0x63/0xd3 [<ffffffff81072776>] ? up_read+0x26/0x2a [<ffffffffa0081000>] ? k8temp_init+0x0/0x20 [k8temp] [<ffffffffa008101e>] k8temp_init+0x1e/0x20 [k8temp] [<ffffffff8100a073>] do_one_initcall+0x6d/0x185 [<ffffffff8108d765>] sys_init_module+0xd3/0x236 [<ffffffff81011ac2>] system_call_fastpath+0x16/0x1b I put in a printk and commented out the set_dev_node() call when and got this output: quirk_amd_nb_node: current numa_node = 0x0, would set to val & 7 = 0x0 quirk_amd_nb_node: current numa_node = 0x0, would set to val & 7 = 0x1 quirk_amd_nb_node: current numa_node = 0x0, would set to val & 7 = 0x2 quirk_amd_nb_node: current numa_node = 0x0, would set to val & 7 = 0x3 I.e. the issue appears to be that the HW has set val to a valid value, however, the system is only configured for a single node -- 0, the others are offline. Check to see if the node is actually online before setting the numa node for an AMD northbridge in quirk_amd_nb_node(). Signed-off-by: Prarit Bhargava <prarit@redhat.com> Cc: bhavna.sarathy@amd.com Cc: jbarnes@virtuousgeek.org Cc: andreas.herrmann3@amd.com LKML-Reference: <20091112180933.12532.98685.sendpatchset@prarit.bos.redhat.com> [ v2: clean up the code and add comments ] Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: maximilian attems <max@stro.at> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-12-09xen: don't bother to stop other cpus on shutdown/rebootJeremy Fitzhardinge
commit 31e323cca9d5c8afd372976c35a5d46192f540d1 upstream. Xen will shoot all the VCPUs when we do a shutdown hypercall, so there's no need to do it manually. In any case it will fail because all the IPI irqs have been pulled down by this point, so the cross-CPU calls will simply hang forever. Until change 76fac077db6b34e2c6383a7b4f3f4f7b7d06d8ce the function calls were not synchronously waited for, so this wasn't apparent. However after that change the calls became synchronous leading to a hang on shutdown on multi-VCPU guests. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Alok Kataria <akataria@vmware.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-11-22KVM: Fix fs/gs reload oops with invalid ldtAvi Kivity
commit 9581d442b9058d3699b4be568b6e5eae38a41493 upstream kvm reloads the host's fs and gs blindly, however the underlying segment descriptors may be invalid due to the user modifying the ldt after loading them. Fix by using the safe accessors (loadsegment() and load_gs_index()) instead of home grown unsafe versions. This is CVE-2010-3698. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>