Age | Commit message (Collapse) | Author |
|
commit 14315592009c17035cac81f4954d5a1f4d71e489 upstream.
Distros generally (I looked at Debian, RHEL5 and SLES11) seem to
enable CONFIG_HIGHPTE for any x86 configuration which has highmem
enabled. This means that the overhead applies even to machines which
have a fairly modest amount of high memory and which therefore do not
really benefit from allocating PTEs in high memory but still pay the
price of the additional mapping operations.
Running kernbench on a 4G box I found that with CONFIG_HIGHPTE=y but
no actual highptes being allocated there was a reduction in system
time used from 59.737s to 55.9s.
With CONFIG_HIGHPTE=y and highmem PTEs being allocated:
Average Optimal load -j 4 Run (std deviation):
Elapsed Time 175.396 (0.238914)
User Time 515.983 (5.85019)
System Time 59.737 (1.26727)
Percent CPU 263.8 (71.6796)
Context Switches 39989.7 (4672.64)
Sleeps 42617.7 (246.307)
With CONFIG_HIGHPTE=y but with no highmem PTEs being allocated:
Average Optimal load -j 4 Run (std deviation):
Elapsed Time 174.278 (0.831968)
User Time 515.659 (6.07012)
System Time 55.9 (1.07799)
Percent CPU 263.8 (71.266)
Context Switches 39929.6 (4485.13)
Sleeps 42583.7 (373.039)
This patch allows the user to control the allocation of PTEs in
highmem from the command line ("userpte=nohigh") but retains the
status-quo as the default.
It is possible that some simple heuristic could be developed which
allows auto-tuning of this option however I don't have a sufficiently
large machine available to me to perform any particularly meaningful
experiments. We could probably handwave up an argument for a threshold
at 16G of total RAM.
Assuming 768M of lowmem we have 196608 potential lowmem PTE
pages. Each page can map 2M of RAM in a PAE-enabled configuration,
meaning a maximum of 384G of RAM could potentially be mapped using
lowmem PTEs.
Even allowing generous factor of 10 to account for other required
lowmem allocations, generous slop to account for page sharing (which
reduces the total amount of RAM mappable by a given number of PT
pages) and other innacuracies in the estimations it would seem that
even a 32G machine would not have a particularly pressing need for
highmem PTEs. I think 32G could be considered to be at the upper bound
of what might be sensible on a 32 bit machine (although I think in
practice 64G is still supported).
It's seems questionable if HIGHPTE is even a win for any amount of RAM
you would sensibly run a 32 bit kernel on rather than going 64 bit.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
LKML-Reference: <1266403090-20162-1-git-send-email-ian.campbell@citrix.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit e92805ac1228626c59c865f2f4e9059b9fb8c97b upstream.
Add CPL checking in case emulator is tricked into emulating
privilege instruction from userspace.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 60a29d4ea4e7b6b95d9391ebc8625b0426f3a363 upstream.
Use groups mechanism to decode 0F C7 instructions.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 8b9f44140bc4afd2698413cd9960c3912168ee91 upstream.
Inject #UD if guest attempts to do so. This is in accordance to Intel
SDM.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 2db2c2eb6226e30f8059b82512a1364db98da8e3 upstream.
Use groups mechanism to decode 0F BA instructions.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 18dce6ba5c8c6bd0f3ab4efa4cbdd698dab5c40a upstream.
Thomas Renninger <trenn@suse.de> reported on IBM x3330
booting a latest kernel on this machine results in:
PCI: PCI BIOS revision 2.10 entry at 0xfd61c, last bus=1
PCI: Using configuration type 1 for base access bio: create slab <bio-0> at 0
ACPI: SCI (IRQ30) allocation failed
ACPI Exception: AE_NOT_ACQUIRED, Unable to install System Control Interrupt handler (20090903/evevent-161)
ACPI: Unable to start the ACPI Interpreter
Later all kind of devices fail...
and bisect it down to this commit:
commit b9c61b70075c87a8612624736faf4a2de5b1ed30
x86/pci: update pirq_enable_irq() to setup io apic routing
it turns out we need to set irq routing for the sci on ioapic1 early.
-v2: make it work without sparseirq too.
-v3: fix checkpatch.pl warning, and cc to stable
Reported-by: Thomas Renninger <trenn@suse.de>
Bisected-by: Thomas Renninger <trenn@suse.de>
Tested-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1265793639-15071-2-git-send-email-yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit ced5b697a76d325e7a7ac7d382dbbb632c765093 upstream.
Keep chip_data in create_irq_nr and destroy_irq.
When two drivers are setting up MSI-X at the same time via
pci_enable_msix() there is a race. See this dmesg excerpt:
[ 85.170610] ixgbe 0000:02:00.1: irq 97 for MSI/MSI-X
[ 85.170611] alloc irq_desc for 99 on node -1
[ 85.170613] igb 0000:08:00.1: irq 98 for MSI/MSI-X
[ 85.170614] alloc kstat_irqs on node -1
[ 85.170616] alloc irq_2_iommu on node -1
[ 85.170617] alloc irq_desc for 100 on node -1
[ 85.170619] alloc kstat_irqs on node -1
[ 85.170621] alloc irq_2_iommu on node -1
[ 85.170625] ixgbe 0000:02:00.1: irq 99 for MSI/MSI-X
[ 85.170626] alloc irq_desc for 101 on node -1
[ 85.170628] igb 0000:08:00.1: irq 100 for MSI/MSI-X
[ 85.170630] alloc kstat_irqs on node -1
[ 85.170631] alloc irq_2_iommu on node -1
[ 85.170635] alloc irq_desc for 102 on node -1
[ 85.170636] alloc kstat_irqs on node -1
[ 85.170639] alloc irq_2_iommu on node -1
[ 85.170646] BUG: unable to handle kernel NULL pointer dereference
at 0000000000000088
As you can see igb and ixgbe are both alternating on create_irq_nr()
via pci_enable_msix() in their probe function.
ixgbe: While looping through irq_desc_ptrs[] via create_irq_nr() ixgbe
choses irq_desc_ptrs[102] and exits the loop, drops vector_lock and
calls dynamic_irq_init. Then it sets irq_desc_ptrs[102]->chip_data =
NULL via dynamic_irq_init().
igb: Grabs the vector_lock now and starts looping over irq_desc_ptrs[]
via create_irq_nr(). It gets to irq_desc_ptrs[102] and does this:
cfg_new = irq_desc_ptrs[102]->chip_data;
if (cfg_new->vector != 0)
continue;
This hits the NULL deref.
Another possible race exists via pci_disable_msix() in a driver or in
the number of error paths that call free_msi_irqs():
destroy_irq()
dynamic_irq_cleanup() which sets desc->chip_data = NULL
...race window...
desc->chip_data = cfg;
Remove the save and restore code for cfg in create_irq_nr() and
destroy_irq() and take the desc->lock when checking the irq_cfg.
Reported-and-analyzed-by: Brandon Philips <bphilips@suse.de>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1265793639-15071-3-git-send-email-yinghai@kernel.org>
Signed-off-by: Brandon Phililps <bphilips@suse.de>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 817a824b75b1475f1b067c8cee318c7b4d66fcde upstream.
There's a path in the pagefault code where the kernel deliberately
breaks its own locking rules by kmapping a high pte page without
holding the pagetable lock (in at least page_check_address). This
breaks Xen's ability to track the pinned/unpinned state of the
page. There does not appear to be a viable workaround for this
behaviour so simply disable HIGHPTE for all Xen guests.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
LKML-Reference: <1267204562-11844-1-git-send-email-ian.campbell@citrix.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Pasi Kärkkäinen <pasik@iki.fi>
Cc: <xen-devel@lists.xensource.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 0a832320f1bae6a4169bf683e201378f2437cfc1 upstream.
On the iMac9,1 /sbin/reboot results in a black mangled screen. Adding
this DMI entry gets the machine to reboot cleanly as it should.
Signed-off-by: Justin P. Mattock <justinmattock@gmail.com>
LKML-Reference: <1266362249-3337-1-git-send-email-justinmattock@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 318f6b228ba88a394ef560efc1bfe028ad5ae6b6 upstream.
Do not set current->mm->mmap to NULL in 32-bit emulation on 64-bit
load_aout_binary after flush_old_exec as it would destroy already
set brpm mapping with arguments.
Introduced by b6a2fea39318e43fee84fa7b0b90d68bed92d2ba
mm: variable length argument support
where the argument mapping in bprm was added.
[ hpa: this is a regression from 2.6.22... time to kill a.out? ]
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
LKML-Reference: <1265831716-7668-1-git-send-email-jslaby@suse.cz>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ollie Wild <aaw@google.com>
Cc: x86@kernel.org
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit cfc9c0b450176a077205ef39092f0dc1a04e020a upstream.
During switching virtual counters there is access to perfctr msrs. If
the counter is not available this fails due to an invalid
address. This patch fixes this.
Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit c17c8fbf349482e89b57d1b800e83e9f4cf40c47 upstream.
Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 89baaaa98a10cad5cc8516c7208b02d9fc711890 upstream.
Standard AMD systems have the same number of nodes as there are
northbridge devices. However, there may kernel configurations
(especially for 32 bit) or system setups exist, where the node number
is different or it can not be detected properly. Thus the check is not
reliable and may fail though IBS setup was fine. For this reason it is
better to remove the check.
Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 68dc819ce829f7e7977a56524e710473bdb55115 upstream.
Multiple virtual counters share one physical counter. The reservation
of virtual counters fails due to duplicate allocation of the same
counter. The counters are already reserved. Thus, virtual counter
reservation may removed at all. This also makes the code easier.
Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 97c169d39b6846a564dc8d883832e7fef9bdb77d upstream.
We realized when we broke acpi=ht
http://bugzilla.kernel.org/show_bug.cgi?id=14886
that acpi=ht is not needed on this box
and folks have been using acpi=force on it anyway.
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 37ef2a3029fde884808ff1b369677abc7dd9a79a upstream.
When irq_desc is moved, we need to make sure to use the right cfg_new.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4B07A739.3030104@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit d3ad9373b7c29b63d5e8460a69453718d200cc3b upstream.
Deassigning a device from the passthrough domain does not
work and breaks device assignment to kvm guests. This patch
fixes the issue.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit f5325094379158e6b876ea0010c807bf7890ec8f upstream
This patch moves the initialization of the iommu-api out of
the dma-ops initialization code. This ensures that the
iommu-api is initialized even with iommu=pt.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit ee73f656a604d5aa9df86a97102e4e462dd79924 upstream.
PIT control word (address 0x43) is write-only, reads are undefined.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 923de3cf5bf12049628019010e36623fca5ef6d1 upstream.
Current kvm wallclock does not consider the total_sleep_time which could cause
wrong wallclock in guest after host suspend/resume. This patch solve
this issue by counting total_sleep_time to get the correct host boot time.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Glauber Costa <glommer@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 557a701c16553b0b691dbb64ef30361115a80f64 upstream.
Easy fix for a regression introduced in 2.6.31.
On managed CPUs the cpufreq.c core will call driver->exit(cpu) on the
managed cpus and powernow_k8 will free the core's data.
Later driver->get(cpu) function might get called trying to read out the
current freq of a managed cpu and the NULL pointer check does not work on
the freed object -> better set it to NULL.
->get() is unsigned and must return 0 as invalid frequency.
Reference:
http://bugzilla.kernel.org/show_bug.cgi?id=14391
Signed-off-by: Thomas Renninger <trenn@suse.de>
Tested-by: Michal Schmidt <mschmidt@redhat.com>
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit d8cc108f4fab42b380c6b3f3356f99e8dd5372e2 upstream.
With multiplexing enabled oprofile crashs when profiling more than 28
events. This patch fixes this.
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit e83e452b0692c9c13372540deb88a77d4ae2553d upstream.
Add Xeon 7500 series support to oprofile.
Straight forward: it's the same as Core i7, so just detect
the model number. No user space changes needed.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
(cherry picked from afbcf7ab8d1bc8c2d04792f6d9e786e0adeb328d)
When we migrate a kvm guest that uses pvclock between two hosts, we may
suffer a large skew. This is because there can be significant differences
between the monotonic clock of the hosts involved. When a new host with
a much larger monotonic time starts running the guest, the view of time
will be significantly impacted.
Situation is much worse when we do the opposite, and migrate to a host with
a smaller monotonic clock.
This proposed ioctl will allow userspace to inform us what is the monotonic
clock value in the source host, so we can keep the time skew short, and
more importantly, never goes backwards. Userspace may also need to trigger
the current data, since from the first migration onwards, it won't be
reflected by a simple call to clock_gettime() anymore.
[marcelo: future-proof abi with a flags field]
[jan: fix KVM_GET_CLOCK by clearing flags field instead of checking it]
Signed-off-by: Glauber Costa <glommer@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit d91afd15b041f27d34859c79afa9e172018a86f4 upstream.
The variable i in this function could be increased to over
2**32 which would result in an integer overflow when using
int. Fix it by changing i to unsigned long.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 7c099ce1575126395f186ecf58b51a60d5c3be7d upstream.
Commit 6aa542a694dc9ea4344a8a590d2628c33d1b9431 added a quirk for the
Intel DG45ID board due to low memory corruption. The Intel DG45FC
shares the same BIOS (and the same bug) as noted in:
http://bugzilla.kernel.org/show_bug.cgi?id=13736
Signed-off-by: David Härdeman <david@hardeman.nu>
LKML-Reference: <20100128200254.GA9134@hardeman.nu>
Cc: Alexey Fisher <bug-track@fisher-privat.net>
Cc: ykzhao <yakui.zhao@intel.com>
Cc: Tony Bones <aabonesml@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 35ea63d70f827a26c150993b4b940925bb02b03f upstream.
Dell OptiPlex 760 hangs on reboot unless reboot=bios is used. Add quirk
to reboot through the BIOS.
BugLink: https://bugs.launchpad.net/bugs/488319
Signed-off-by: Leann Ogasawara <leann.ogasawara@canonical.com>
LKML-Reference: <1264634958.27335.1091.camel@emiko>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 73472a46b5b28116b145fb5fc05242c1aa8e1461 upstream
HPET MSI on platforms with ATI SB700/SB800 as they seem to have some
side-effects on floppy DMA. Do not use HPET MSI on such platforms.
Original problem report from Mark Hounschell
http://lkml.indiana.edu/hypermail/linux/kernel/0912.2/01118.html
Tested-by: Mark Hounschell <markh@compro.net>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: <stable@kernel.org>
LKML-Reference: <20100121190952.GA32523@linux-os.sc.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
|
|
commit 05d43ed8a89c159ff641d472f970e3f1baa66318 upstream.
Now that the previous commit made it possible to do the personality
setting at the point of no return, we do just that for ELF binaries.
And suddenly all the reasons for that insane TIF_ABI_PENDING bit go
away, and we can just make SET_PERSONALITY() just do the obvious thing
for a 32-bit compat process.
Everything becomes much more straightforward this way.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 221af7f87b97431e3ee21ce4b0e77d5411cf1549 upstream.
'flush_old_exec()' is the point of no return when doing an execve(), and
it is pretty badly misnamed. It doesn't just flush the old executable
environment, it also starts up the new one.
Which is very inconvenient for things like setting up the new
personality, because we want the new personality to affect the starting
of the new environment, but at the same time we do _not_ want the new
personality to take effect if flushing the old one fails.
As a result, the x86-64 '32-bit' personality is actually done using this
insane "I'm going to change the ABI, but I haven't done it yet" bit
(TIF_ABI_PENDING), with SET_PERSONALITY() not actually setting the
personality, but just the "pending" bit, so that "flush_thread()" can do
the actual personality magic.
This patch in no way changes any of that insanity, but it does split the
'flush_old_exec()' function up into a preparatory part that can fail
(still called flush_old_exec()), and a new part that will actually set
up the new exec environment (setup_new_exec()). All callers are changed
to trivially comply with the new world order.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit b160091802d4a76dd063facb09fcf10bf5d5d747 upstream.
CONFIG_X86_CPU_DEBUG, which provides some parsed versions of the x86
CPU configuration via debugfs, has caused boot failures on real
hardware. The value of this feature has been marginal at best, as all
this information is already available to userspace via generic
interfaces.
Causes crashes that have not been fixed + minimal utility -> remove.
See the referenced LKML thread for more information.
Reported-by: Ozan Çağlayan <ozan@pardus.org.tr>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
LKML-Reference: <alpine.LFD.2.00.1001221755320.13231@localhost.localdomain>
Cc: Jaswinder Singh Rajput <jaswinder@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 3a5fc0e40cb467e692737bc798bc99773c81e1e2 upstream.
nodes_possible_map does not currently include nodes that have SRAT
entries that are all ACPI_SRAT_MEM_HOT_PLUGGABLE since the bit is
cleared in nodes_parsed if it does not have an online address range.
Unequivocally setting the bit in nodes_parsed is insufficient since
existing code, such as acpi_get_nodes(), assumes all nodes in the map
have online address ranges. In fact, all code using nodes_parsed
assumes such nodes represent an address range of online memory.
nodes_possible_map is created by unioning nodes_parsed and
cpu_nodes_parsed; the former represents nodes with online memory and
the latter represents memoryless nodes. We now set the bit for
hotpluggable nodes in cpu_nodes_parsed so that it also gets set in
nodes_possible_map.
[ hpa: Haicheng Li points out that this makes the naming of the
variable cpu_nodes_parsed somewhat counterintuitive. However, leave
it as is in the interest of keeping the pure bug fix patch small. ]
Signed-off-by: David Rientjes <rientjes@google.com>
Tested-by: Haicheng Li <haicheng.li@linux.intel.com>
LKML-Reference: <alpine.DEB.2.00.1001201152040.30528@chino.kir.corp.google.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
drivers.
commit da482474b8396e1a099c37ffc6541b78775aedb4 upstream.
Pass the number of minors when unregistering MSR and CPUID drivers.
Reported-by: Dean Nelson <dnelson@redhat.com>
Signed-off-by: Dean Nelson <dnelson@redhat.com>
LKML-Reference: <20100127023722.GA22305@sgi.com>
Signed-off-by: Russ Anderson <rja@sgi.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 6c56ccecf05fafe100ab4ea94f6fccbf5ff00db7 upstream.
Commit 83ce4009 did the following change
If the TSC is constant and non-stop, also set it reliable.
But, there seems to be few systems that will end up with TSC warp across
sockets, depending on how the cpus come out of reset. Skipping TSC sync
test on such systems may result in time inconsistency later.
So, reenable TSC sync test even on constant and non-stop TSC systems.
Set, sched_clock_stable to 1 by default and reset it in
mark_tsc_unstable, if TSC sync fails.
This change still gives perf benefit mentioned in 83ce4009 for systems
where TSC is reliable.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <20091217202702.GA18015@linux-os.sc.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 443c39bc9ef7d8f648408d74c97e943f3bb3f48a upstream.
In function kvm_arch_vcpu_init(), if the memory malloc for
vcpu->arch.mce_banks is fail, it does not free the memory
of lapic date. This patch fixed it.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 36cb93fd6b6bf7e9163a69a8bf20207aed5fea44 upstream.
vcpu->arch.mce_banks is malloc in kvm_arch_vcpu_init(), but
never free in any place, this may cause memory leak. So this
patch fixed to free it in kvm_arch_vcpu_uninit().
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 82b7005f0e72d8d1a8226e4c192cbb0850d10b3f upstream.
When found a error hva, should not return PAGE_SIZE but the level...
Also clean up the coding style of the following loop.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit a6085fbaf65ab09bfb5ec8d902d6d21680fe1895 upstream.
Exit the guest pagetable walk loop if reading gpte failed. Otherwise its
possible to enter an endless loop processing the previous present pte.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit a5d36f82c4f3e852b61fdf1fee13463c8aa91b90 upstream.
When we queue an interrupt to the local apic, we set the IRR before the TMR.
The vcpu can pick up the IRR and inject the interrupt before setting the TMR,
and perhaps even EOI it, causing incorrect behaviour.
The race is really insignificant since it can only occur on the first
interrupt (usually following interrupts will not change TMR), but it's better
closed than open.
Fixed by reordering setting the TMR vs IRR.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 2992e545ea006992ec9dc91c4fa996ce1e15f921 upstream.
Thomas Schlichter reported:
> X.org uses libpciaccess which tries to mmap with write combining enabled via
> /sys/bus/pci/devices/*/resource0_wc. Currently, when PAT is not enabled, the
> kernel does fall back to uncached mmap. Then libpciaccess thinks it succeeded
> mapping with write combining enabled and does not set up suited MTRR entries.
> ;-(
Instead of silently mapping pci mmap region as UC minus in the case
of !pat_enabled and wc request, we can return error. Eric Anholt mentioned
that caller (like X) typically follows up with UC minus pci mmap request and
if there is a free mtrr slot, caller will manage adding WC mtrr.
Jesse Barnes says:
> Older versions of libpciaccess will behave better if we do it that way
> (iirc it only allocates an MTRR if the resource_wc file doesn't exist or
> fails to get mapped).
Reported-by: Thomas Schlichter <thomas.schlichter@web.de>
Signed-off-by: Thomas Schlichter <thomas.schlichter@web.de>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Acked-by: Eric Anholt <eric@anholt.net>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 0b962d473af32ec334df271b54ff4973cb2b4c73 upstream.
register_chrdev() hardcodes registering 256 minors, presumably to
avoid breaking old drivers. However, we need to register enough
minors so that we have all possible CPUs.
checkpatch warns on this patch, but the patch is correct: NR_CPUS here
is a static *upper bound* on the *maximum CPU index* (not *number of
CPUs!*) and that is what we want.
Reported-and-tested-by: Russ Anderson <rja@sgi.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
LKML-Reference: <tip-*@git.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit fcfbb2b5facd65efa7284cc315225bfe3d1856c2 upstream.
This fixes the problem of the initialization code not correctly
mapping the entire MMIO space on a UV system. A side effect is
the map_high() interface needed to be changed to accommodate
different address and size shifts.
Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Mike Habeck <habeck@sgi.com>
Cc: Jack Steiner <steiner@sgi.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <4B479202.7080705@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit dfea91d5a7c795fd6f4e1a97489a98e4e767463e upstream.
Chris McDermott from IBM confirmed that hurricane chipset in IBM summit
platforms doesn't support logical flat mode. Irrespective of the other
things like apic_id's, total number of logical cpu's, Linux kernel
should default to physical mode for this system.
The 32-bit kernel does so using the OEM checks for the IBM summit
platform. Add a similar OEM platform check for the 64bit kernel too.
Otherwise the linux kernel boot can hang on this platform under certain
bios/platform settings.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Tested-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Chris McDermott <lcm@linux.vnet.ibm.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 485a2e1973fd9f98c2c6776e66ac4721882b69e0 upstream.
Add check if APIC is not disabled since thermal
monitoring depends on it. As only apic gets disabled
we should not try to install "thermal monitor" vector,
print out that thermal monitoring is enabled and etc...
Note that "Intel Correct Machine Check Interrupts" already
has such a check.
Also I decided to not add cpu_has_apic check into
mcheck_intel_therm_init since even if it'll call apic_read on
disabled apic -- it's safe here and allow us to save a few code
bytes.
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
LKML-Reference: <4B25FDC2.3020401@jp.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
pre-Pentium"
commit db677ffa5f5a4f15b9dad4d132b3477b80766d82 upstream.
This reverts commit ae1b22f6e46c03cede7cea234d0bf2253b4261cf.
As Linus said in 982d007a6ee: "There was something really messy about
cmpxchg8b and clone CPU's, so if you enable it on other CPUs later, do it
carefully."
This breaks lguest for those configs, but we can fix that by emulating
if we have to.
Fixes: http://bugzilla.kernel.org/show_bug.cgi?id=14884
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
This backports the following upstream commits all as one patch:
54f5de709984bae0d31d823ff03de755f9dcac54
ecc1a8993751de4e82eb18640d631dae1f626bd6
1a0ef85f84feb13f07b604fcf5b90ef7c2b5c82f
f106af4e90eadd76cfc0b5325f659619e08fb762
097eed103862f9c6a97f2e415e21d1134017b135
935874141df839c706cd6cdc438e85eb69d1525e
0ec62d290912bb4b989be7563851bc364ec73b56
c4caa778157dbbf04116f0ac2111e389b5cd7a29
2ea1d13f64efdf49319e86c87d9ba38c30902782
570dcf2c15463842e384eb597a87c1e39bead99b
564b3bffc619dcbdd160de597b0547a7017ea010
0067bd8a55862ac9dd212bd1c4f6f5bff1ca1301
f8b7256096a20436f6d0926747e3ac3d64c81d24
8c7b49b3ecd48923eb64ff57e07a1cdb74782970
9206de95b1ea68357996ec02be5db0638a0de2c1
2c6a10161d0b5fc047b5bd81b03693b9af99fab5
05d72faa6d13c9d857478a5d35c85db9adada685
bb52d6694002b9d632bb355f64daa045c6293a4e
e77414e0aad6a1b063ba5e5750c582c75327ea6a
aa65607373a4daf2010e8c3867b6317619f3c1a3
Backport done by Greg Kroah-Hartman. Only minor tweaks were needed.
Cc: David S. Miller <davem@davemloft.net>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 6ede31e03084ee084bcee073ef3d1136f68d0906 upstream.
Randy Dunlap reported the following build error:
"When CONFIG_SMP=n, CONFIG_X86_MSR=m:
ERROR: "msrs_free" [drivers/edac/amd64_edac_mod.ko] undefined!
ERROR: "msrs_alloc" [drivers/edac/amd64_edac_mod.ko] undefined!"
This is due to the fact that <arch/x86/lib/msr.c> is conditioned on
CONFIG_SMP and in the UP case we have only the stubs in the header.
Fork off SMP functionality into a new file (msr-smp.c) and build
msrs_{alloc,free} unconditionally.
Reported-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Borislav Petkov <petkovbb@gmail.com>
LKML-Reference: <20091216231625.GD27228@liondog.tnic>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 505422517d3f126bb939439e9d15dece94e11d2c upstream.
The current rd/wrmsr_on_cpus helpers assume that the supplied
cpumasks are contiguous. However, there are machines out there
like some K8 multinode Opterons which have a non-contiguous core
enumeration on each node (e.g. cores 0,2 on node 0 instead of 0,1), see
http://www.gossamer-threads.com/lists/linux/kernel/1160268.
This patch fixes out-of-bounds writes (see URL above) by adding per-CPU
msr structs which are used on the respective cores.
Additionally, two helpers, msrs_{alloc,free}, are provided for use by
the callers of the MSR accessors.
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Mauro Carvalho Chehab <mchehab@redhat.com>
Cc: Aristeu Rozanski <aris@redhat.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <20091211171440.GD31998@aftab>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit b8a4754147d61f5359a765a3afd3eb03012aa052 upstream.
Since rdmsr_on_cpus and wrmsr_on_cpus are almost identical, unify them
into a common __rwmsr_on_cpus helper thus avoiding code duplication.
While at it, convert cpumask_t's to const struct cpumask *.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 04a1e62c2cec820501f93526ad1e46073b802dc4 upstream.
The loop condition is fragile: we compare an unsigned value to zero, and
then decrement it by something larger than one in the loop. All the
callers should be passing in appropriately aligned buffer lengths, but
it's better to just not rely on it, and have some appropriate defensive
loop limits.
Acked-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|