|
commit 64aab720bdf8771214a7c88872bd8e3194c2d279 upstream.
The array of udimm sysfs attributes was not terminated with a NULL marker,
leading to a dereference of random memory.
EDAC DEBUG: edac_create_mci_instance_attributes: edac_create_mci_instance_attributes() file udimm0
EDAC DEBUG: edac_create_mci_instance_attributes: edac_create_mci_instance_attributes() file udimm1
EDAC DEBUG: edac_create_mci_instance_attributes: edac_create_mci_instance_attributes() file udimm2
BUG: unable to handle kernel NULL pointer dereference at 00000000000001a4
IP: [<ffffffff81330b36>] edac_create_mci_instance_attributes+0x148/0x1f1
Pid: 1, comm: swapper Not tainted 2.6.36-rc3-nv+ #483 P6T SE/System Product Name
RIP: 0010:[<ffffffff81330b36>] [<ffffffff81330b36>] edac_create_mci_instance_attributes+0x148/0x1f1
(...)
Call Trace:
[<ffffffff81330b86>] edac_create_mci_instance_attributes+0x198/0x1f1
[<ffffffff81330c9a>] edac_create_sysfs_mci_device+0xbb/0x2b2
[<ffffffff8132f533>] edac_mc_add_mc+0x46b/0x557
[<ffffffff81428901>] i7core_probe+0xccf/0xec0
RIP [<ffffffff81330b36>] edac_create_mci_instance_attributes+0x148/0x1f1
---[ end trace 20de320855b81d78 ]---
Kernel panic - not syncing: Attempted to kill init!
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: Mauro Carvalho Chehab <mchehab@redhat.com>
Acked-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
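The failure mode described above can be illustrated with a minimal user-space sketch. The names `demo_attr`, `udimm_attrs` and `count_attrs` are hypothetical and are not taken from the driver; the point is only that a walker which stops at a NULL name needs the array to end with such an entry.

```c
#include <stddef.h>

/* Hypothetical stand-in for a sysfs attribute descriptor. */
struct demo_attr {
	const char *name;	/* a NULL name marks the end of the array */
	int index;
};

/* The final entry is the terminator that the walker below depends on. */
static const struct demo_attr udimm_attrs[] = {
	{ "udimm0", 0 },
	{ "udimm1", 1 },
	{ "udimm2", 2 },
	{ NULL, 0 },
};

/* Count entries by walking until the terminator is reached. */
static size_t count_attrs(const struct demo_attr *attrs)
{
	size_t n = 0;

	while (attrs[n].name != NULL)
		n++;
	return n;
}
```

Without the last entry, the loop would keep reading past the end of the array, which is consistent with the oops being reported right after udimm2.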
|
|
Don't print failure to detect Core i7 EDAC facilities to the console at
boot time, most often occurring on Core i7 desktops and laptops.
Signed-off-by: Daniel J Blueman <daniel.blueman@gmail.com>
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
As Nehalem/Nehalem-EP/Westmere processors use several devices for the same
functionality (memory controller), the default way of probing devices doesn't
work. So, instead of a per-device probe, all devices should be probed at once.
This means that we should block any new probe attempt; otherwise, it will
try to register the same device several times.
Acked-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
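A minimal sketch of the "probe everything once, then refuse further probes" idea follows. The flag `probed` and the helpers `register_all_devices` and `demo_probe` are hypothetical names for illustration, and the error code is an assumption; real kernel code would also need locking around the flag.

```c
#include <errno.h>

/* Hypothetical flag: non-zero once the one-time probe has run. */
static int probed;

/* Stand-in for the work that registers every memory controller device. */
static int register_all_devices(void)
{
	return 0;
}

/* Called once per matching PCI device; only the first call does real work. */
static int demo_probe(void)
{
	if (probed)
		return -EINVAL;	/* block attempts to register again */

	probed = 1;
	return register_all_devices();
}
```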
|
|
On Nehalem/Nehalem-EP/Westmere, the first QPI device is the last PCI bus.
The last bus is generally at 0x3f or 0xff, but there are also other systems
using different setups. For example, HP Z800 has 0x7f as the last bus.
This patch adds logic to discover the last bus, detecting it dynamically
at runtime.
Acked-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
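One way to picture the dynamic detection is a scan from the highest possible bus number downwards. This is only a sketch under the assumption that bus presence can be queried per bus number; the presence map and `find_last_bus` are hypothetical and do not show how the driver itself enumerates buses.

```c
#include <stdbool.h>

#define MAX_BUS 255

/*
 * Hypothetical presence map: bus_present[n] is true if bus n exists.
 * Returns the highest bus number present, or -1 if none is.
 */
static int find_last_bus(const bool bus_present[MAX_BUS + 1])
{
	for (int bus = MAX_BUS; bus >= 0; bus--)
		if (bus_present[bus])
			return bus;
	return -1;
}
```

With a map where buses 0x00 and 0x7f are present, as on the HP Z800 mentioned above, the function returns 0x7f instead of relying on a hardcoded 0x3f or 0xff.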
|
|
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
This adds new PCI IDs for the Westmere's memory controller
devices and modifies the i7core_edac driver to be able to
probe both Nehalem and Westmere processors.
Signed-off-by: Vernon Mauery <vernux@us.ibm.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
As reported by Vernon Mauery <vernux@us.ibm.com>, X5670 (Westmere-EP) uses a
different register for one of the uncore PCI devices. Add support for
it.
These are the PCI IDs on this new chipset:
fe:00.0 0600: 8086:2c70 (rev 02)
fe:00.1 0600: 8086:2d81 (rev 02)
fe:02.0 0600: 8086:2d90 (rev 02)
fe:02.1 0600: 8086:2d91 (rev 02)
fe:02.2 0600: 8086:2d92 (rev 02)
fe:02.3 0600: 8086:2d93 (rev 02)
fe:02.4 0600: 8086:2d94 (rev 02)
fe:02.5 0600: 8086:2d95 (rev 02)
fe:03.0 0600: 8086:2d98 (rev 02)
fe:03.1 0600: 8086:2d99 (rev 02)
fe:03.2 0600: 8086:2d9a (rev 02)
fe:03.4 0600: 8086:2d9c (rev 02)
fe:04.0 0600: 8086:2da0 (rev 02)
fe:04.1 0600: 8086:2da1 (rev 02)
fe:04.2 0600: 8086:2da2 (rev 02)
fe:04.3 0600: 8086:2da3 (rev 02)
fe:05.0 0600: 8086:2da8 (rev 02)
fe:05.1 0600: 8086:2da9 (rev 02)
fe:05.2 0600: 8086:2daa (rev 02)
fe:05.3 0600: 8086:2dab (rev 02)
fe:06.0 0600: 8086:2db0 (rev 02)
fe:06.1 0600: 8086:2db1 (rev 02)
fe:06.2 0600: 8086:2db2 (rev 02)
fe:06.3 0600: 8086:2db3 (rev 02)
(as usual, the same PCI devices repeat at ff: bus)
The PCI device 8086:2c70 is shown as:
fe:00.0 Host bridge: Intel Corporation QuickPath Architecture Generic
Non-core Registers (rev 02)
So, for this device to be recognized, it is only a matter of adding this
new PCI ID to the driver.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
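The "just add the new PCI ID" step can be sketched as a lookup in a zero-terminated ID table. The struct and function names are hypothetical; only the vendor/device pair 8086:2c70 comes from the listing above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified ID entry (vendor and device only). */
struct demo_pci_id {
	uint16_t vendor;
	uint16_t device;
};

static const struct demo_pci_id demo_ids[] = {
	{ 0x8086, 0x2c70 },	/* generic non-core registers, as listed above */
	{ 0, 0 },		/* terminator */
};

/* Returns true if the vendor/device pair appears in the table. */
static bool id_matches(uint16_t vendor, uint16_t device)
{
	for (const struct demo_pci_id *id = demo_ids; id->vendor; id++)
		if (id->vendor == vendor && id->device == device)
			return true;
	return false;
}
```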
|
|
This fixes an error in the function i7core_check_error().
In commit ca9c90ba09ca3c9799319f46a56f397afbf617c2 which converts the
driver to use double buffering, there is a change in the logic. Before,
if mce_count was zero, it skipped over a couple of statements and
finished out with a call to the *check_mc_ecc_err function. The current
code checks to see if mce_count is 0 and then exits.
This change reverts the behavior back to the original where if there are
no errors to report, we skip to the end and call the *check_mc_ecc_err
function.
This fix allows the driver to work again on my Nehalem-based blades.
Signed-off-by: Vernon Mauery <vernux@us.ibm.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
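The control flow being restored can be sketched as follows. All names are hypothetical stand-ins (the counters exist only so the flow can be observed); the sketch shows the "skip to the end, but still run the ECC check" pattern rather than the driver's actual code.

```c
/* Hypothetical counters that make the control flow observable. */
static int mce_reports;
static int ecc_checks;

static void demo_report_mce(void)
{
	mce_reports++;
}

static void demo_check_mc_ecc_err(void)
{
	ecc_checks++;
}

static void demo_check_error(int mce_count)
{
	if (mce_count == 0)
		goto check_ce_error;	/* nothing to report, but still poll */

	for (int i = 0; i < mce_count; i++)
		demo_report_mce();

check_ce_error:
	/* Runs on every call, including when mce_count is zero. */
	demo_check_mc_ecc_err();
}
```

An early `return` in place of the `goto` would reproduce the regression: the corrected-error check would never run on calls with no queued MCEs.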
|
|
Free already allocated i7core_dev.
Signed-off-by: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
It's called only from an __init function and is the only user
of pcibios_scan_specific_bus which will be marked as __devinit in
the next patch.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
Fix build warning (missing header file) and
build error when CONFIG_SMP=n.
drivers/edac/i7core_edac.c:860: error: implicit declaration of function 'msleep'
drivers/edac/i7core_edac.c:1700: error: 'struct cpuinfo_x86' has no member named 'phys_proc_id'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
Fix the shifts up
Signed-off-by: Alan Cox <alan@linux.intel.com>
Acked-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
Currently, only one set of PCI tables is allowed. This prevents using
the driver for other devices like Lynnfield, which have a different
set of PCI IDs.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
Fix ringbuffer store logic.
While here, add a few comments to the code and remove the undesired
printk that could otherwise be called during NMI time.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
Instead of accepting just "any", accept also "any\n"
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
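The newline matters because `echo any > file` writes "any\n" to the sysfs node. A minimal sketch of a comparison that accepts both forms, using a hypothetical helper `is_any`:

```c
#include <stdbool.h>
#include <string.h>

/* Returns true for "any" with or without a single trailing newline. */
static bool is_any(const char *data)
{
	return strcmp(data, "any") == 0 || strcmp(data, "any\n") == 0;
}
```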
|
|
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
Instead of displaying 3 values in the same variable, break it into 3
different sysfs nodes:
/sys/devices/system/edac/mc/mc0/all_channel_counts/udimm0
/sys/devices/system/edac/mc/mc0/all_channel_counts/udimm1
/sys/devices/system/edac/mc/mc0/all_channel_counts/udimm2
For registered dimms, however, the error counters are already being
displayed at:
/sys/devices/system/edac/mc/mc0/csrow*/ce_count
So, there's no need to add any extra sysfs nodes.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
The old module removal strategy didn't work on devices with multiple
cores, since only one PCI device is used to open all MCs, due to the
nature of Nehalem.
Also, it was based on the pdev value. However, this doesn't point to the
PCI device used at mci->dev.
So, instead, it unregisters all devices at once, deleting them from the
device list.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
The number of sockets is now fully dynamic. Get rid of this obsolete
var.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
In theory, the other mc controller should handle it.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
Instead of creating just one memory controller, create one per socket
(e.g. per QuickPath Interconnect).
This better reflects the Nehalem architecture.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
Instead of using a static table assuming always 2 CPU sockets, allocate
space dynamically for Nehalem PCI devs.
This patch is part of a series of patches that changes i7core_edac to
allow more than 2 sockets and to properly report one memory controller
per socket.
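The move from a fixed two-socket table to dynamic allocation can be sketched as a get-or-allocate lookup on a linked list. The struct `demo_dev` and both helpers are hypothetical, and user-space `calloc`/`free` stand in for the kernel allocators.

```c
#include <stdlib.h>

/* Hypothetical per-socket record kept on a singly linked list. */
struct demo_dev {
	int socket;
	struct demo_dev *next;
};

static struct demo_dev *dev_list;

/* Return the record for a socket, allocating it on first use. */
static struct demo_dev *get_dev(int socket)
{
	struct demo_dev *d;

	for (d = dev_list; d; d = d->next)
		if (d->socket == socket)
			return d;

	d = calloc(1, sizeof(*d));
	if (!d)
		return NULL;
	d->socket = socket;
	d->next = dev_list;
	dev_list = d;
	return d;
}

/* Release every record, e.g. on module removal. */
static void free_devs(void)
{
	while (dev_list) {
		struct demo_dev *next = dev_list->next;

		free(dev_list);
		dev_list = next;
	}
}
```

Nothing in this layout limits the number of sockets, which is the property the series is after.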
|
|
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
Just cosmetics. Instead of showing something like:
socket 0, channel 2dimm0: 1
dimm1: 0
dimm2: 0
socket 1, channel 2dimm0: 0
dimm1: 0
dimm2: 0
Show:
socket 0, channel 2 RDIMM0: 1 RDIMM1: 0 RDIMM2: 0
socket 0, channel 2 RDIMM0: 0 RDIMM1: 0 RDIMM2: 0
This is more concise and easier to parse.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
On the Xeon 55XX series CPUs the PCI devices are not exposed via ACPI, so
we must explicitly probe them to make them usable as Linux PCI devices.
This moves the detection of this state to before pci_register_driver is
called. Its present position was not working on my systems; the driver
would complain about not finding a specific device.
This patch allows the driver to load on my systems.
Signed-off-by: Keith Mannthey <kmannth@us.ibm.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
Instead of assuming that the entire machine has either registered or
unregistered memories, decide this per CPU socket.
While here, fix a bug in i7core_mce_output_error(), where we were
using m->cpu directly as if it represented a socket. Instead, the
proper socket_id is given by cpu_data[m->cpu].phys_proc_id.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
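The distinction between a logical CPU number and a socket number can be sketched with a small mock topology. The table `demo_cpu_data` and the helper `socket_of_cpu` are hypothetical; only the field name `phys_proc_id` comes from the message above.

```c
/* Hypothetical per-CPU record; only the package id matters here. */
struct demo_cpuinfo {
	int phys_proc_id;
};

/* Example topology: four logical CPUs spread over two sockets. */
static const struct demo_cpuinfo demo_cpu_data[] = {
	{ 0 }, { 0 }, { 1 }, { 1 },
};

/* Map a logical CPU number to its socket, or -1 if out of range. */
static int socket_of_cpu(unsigned int cpu)
{
	unsigned int n = sizeof(demo_cpu_data) / sizeof(demo_cpu_data[0]);

	if (cpu >= n)
		return -1;
	return demo_cpu_data[cpu].phys_proc_id;
}
```

In this mock topology, logical CPU 3 sits on socket 1, so treating the CPU number itself as a socket index would point at a socket that does not exist.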
|
|
Nehalem and newer chipsets provide a special device that holds counters for
corrected memory errors detected on registered DIMMs. This device is only seen
if registered memories are plugged in.
After this patch, on a machine fully equipped with RDIMMs, the driver will use
Device 3 function 2 to count corrected errors instead of relying on mcelog.
For unregistered DIMMs, it will keep the old behavior, counting errors
via mcelog.
This patch was developed together with Keith Mannthey <kmannth@us.ibm.com>
Signed-off-by: Keith Mannthey <kmannth@us.ibm.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
From: Keith Mannthey <kmannth@us.ibm.com>
Simple correction to a shift value.
ECC_ENABLED is bit 4 of MC_STATUS, Dev 3 Fun 0 Offset 0x4c
This correctly identifies the state of the ECC at the machine.
Signed-off-by: Keith Mannthey <kmannth@us.ibm.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
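The corrected test amounts to shifting the register value right by 4 and masking one bit. A minimal sketch, with a hypothetical macro and helper name, and with the register value passed in rather than read from PCI config space:

```c
#include <stdbool.h>
#include <stdint.h>

#define ECC_ENABLED_SHIFT 4	/* bit position stated in the message above */

/* Returns true if bit 4 of the given MC_STATUS value is set. */
static bool ecc_enabled(uint32_t mc_status)
{
	return (mc_status >> ECC_ENABLED_SHIFT) & 1u;
}
```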
|
|
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
No functional changes.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
There were two stupid error injection bugs introduced by wrong
cut-and-paste: one at socket store, and another at the error inject
register. The latter was causing the code to not work at all.
While here, add debug messages to show which registers are being
set while sending error injection.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|