Age | Commit message (Collapse) | Author |
|
commit e726f3c368e7c1919a7166ec09c5705759f1a69d upstream.
When matching error address to the range contained by one memory node,
we're in valid range when node interleaving
1. is disabled, or
2. enabled and when the address bits we interleave on match the
interleave selector on this node (see the "Node Interleaving" section in
the BKDG for an enlightening example).
Thus, when we early-exit, we need to reverse the compound logic
statement properly.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 1cd8521e7d77def75fdb1cb35ecd135385e4be4f upstream.
Since commit 5753c082f66eca5be81f6bda85c1718c5eea6ada ("powerpc/85xx:
Kconfig cleanup"), there is no MPC85xx Kconfig symbol anymore, so the
driver became non-selectable.
This patch fixes the issue by switching to PPC_85xx symbol.
Signed-off-by: Anton Vorontsov <avorontsov@mvista.com>
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: Peter Tyser <ptyser@xes-inc.com>
Cc: Dave Jiang <djiang@mvista.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 41c310447fe06bcedc22b75752c18b60e0b9521b upstream.
When calculating the DCT channel from the syndrome we need to know the
syndrome type (x4 vs x8). On F10h, this is read out from extended PCI
cfg space register F3x180 while on K8 we only support x4 syndromes and
don't have extended PCI config space anyway.
Make the code accessing F3x180 F10h only and fall back to x4 syndromes
on everything else.
Reported-by: Jeffrey Merkey <jeffmerkey@gmail.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 35d824b28fc5544d1eb7c1e3db15a1740df8ec4b upstream.
Correct two mishaps which prevented reporting error type (CECC vs UECC)
and extended error description.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
commit 5b89d2f9ace1970324facc68ca9b8fae19ce8096 upstream.
Print the CPU associated with the error only when the field is valid.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
amd64_edac: Do not falsely trigger kerneloops
|
|
Some unused, unsupported debug code existed in the mpc85xx EDAC driver
that resulted in a build failure when CONFIG_EDAC_DEBUG was defined:
drivers/edac/mpc85xx_edac.c: In function 'mpc85xx_mc_err_probe':
drivers/edac/mpc85xx_edac.c:1031: error: implicit declaration of function 'edac_mc_register_mcidev_debug'
drivers/edac/mpc85xx_edac.c:1031: error: 'debug_attr' undeclared (first use in this function)
drivers/edac/mpc85xx_edac.c:1031: error: (Each undeclared identifier is reported only once
drivers/edac/mpc85xx_edac.c:1031: error: for each function it appears in.)
Signed-off-by: Peter Tyser <ptyser@xes-inc.com>
Signed-off-by: Doug Thompson <dougthompson@xmission.com
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Commit b4846251727a38a7f248e41308c060995371dd05 ("edac: mpc85xx add
mpc83xx support") accidentally broke how a chip select's first and last
page addresses are calculated. The page addresses are being shifted too
far right by PAGE_SHIFT. This results in errors such as:
EDAC MPC85xx MC1: Err addr: 0x003075c0
EDAC MPC85xx MC1: PFN: 0x00000307
EDAC MPC85xx MC1: PFN out of range!
EDAC MC1: INTERNAL ERROR: row out of range (4 >= 4)
EDAC MC1: CE - no information available: INTERNAL ERROR
The vaule of PAGE_SHIFT is already being taken into consideration during
the calculation of the 'start' and 'end' variables, thus it is not
necessary to account for it again when setting a chip select's first and
last page address.
Signed-off-by: Peter Tyser <ptyser@xes-inc.com>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Cc: Ira W. Snyder <iws@ovro.caltech.edu>
Cc: Kumar Gala <galak@gate.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
An unfortunate "WARNING" in the message amd64_edac dumps when the system
doesn't support DRAM ECC or ECC checking is not enabled in the BIOS
used to trigger kerneloops which qualified the message as an OOPS thus
misleading the users. See, e.g.
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/422536
http://bugzilla.kernel.org/show_bug.cgi?id=15238
Downgrade the message level to KERN_NOTICE and fix the formulation.
Cc: stable@kernel.org # .32.x
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Acked-by: Doug Thompson <dougthompson@xmission.com>
|
|
EDAC MC0: INTERNAL ERROR: channel-b out of range (4 >= 4)
Kernel panic - not syncing: EDAC MC0: Uncorrected Error (XEN) Domain 0 crashed: 'noreboot' set - not rebooting.
This happens because FERR_NF_FBD bit 28 is not updated on i5000. Due to
that, both bits 28 and 29 may be equal to one, returning channel = 3. As
this value is invalid, EDAC core generates the panic.
Addresses http://bugzilla.kernel.org/show_bug.cgi?id=14568
Signed-off-by: Tamas Vincze <tom@vincze.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Add a missing iterator variable thus fixing the conditional of the
for-loop in amd64_get_scrub_rate().
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
|
|
Do not spam the logs needlessly with the sole info that
edac_pci_dev_parity_clear is being called.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
|
|
Do not access F2x19[0,4] on K8 since they're undefined there.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
|
|
Clear the override flag after force-loading the module.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
|
|
Currently, the module does not initialize fully when the DIMMs aren't
ECC but remains still loaded. Propagate the error when no instance of
the driver is properly initialized and prevent further loading.
Reorganize and polish error handling in amd64_edac_init() while at it.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
|
|
Fix use-after-free errors by pushing all memory-freeing calls to the end
of amd64_remove_one_instance().
Reported-by: Darren Jenkins <darrenrjenkins@gmail.com>
LKML-Reference: <1261370306.11354.52.camel@ICE-BOX>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
|
|
Fix the case when amd64_debug_display_dimm_sizes() reports only half the
amount of DRAM on it because it doesn't account for when the single DCT
operates in 128-bit mode and merges chip selects from different DIMMs.
Reported-by: Johannes Hirte <johannes.hirte@fem.tu-ilmenau.de>
LKML-Reference: <200912112202.48173.johannes.hirte@fem.tu-ilmenau.de>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
|
|
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
edac, mce, amd: silence GART TLB errors
edac, mce: correct corenum reporting
|
|
Although reporting of benign GART TLB errors is disabled in
__mcheck_cpu_apply_quirks, those are still being logged, and, as a
result, trip up amd64_edac. Pull up reporting check so that machines
with loaded edac module bail out early and don't spit fragments into
dmesg.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
|
|
Add support for 6 ranks per channel to the i5100 chipset. I have tested
the patch as far as possible with correctible errors and things appear
good. The DIMM mapping is correct for our board, but boards may differ.
Signed-off-by: Nils Carlson <nils.carlson@ludd.ltu.se>
Acked-by: Arthur Jones <ajones@riverbed.com>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Addscrubbing to the i5100 chipset. The i5100 chipset only supports one
scrubbing rate, which is not constant but dependent on memory load. The
rate returned by this driver is an estimate based on some experimentation,
but is substantially closer to the truth than the speed supplied in the
documentation.
Also, scrubbing is done once, and then a done-bit is set. This means that
to accomplish continuous scrubbing a re-enabling mechanism must be used.
I have created the simplest possible such mechanism in the form of a
work-queue which will check every five minutes. This interval is quite
arbitrary but should be sufficient for all sizes of system memory.
Signed-off-by: Nils Carlson <nils.carlson@ludd.ltu.se>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The i5100 driver uses the word controller instead of channel in a lot of
places, this is simply a cleanup of the patch.
Signed-off-by: Nils Carlson <nils.carlson@ludd.ltu.se>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Fix core number reporting with NB MCEs.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
|
|
The current rd/wrmsr_on_cpus helpers assume that the supplied
cpumasks are contiguous. However, there are machines out there
like some K8 multinode Opterons which have a non-contiguous core
enumeration on each node (e.g. cores 0,2 on node 0 instead of 0,1), see
http://www.gossamer-threads.com/lists/linux/kernel/1160268.
This patch fixes out-of-bounds writes (see URL above) by adding per-CPU
msr structs which are used on the respective cores.
Additionally, two helpers, msrs_{alloc,free}, are provided for use by
the callers of the MSR accessors.
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Mauro Carvalho Chehab <mchehab@redhat.com>
Cc: Aristeu Rozanski <aris@redhat.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <20091211171440.GD31998@aftab>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
|
|
This was long overdue ...
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
|
|
drivers/edac/amd64_edac.c: In function 'amd64_edac_init':
drivers/edac/amd64_edac.c:2840: warning: 'ret' may be used uninitialized in this function
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
|
|
The routine does the reverse mapping of the error address of a CECC back
to the node id, DRAM controller and chip select of the DIMM which caused
the error. We should lookup the channel using the syndromes _only_ when
the DCTs are ganged so fix that.
Also, add an early exit when there's an error while scanning for the
csrow thus decreasing indentation levels for better readability.
Finally, fixup comments.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
|
|
Instead of using the whole syndrome tables for channel decoding, use a
set of eigenvectors with which the tables can be generated to search for
the syndrome in error. The algorithm operates independently of symbol
size and can be used for both x4 and x8 syndromes.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
|
|
The .probe_valid_hardware low_ops member checked whether the DCTs are in
DDR3 mode and bailed out if so. Now that all the needed changes for DDR3
support is in place, remove it.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
|
|
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
|
|
Instead of using deeply-nested conditionals for dumping the DIMM type in
debug mode, add a strings array of the supported DIMM types.
This is useful in cases where an edac driver supports multiple DRAM
types and is only defined in debug builds.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
|
|
F10h revD start with model number 8.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
|
|
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
|
|
SystemAddress -> sys_addr
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
|
|
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
|
|
Add cs mode to cs size mapping tables for DDR2 and DDR3 and F10
and all K8 flavors and remove klugdy table of pseudo values. Add a
low_ops->dbam_to_cs member which is family-specific and replaces
low_ops->dbam_map_to_pages since the pages calculation is a one liner
now.
Further cleanups, while at it:
- shorten family name defines
- align amd64_family_types struct members
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
|
|
Do not read DCLR[01] again since this is done in
amd64_read_mc_registers() earlier. There can be more than two physical
DIMMs present so clamp the channels value to max 2. Also, do not report
DCT data width - it is also done earlier.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
|
|
Extend f10_debug_display_dimm_sizes to dump the logical DIMMs
configuration on K8 revF too. Remove the ganged arg since we print the
DCT operating mode (ganged vs unganged) earlier.
Also, DCT csrow configuration is relevant therefore dump it as
KERN_DEBUG instead of only on debug builds. Remove misleading DIMM
output since there's no reliable way of mapping of chip selects to
actual physical DIMMs.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
|
|
Clarify bitfields description, add PCI config function/offset names to
registers for easy reference, simplify code layout, remove unneeded
info.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
|
|
Carve out the register-specific debug statements into a separate
function, clarify meanings of the single bitfields in the register,
remove irrelevant output and macros.
There should be no functionality change resulting from this patch.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
|
|
Add a pci config read wrapper for signaling pci config space access
errors instead of them being visible only on a debug build. This is
important on amd64_edac since it uses all those pci config register
values to access the DRAM/DIMM configuration of the nodes.
In addition, the wrapper makes a _lot_ (look at the diffstat!) of
error handling code superfluous and improves much of the overall code
readability by removing error handling details out of the way.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
|
|
Unify almost identical code into one function and remove NUMA-specific
usage (specifically cpumask_of_node()) in favor of generic topology
methods.
Remove unused defines, while at it.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
|
|
cpumask_t -> struct cpumask, and don't put one on the stack. (Note: this
is actually on the stack unless CONFIG_CPUMASK_OFFSTACK=y).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
|
|
Do not shift the TOP_MEM and TOP_MEM2 values by 23 but rather save the
whole 64-bit value read from the MSR. Although the TOP_MEM/TOP_MEM2 bits
are only a subset of the 64bit register, the values are correct since
the remaining bits are Read-As-Zero and no shifting is needed.
Also, cleanup DRAM base/limit debug output.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
|
|
Make debug info formulations about the DRAM and DCT configuration of the
machine more human readable.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
|
|
Merge reason: It's ready for v2.6.33.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
Shift error type bits properly.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
|
|
In amd64_edac_init(void) in amd64_edac.c, cache_k8_northbridges() is
called before pci_register_driver. If it fails, should exit with err
directly.
Signed-off-by: Li Hong <lihong.hi@gmail.com>
Acked-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
|
|
Allow csrows to properly initialize when the topology only has active
channels on 2 and 3. This new check allows proper detection and
initialization in this topology. Only checking the first mrt that
represented channels 0 and 1 is not sufficient.
I also fixed up the related debug information path. I can submit as a 2nd
patch if needed.
Signed-off-by: Keith Mannthey <kmannth@us.ibm.com>
Acked-by: Aristeu Rozanski <aris@ruivo.org>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
When building without CONFIG_PCI the edac_pci_idx variable is unused,
causing a build-time warning. Wrap the variable in #ifdef CONFIG_PCI,
just like the rest of the PCI support.
Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|