<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/drivers/edac, branch v3.9</title>
<subtitle>Linux kernel source tree</subtitle>
<id>https://git.amat.us/linux/atom/drivers/edac?h=v3.9</id>
<link rel='self' href='https://git.amat.us/linux/atom/drivers/edac?h=v3.9'/>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/'/>
<updated>2013-03-16T05:32:30Z</updated>
<entry>
<title>EDAC: Merge mci.mem_is_per_rank with mci.csbased</title>
<updated>2013-03-16T05:32:30Z</updated>
<author>
<name>Mauro Carvalho Chehab</name>
<email>mchehab@redhat.com</email>
</author>
<published>2013-03-11T12:28:48Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=9713faecff3d071de1208b081d4943b002e9cb1c'/>
<id>urn:sha1:9713faecff3d071de1208b081d4943b002e9cb1c</id>
<content type='text'>
Both mci.mem_is_per_rank and mci.csbased denote the same thing: the
memory controller is csrows based. Merge both fields into one.

There's no need for the driver to actually fill it, as the core detects
it by checking if one of the layers has the csrows type as part of the
memory hierarchy:

	if (layers[i].type == EDAC_MC_LAYER_CHIP_SELECT)
			per_rank = true;

Signed-off-by: Mauro Carvalho Chehab &lt;mchehab@redhat.com&gt;
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
</content>
</entry>
<entry>
<title>amd64_edac: Correct DIMM sizes</title>
<updated>2013-03-16T05:32:02Z</updated>
<author>
<name>Mauro Carvalho Chehab</name>
<email>mchehab@redhat.com</email>
</author>
<published>2013-03-11T12:07:46Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=1eef1282549d7accdd33ee36d409b039b1f911fb'/>
<id>urn:sha1:1eef1282549d7accdd33ee36d409b039b1f911fb</id>
<content type='text'>
We were filling the csrow size with a wrong value. 16a528ee3975 ("EDAC:
Fix csrow size reported in sysfs") tried to address the issue. It fixed
the report with the old API but not with the new one. Correct it for the
new API too.

Signed-off-by: Mauro Carvalho Chehab &lt;mchehab@redhat.com&gt;
[ make it a per-csrow accounting regardless of -&gt;channel_count ]
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
</content>
</entry>
<entry>
<title>EDAC: Make sysfs functions static</title>
<updated>2013-03-05T10:33:57Z</updated>
<author>
<name>Stephen Hemminger</name>
<email>stephen@networkplumber.org</email>
</author>
<published>2013-02-22T05:44:17Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=fbe2d3616cee37418d832b30130811888c5aaf34'/>
<id>urn:sha1:fbe2d3616cee37418d832b30130811888c5aaf34</id>
<content type='text'>
Fixes lots of sparse warnings here.

Signed-off-by: Stephen Hemminger &lt;stephen@networkplumber.org&gt;
Signed-off-by: Borislav Petkov &lt;bp@suse.de&gt;
</content>
</entry>
<entry>
<title>Merge branch 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac</title>
<updated>2013-03-01T04:42:33Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2013-03-01T04:42:33Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=ad6c2c2eb34f234d6253292b9b3c047614fbfe7e'/>
<id>urn:sha1:ad6c2c2eb34f234d6253292b9b3c047614fbfe7e</id>
<content type='text'>
Pull EDAC fixes and ghes-edac from Mauro Carvalho Chehab:
 "For:

   - Some fixes at edac drivers (i7core_edac, sb_edac, i3200_edac);
   - error injection support for i5100, when EDAC debug is enabled;
   - fix edac when it is loaded builtin (early init for the subsystem);
   - a "Firmware First" EDAC driver, allowing ghes to report errors via
     EDAC (ghes-edac).

  With regards to ghes-edac, this fixes a longstanding BZ at Red Hat
  that happens with Nehalem and Sandy Bridge CPUs: when both GHES and
  i7core_edac or sb_edac are running, the error reports are
  unpredictable, as both BIOS and OS race to access the registers.  With
  ghes-edac, the EDAC core will refuse to register any other concurrent
  memory error driver.

  This patchset moves the ghes struct definitions to a separate header
  file (include/acpi/ghes.h) and adds 3 hooks at apei/ghes.c to
  register/unregister and to report errors via ghes-edac.  Those changes
  were acked by ghes driver maintainer (Huang)."

* 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac: (30 commits)
  i5100_edac: convert to use simple_open()
  ghes_edac: fix to use list_for_each_entry_safe() when delete list items
  ghes_edac: Fix RAS tracing
  ghes_edac: Make it compliant with UEFI spec 2.3.1
  ghes_edac: Improve driver's printk messages
  ghes_edac: Don't credit the same memory dimm twice
  ghes_edac: do a better job of filling EDAC DIMM info
  ghes_edac: add support for reporting errors via EDAC
  ghes_edac: Register at EDAC core the BIOS report
  ghes: add the needed hooks for EDAC error report
  ghes: move structures/enum to a header file
  edac: add support for error type "Info"
  edac: add support for raw error reports
  edac: reduce stack pressure by using a pre-allocated buffer
  edac: lock module owner to avoid error report conflicts
  edac: remove proc_name from mci structure
  edac: add a new memory layer type
  edac: initialize the core earlier
  edac: better report error conditions in debug mode
  i5100_edac: Remove two checkpatch warnings
  ...
</content>
</entry>
<entry>
<title>i5100_edac: convert to use simple_open()</title>
<updated>2013-02-26T13:06:18Z</updated>
<author>
<name>Wei Yongjun</name>
<email>yongjun_wei@trendmicro.com.cn</email>
</author>
<published>2013-02-26T09:18:34Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=b0769891ba7baa53f270dc70d71934748beb4c5b'/>
<id>urn:sha1:b0769891ba7baa53f270dc70d71934748beb4c5b</id>
<content type='text'>
This removes an open coded simple_open() function and
replaces file operations references to the function
with simple_open() instead.

Signed-off-by: Wei Yongjun &lt;yongjun_wei@trendmicro.com.cn&gt;
Signed-off-by: Mauro Carvalho Chehab &lt;mchehab@redhat.com&gt;
</content>
</entry>
<entry>
<title>ghes_edac: fix to use list_for_each_entry_safe() when delete list items</title>
<updated>2013-02-26T13:05:35Z</updated>
<author>
<name>Wei Yongjun</name>
<email>yongjun_wei@trendmicro.com.cn</email>
</author>
<published>2013-02-26T08:39:14Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=5dae92a718570e6a942e0b882e53d25cab03b40f'/>
<id>urn:sha1:5dae92a718570e6a942e0b882e53d25cab03b40f</id>
<content type='text'>
Since we will remove items off the list using list_del() we need
to use a safe version of the list_for_each_entry() macro aptly named
list_for_each_entry_safe().

Signed-off-by: Wei Yongjun &lt;yongjun_wei@trendmicro.com.cn&gt;
Signed-off-by: Mauro Carvalho Chehab &lt;mchehab@redhat.com&gt;
</content>
</entry>
<entry>
<title>ghes_edac: Fix RAS tracing</title>
<updated>2013-02-25T22:42:17Z</updated>
<author>
<name>Mauro Carvalho Chehab</name>
<email>mchehab@redhat.com</email>
</author>
<published>2013-02-20T00:35:41Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=8ae8f50ad8979bb670598ff92eebea611b799f10'/>
<id>urn:sha1:8ae8f50ad8979bb670598ff92eebea611b799f10</id>
<content type='text'>
With the current version of CPER, there's no way to associate an
error with the memory error. So, the error location in EDAC
layers is unused.

As CPER has its own idea about memory architectural layers, just
output whatever is there inside the driver's detail at the RAS
tracepoint.

The EDAC location keeps untouched, in the case that, in some future,
we could actually map the error into the dimm labels.

Now, the error message:

[   72.396625] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0
[   72.396627] {1}[Hardware Error]: APEI generic hardware error status
[   72.396628] {1}[Hardware Error]: severity: 2, corrected
[   72.396630] {1}[Hardware Error]: section: 0, severity: 2, corrected
[   72.396632] {1}[Hardware Error]: flags: 0x01
[   72.396634] {1}[Hardware Error]: primary
[   72.396635] {1}[Hardware Error]: section_type: memory error
[   72.396637] {1}[Hardware Error]: error_status: 0x0000000000000400
[   72.396638] {1}[Hardware Error]: node: 3
[   72.396639] {1}[Hardware Error]: card: 0
[   72.396640] {1}[Hardware Error]: module: 0
[   72.396641] {1}[Hardware Error]: device: 0
[   72.396643] {1}[Hardware Error]: error_type: 18, unknown
[   72.396666] EDAC MC0: 1 CE reserved error (18) on unknown label (node:3 card:0 module:0 page:0x0 offset:0x0 grain:0 syndrome:0x0 - status(0x0000000000000400): Storage error in DRAM memory)

Is properly represented on the trace event:

     kworker/0:2-584   [000] ....    72.396657: mc_event: 1 Corrected error: reserved error (18) on unknown label (mc:0 location:-1:-1:-1 address:0x00000000 grain:1 syndrome:0x00000000 APEI location: node:3 card:0 module:0 status(0x0000000000000400): Storage error in DRAM memory)

Tested on a 4 sockets E5-4650 Sandy Bridge machine.

Signed-off-by: Mauro Carvalho Chehab &lt;mchehab@redhat.com&gt;
</content>
</entry>
<entry>
<title>ghes_edac: Make it compliant with UEFI spec 2.3.1</title>
<updated>2013-02-25T22:42:16Z</updated>
<author>
<name>Mauro Carvalho Chehab</name>
<email>mchehab@redhat.com</email>
</author>
<published>2013-02-19T22:24:12Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=689c9cd8128f13bf9843a3e133423f5e3e0ce4aa'/>
<id>urn:sha1:689c9cd8128f13bf9843a3e133423f5e3e0ce4aa</id>
<content type='text'>
The UEFI spec defines the memory error types ans the bits that
validate each field on the memory error record, at
Appendix N om items N.2.5 (Memory Error Section) and
N.2.11 (Error Status). Make the error description compliant with
it, only showing the valid fields.

The EDAC error log is now properly reporting the error:

[  281.556854] mce: [Hardware Error]: Machine check events logged
[  281.557042] {2}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0
[  281.557044] {2}[Hardware Error]: APEI generic hardware error status
[  281.557046] {2}[Hardware Error]: severity: 2, corrected
[  281.557048] {2}[Hardware Error]: section: 0, severity: 2, corrected
[  281.557050] {2}[Hardware Error]: flags: 0x01
[  281.557052] {2}[Hardware Error]: primary
[  281.557053] {2}[Hardware Error]: section_type: memory error
[  281.557055] {2}[Hardware Error]: error_status: 0x0000000000000400
[  281.557056] {2}[Hardware Error]: node: 3
[  281.557057] {2}[Hardware Error]: card: 0
[  281.557058] {2}[Hardware Error]: module: 1
[  281.557059] {2}[Hardware Error]: device: 0
[  281.557061] {2}[Hardware Error]: error_type: 18, unknown
[  281.557067] EDAC DEBUG: ghes_edac_report_mem_error: error validation_bits: 0x000040b9
[  281.557084] EDAC MC0: 1 CE reserved error (18) on unknown label (node:3 card:0 module:1 page:0x0 offset:0x0 grain:0 syndrome:0x0 - status(0x0000000000000400): Storage error in DRAM memory)

Tested on a 4 CPUs E5-4650 Sandy Bridge machine.

Signed-off-by: Mauro Carvalho Chehab &lt;mchehab@redhat.com&gt;
</content>
</entry>
<entry>
<title>ghes_edac: Improve driver's printk messages</title>
<updated>2013-02-25T22:42:15Z</updated>
<author>
<name>Mauro Carvalho Chehab</name>
<email>mchehab@redhat.com</email>
</author>
<published>2013-02-15T12:06:38Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=d2a6856614fd34e36352146307a5655efbdbc14d'/>
<id>urn:sha1:d2a6856614fd34e36352146307a5655efbdbc14d</id>
<content type='text'>
Provide a better infrastructure for printk's inside the driver:
	- use edac_dbg() for debug messages;
	- standardize the usage of pr_info();
	- provide warning about the risk of relying on this
	  driver.

While here, changes the size of a fake memory to 1 page. This is
as good or as bad as 1000 pages, but it is easier for userspace to
detect, as I don't expect that any machine implementing GHES would
provide just 1 page available ;)

Signed-off-by: Mauro Carvalho Chehab &lt;mchehab@redhat.com&gt;

Conflicts:
	drivers/edac/ghes_edac.c
</content>
</entry>
<entry>
<title>ghes_edac: Don't credit the same memory dimm twice</title>
<updated>2013-02-25T22:42:14Z</updated>
<author>
<name>Mauro Carvalho Chehab</name>
<email>mchehab@redhat.com</email>
</author>
<published>2013-02-15T11:45:00Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=5ee726db521fddf991f261e5c45e04a7d2bf1bc1'/>
<id>urn:sha1:5ee726db521fddf991f261e5c45e04a7d2bf1bc1</id>
<content type='text'>
On my tests on a 4xE5-4650 CPU's system, the GHES
EDAC driver is called twice. As the SMBIOS DMI enumeration
call will seek for the entire DIMM sockets in the system, on
this machine, equipped with 128 GB of RAM, the memory is
displayed twice:

          +-----------------------+
          |    mc0    |    mc1    |
----------+-----------------------+
memory45: |  8192 MB  |  8192 MB  |
memory44: |     0 MB  |     0 MB  |
----------+-----------------------+
memory43: |     0 MB  |     0 MB  |
memory42: |  8192 MB  |  8192 MB  |
----------+-----------------------+
memory41: |     0 MB  |     0 MB  |
memory40: |     0 MB  |     0 MB  |
----------+-----------------------+
memory39: |  8192 MB  |  8192 MB  |
memory38: |     0 MB  |     0 MB  |
----------+-----------------------+
memory37: |     0 MB  |     0 MB  |
memory36: |  8192 MB  |  8192 MB  |
----------+-----------------------+
memory35: |     0 MB  |     0 MB  |
memory34: |     0 MB  |     0 MB  |
----------+-----------------------+
memory33: |  8192 MB  |  8192 MB  |
memory32: |     0 MB  |     0 MB  |
----------+-----------------------+
memory31: |     0 MB  |     0 MB  |
memory30: |  8192 MB  |  8192 MB  |
----------+-----------------------+
memory29: |     0 MB  |     0 MB  |
memory28: |     0 MB  |     0 MB  |
----------+-----------------------+
memory27: |  8192 MB  |  8192 MB  |
memory26: |     0 MB  |     0 MB  |
----------+-----------------------+
memory25: |     0 MB  |     0 MB  |
memory24: |  8192 MB  |  8192 MB  |
----------+-----------------------+
memory23: |     0 MB  |     0 MB  |
memory22: |     0 MB  |     0 MB  |
----------+-----------------------+
memory21: |  8192 MB  |  8192 MB  |
memory20: |     0 MB  |     0 MB  |
----------+-----------------------+
memory19: |     0 MB  |     0 MB  |
memory18: |  8192 MB  |  8192 MB  |
----------+-----------------------+
memory17: |     0 MB  |     0 MB  |
memory16: |     0 MB  |     0 MB  |
----------+-----------------------+
memory15: |  8192 MB  |  8192 MB  |
memory14: |     0 MB  |     0 MB  |
----------+-----------------------+
memory13: |     0 MB  |     0 MB  |
memory12: |  8192 MB  |  8192 MB  |
----------+-----------------------+
memory11: |     0 MB  |     0 MB  |
memory10: |     0 MB  |     0 MB  |
----------+-----------------------+
memory9:  |  8192 MB  |  8192 MB  |
memory8:  |     0 MB  |     0 MB  |
----------+-----------------------+
memory7:  |     0 MB  |     0 MB  |
memory6:  |  8192 MB  |  8192 MB  |
----------+-----------------------+
memory5:  |     0 MB  |     0 MB  |
memory4:  |     0 MB  |     0 MB  |
----------+-----------------------+
memory3:  |  8192 MB  |  8192 MB  |
memory2:  |     0 MB  |     0 MB  |
----------+-----------------------+
memory1:  |     0 MB  |     0 MB  |
memory0:  |  8192 MB  |  8192 MB  |
----------+-----------------------+

Total sum of 256 GB.

As there's no reliable way to credit DIMMS to the right memory
controller, just put everything on memory controller 0 (with should
always exist).

Signed-off-by: Mauro Carvalho Chehab &lt;mchehab@redhat.com&gt;
</content>
</entry>
</feed>
