path: root/drivers/edac
Age  Commit message  Author
2014-04-02  i7300_edac: Fix device reference count  Jean Delvare
commit 75135da0d68419ef8a925f4c1d5f63d8046e314d upstream. pci_get_device() decrements the reference count of "from" (last argument) so when we break off the loop successfully we have only one device reference - and we don't know which device we have. If we want a reference to each device, we must take them explicitly and let the pci_get_device() walk complete to avoid duplicate references. This is serious, as over-putting device references will cause the device to eventually disappear. Without this fix, the kernel crashes after a few insmod/rmmod cycles. Tested on an Intel S7000FC4UR system with a 7300 chipset. Signed-off-by: Jean Delvare <jdelvare@suse.de> Link: http://lkml.kernel.org/r/20140224111656.09bbb7ed@endymion.delvare Cc: Mauro Carvalho Chehab <m.chehab@samsung.com> Cc: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Borislav Petkov <bp@suse.de> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-04-02  i7core_edac: Fix PCI device reference count  Jean Delvare
commit c0f5eeed0f4cef4f05b74883a7160e7edde58b6a upstream. The reference count changes done by pci_get_device can be a little misleading when the usage diverges from the most common scheme. The reference count of the device passed as the last parameter is always decreased, even if the function returns no new device. So if we are going to try alternative device IDs, we must manually increment the device reference count before each retry. If we don't, we end up decreasing the reference count, and after a few modprobe/rmmod cycles the PCI devices will vanish. In other words and as Alan put it: without this fix the EDAC code corrupts the PCI device list. This fixes kernel bug #50491: https://bugzilla.kernel.org/show_bug.cgi?id=50491 Signed-off-by: Jean Delvare <jdelvare@suse.de> Link: http://lkml.kernel.org/r/20140224093927.7659dd9d@endymion.delvare Reviewed-by: Alan Cox <alan@linux.intel.com> Cc: Mauro Carvalho Chehab <m.chehab@samsung.com> Cc: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
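The reference-count rule described in this message can be modelled in user space. The following is only a toy sketch, not kernel code: `struct mock_dev`, `mock_get_device()` and `find_with_fallback()` are hypothetical names, and the mock copies just the one behaviour the message states, namely that the reference on the device passed as "from" is always dropped, even when nothing is found.

```c
#include <stddef.h>

/* Toy stand-in for a PCI device: an ID and a reference count (hypothetical). */
struct mock_dev {
    unsigned int id;
    int refcount;
};

static struct mock_dev *mock_dev_get(struct mock_dev *dev)
{
    if (dev)
        dev->refcount++;
    return dev;
}

static void mock_dev_put(struct mock_dev *dev)
{
    if (dev)
        dev->refcount--;
}

/*
 * Search for "id" after "from". As in the commit message, the reference
 * on "from" is dropped whether or not a device is returned.
 */
static struct mock_dev *mock_get_device(struct mock_dev *devs, size_t n,
                                        unsigned int id, struct mock_dev *from)
{
    size_t start = from ? (size_t)(from - devs) + 1 : 0;
    struct mock_dev *found = NULL;

    for (size_t i = start; i < n; i++) {
        if (devs[i].id == id) {
            found = mock_dev_get(&devs[i]);
            break;
        }
    }
    mock_dev_put(from);
    return found;
}

/*
 * Try primary_id first, then alt_id. The failed first call has already
 * consumed the caller's reference on "prev", so one more reference must
 * be taken before the retry; take_extra_ref = 0 reproduces the bug.
 */
static struct mock_dev *find_with_fallback(struct mock_dev *devs, size_t n,
                                           unsigned int primary_id,
                                           unsigned int alt_id,
                                           struct mock_dev *prev,
                                           int take_extra_ref)
{
    struct mock_dev *found = mock_get_device(devs, n, primary_id, prev);

    if (!found) {
        if (take_extra_ref)
            mock_dev_get(prev); /* the retry will put prev once more */
        found = mock_get_device(devs, n, alt_id, prev);
    }
    return found;
}
```

With a device whose count starts at 2 (one base reference plus the caller's), the compensated path ends at 1, while the uncompensated path ends at 0, which corresponds to the over-put that makes devices vanish after a few load/unload cycles.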
2014-04-02  EDAC: Correct workqueue setup path  Borislav Petkov
commit cb6ef42e516cb8948f15e4b70dc03af8020050a2 upstream. We're using edac_mc_workq_setup() both on the init path, when we load an edac driver and when we change the polling period (edac_mc_reset_delay_period) through /sys/.../edac_mc_poll_msec. On that second path we don't need to init the workqueue which has been initialized already. Thanks to Tejun for workqueue insights. Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1391457913-881-1-git-send-email-prarit@redhat.com [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2013-02-06  EDAC: Test correct variable in ->store function  Dan Carpenter
commit 8024c4c0b1057d1cd811fc9c3f88f81de9729fcd upstream. We're testing for ->show but calling ->store(). Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
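The class of bug fixed here is easy to show in isolation. This is a minimal user-space sketch with hypothetical names (`struct demo_attr`, `demo_store()`, and the two sample callbacks); it is not the EDAC sysfs code itself, only the pattern of checking the same callback that is about to be called.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical attribute with optional show/store callbacks. */
struct demo_attr {
    long (*show)(char *buf);
    long (*store)(const char *buf, size_t count);
};

/* Sample callbacks, used only to exercise demo_store(). */
static long demo_show_nothing(char *buf)
{
    (void)buf;
    return 0;
}

static long demo_store_count(const char *buf, size_t count)
{
    (void)buf;
    return (long)count;
}

static long demo_store(const struct demo_attr *attr, const char *buf,
                       size_t count)
{
    /* Check ->store, the pointer we call; checking ->show was the bug. */
    if (attr->store)
        return attr->store(buf, count);
    return -EIO;
}
```

A read-only attribute (only `show` set) then yields `-EIO` from `demo_store()` instead of a call through a NULL pointer.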
2013-01-03  i82975x_edac: Fix dimm label initialization  Mauro Carvalho Chehab
commit 479696840239e0cc43efb3c917bdcad2174d2215 upstream. The driver has only 4 hardcoded labels, but allows much more memory. Fix it by removing the hardcoded logic, using snprintf() instead. [ 19.833972] general protection fault: 0000 [#1] SMP [ 19.837733] Modules linked in: i82975x_edac(+) edac_core firewire_ohci firewire_core crc_itu_t nouveau mxm_wmi wmi video i2c_algo_bit drm_kms_helper ttm drm i2c_core [ 19.837733] CPU 0 [ 19.837733] Pid: 390, comm: udevd Not tainted 3.6.1-1.fc17.x86_64.debug #1 Dell Inc. Precision WorkStation 390 /0MY510 [ 19.837733] RIP: 0010:[<ffffffff813463a8>] [<ffffffff813463a8>] strncpy+0x18/0x30 [ 19.837733] RSP: 0018:ffff880078535b68 EFLAGS: 00010202 [ 19.837733] RAX: ffff880069fa9708 RBX: ffff880078588000 RCX: ffff880069fa9708 [ 19.837733] RDX: 000000000000001f RSI: 5f706f5f63616465 RDI: ffff880069fa9708 [ 19.837733] RBP: ffff880078535b68 R08: ffff880069fa9727 R09: 000000000000fffe [ 19.837733] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000003 [ 19.837733] R13: 0000000000000000 R14: ffff880069fa9290 R15: ffff880079624a80 [ 19.837733] FS: 00007f3de01ee840(0000) GS:ffff88007c400000(0000) knlGS:0000000000000000 [ 19.837733] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 19.837733] CR2: 00007f3de00b9000 CR3: 0000000078dbc000 CR4: 00000000000007f0 [ 19.837733] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 19.837733] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 19.837733] Process udevd (pid: 390, threadinfo ffff880078534000, task ffff880079642450) [ 19.837733] Stack: [ 19.837733] ffff880078535c18 ffffffffa017c6b8 00040000816d627f ffff880079624a88 [ 19.837733] ffffc90004cd6000 ffff880079624520 ffff88007ac21148 0000000000000000 [ 19.837733] 0000000000000000 0004000000000000 feda000078535bc8 ffffffff810d696d [ 19.837733] Call Trace: [ 19.837733] [<ffffffffa017c6b8>] i82975x_init_one+0x2e6/0x3e6 [i82975x_edac] ... 
Fix bug reported at: https://bugzilla.redhat.com/show_bug.cgi?id=848149 And, very likely: https://bbs.archlinux.org/viewtopic.php?id=148033 https://bugzilla.kernel.org/show_bug.cgi?id=47171 Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com> [bwh: Backported to 3.2: - Adjust context - Use csrow->channels[chan].label not csrow->channels[chan]->dimm->label] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
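The idea of replacing a fixed table of labels with snprintf() can be sketched as follows. The label format, the buffer length `DEMO_LABEL_LEN` and the function name are assumptions made for illustration; the actual driver may format its labels differently.

```c
#include <stdio.h>

#define DEMO_LABEL_LEN 32 /* hypothetical label buffer size */

/*
 * Build a label from the channel and row indices instead of indexing a
 * small hardcoded table. snprintf() never writes more than len bytes,
 * so an unexpectedly large index cannot overrun the buffer.
 */
static void demo_make_label(char *label, size_t len, int csrow, int chan)
{
    snprintf(label, len, "DIMM %c%d", 'A' + chan, csrow);
}
```

Because the label is generated, any number of rows is handled; a too-small buffer gets a truncated but still NUL-terminated string.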
2013-01-03  i7300_edac: Fix error flag testing  Jean Delvare
commit 7e06b7a3333f5c7a0cec12aff20d39c5c87c0795 upstream.
* Right-shift the values in GET_FBD_FAT_IDX and GET_FBD_NF_IDX, so that the callers get the result they expect.
* Fix definition of FERR_FAT_FBD_ERR_MASK.
* Call GET_FBD_NF_IDX, not GET_FBD_FAT_IDX, when operating on register FERR_NF_FBD. We were lucky they have the same definition.
This fixes kernel bug #44131: https://bugzilla.kernel.org/show_bug.cgi?id=44131
Signed-off-by: Jean Delvare <jdelvare@suse.de> Cc: Mauro Carvalho Chehab <mchehab@redhat.com> Cc: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
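The first point, right-shifting an extracted field so callers get a small index, can be illustrated with a generic sketch. The bit position and width below are assumptions chosen for the example, and the `DEMO_` names are hypothetical; consult the driver for the real register layout.

```c
#include <stdint.h>

/* Assumed layout for illustration only: a 2-bit index at bits 29:28. */
#define DEMO_IDX_SHIFT 28
#define DEMO_IDX_MASK  0x3u

/* Masks in place: yields 0, 0x10000000, 0x20000000 or 0x30000000. */
#define DEMO_GET_IDX_UNSHIFTED(reg) ((reg) & (DEMO_IDX_MASK << DEMO_IDX_SHIFT))

/* Shifts first, then masks: yields 0..3, usable as an array index. */
#define DEMO_GET_IDX(reg) (((reg) >> DEMO_IDX_SHIFT) & DEMO_IDX_MASK)

static unsigned int demo_fbd_index(uint32_t reg)
{
    return DEMO_GET_IDX(reg);
}
```

A caller that compares the unshifted value against 0..3, or uses it as an index, gets the wrong answer for every non-zero field value.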
2012-10-30  amd64_edac:__amd64_set_scrub_rate(): avoid overindexing scrubrates[]  Andrew Morton
commit 168bfeef7bba3f9784f7540b053e4ac72b769ce9 upstream. If none of the elements in scrubrates[] matches, this loop will cause __amd64_set_scrub_rate() to incorrectly use the n+1th element. As the function is designed to use the final scrubrates[] element in the case of no match, we can fix this bug by simply terminating the array search at the n-1th element. Boris: this code is fragile anyway, see here why: http://marc.info/?l=linux-kernel&m=135102834131236&w=2 It will be rewritten more robustly soonish. Reported-by: Denis Kirjanov <kirjanov@gmail.com> Cc: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
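The loop shape described above can be sketched in user space. The table contents and the names `demo_rates` and `demo_pick_scrubval()` are invented for the example; only the structure (search stops at the n-1th element so the final entry acts as the fallback) follows the commit message.

```c
#include <stddef.h>
#include <stdint.h>

struct demo_rate {
    uint32_t scrubval;
    uint32_t bandwidth;
};

/* Hypothetical table, highest bandwidth first; the last entry is the fallback. */
static const struct demo_rate demo_rates[] = {
    { 0x01, 1600000000u },
    { 0x02,  800000000u },
    { 0x03,  400000000u },
    { 0x00,          0u }, /* fallback used when nothing else matches */
};

#define DEMO_NRATES (sizeof(demo_rates) / sizeof(demo_rates[0]))

static uint32_t demo_pick_scrubval(uint32_t new_bw, uint32_t min_rate)
{
    size_t i;

    /*
     * Stop one short of the end: if no entry matches, i is left at the
     * final valid index rather than one past the array.
     */
    for (i = 0; i < DEMO_NRATES - 1; i++) {
        if (demo_rates[i].scrubval < min_rate)
            continue;
        if (demo_rates[i].bandwidth <= new_bw)
            break;
    }
    return demo_rates[i].scrubval;
}
```

If the bound were `DEMO_NRATES`, a search with no match would exit with `i == DEMO_NRATES` and the final read would be out of bounds.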
2012-10-10  sb_edac: Avoid overflow errors at memory size calculation  Mauro Carvalho Chehab
commit deb09ddaff1435f72dd598d38f9b58354c68a5ec upstream. The Sandy Bridge EDAC driver overflows when calculating the memory size. Both the size field and the integer calculation use 32 bits, and more bits are needed when the DIMMs have high density. The net result is that memory sizes are reported incorrectly when high-density DIMMs are used:
EDAC DEBUG: in drivers/edac/sb_edac.c, line at 591: mc#0: channel 0, dimm 0, -16384 Mb (-4194304 pages) bank: 8, rank: 2, row: 0x10000, col: 0x800
EDAC DEBUG: in drivers/edac/sb_edac.c, line at 591: mc#0: channel 1, dimm 0, -16384 Mb (-4194304 pages) bank: 8, rank: 2, row: 0x10000, col: 0x800
As the EDAC core handles the number of pages as an unsigned int, the driver shows the 16 GB memories in the sysfs interface as 16760832 MB! The fix is simple: calculate the number of pages as an unsigned 64-bit integer. After the patch, the memory size (16 GB) is properly detected:
EDAC DEBUG: in drivers/edac/sb_edac.c, line at 592: mc#0: channel 0, dimm 0, 16384 Mb (4194304 pages) bank: 8, rank: 2, row: 0x10000, col: 0x800
EDAC DEBUG: in drivers/edac/sb_edac.c, line at 592: mc#0: channel 1, dimm 0, 16384 Mb (4194304 pages) bank: 8, rank: 2, row: 0x10000, col: 0x800
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com> [bwh: Backported to 3.2: - Adjust context - Debug log function is debugf0(), not edac_dbg()] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
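The numbers in the debug output can be reproduced with a short user-space calculation. This is a sketch under stated assumptions: each addressed cell is taken to hold 8 bytes (so MiB = cells >> 17) and pages are 4 KiB; the function names are hypothetical and the driver's own expression may differ in form.

```c
#include <stdint.h>

/*
 * Size in MiB computed in 64 bits. Assumption: each row/column/bank/rank
 * cell holds 8 bytes, hence the shift by (20 - 3).
 */
static uint64_t demo_size_mib(uint32_t rows, uint32_t cols,
                              uint32_t banks, uint32_t ranks)
{
    uint64_t cells = (uint64_t)rows * cols * banks * ranks;

    return cells >> (20 - 3);
}

/* Returns 1 if the cell count still fits in a signed 32-bit integer. */
static int demo_fits_int32(uint32_t rows, uint32_t cols,
                           uint32_t banks, uint32_t ranks)
{
    uint64_t cells = (uint64_t)rows * cols * banks * ranks;

    return cells <= (uint64_t)INT32_MAX;
}

/*
 * What an unsigned 32-bit page counter holding the wrapped value
 * -4194304 reports when converted back to MiB with 4 KiB pages.
 */
static uint32_t demo_wrapped_mib(void)
{
    uint32_t pages = (uint32_t)-4194304; /* well-defined modular conversion */

    return pages >> (20 - 12);
}
```

For the geometry in the log (row 0x10000, col 0x800, 8 banks, 2 ranks) the cell count is 2^31, one more than a signed 32-bit integer can hold. The 64-bit path gives 16384 MiB, and the wrapped page count converts to 16760832 MiB, matching the bogus sysfs value quoted above.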
2012-08-04  x86: Simplify code by removing a !SMP #ifdefs from 'struct cpuinfo_x86'  Kevin Winchester
commit 141168c36cdee3ff23d9c7700b0edc47cb65479f and commit 3f806e50981825fa56a7f1938f24c0680816be45 upstream. Several fields in struct cpuinfo_x86 were not defined for the !SMP case, likely to save space. However, those fields still have some meaning for UP, and keeping them allows some #ifdef removal from other files. The additional size of the UP kernel from this change is not significant enough to worry about keeping up the distinction:
   text    data     bss     dec     hex filename
4737168  506459  972040 6215667  5ed7f3 vmlinux.o.before
4737444  506459  972040 6215943  5ed907 vmlinux.o.after
for a difference of 276 bytes for an example UP config. If someone wants those 276 bytes back badly then it should be implemented in a cleaner way. Signed-off-by: Kevin Winchester <kjwinchester@gmail.com> Cc: Steffen Persvold <sp@numascale.com> Link: http://lkml.kernel.org/r/1324428742-12498-1-git-send-email-kjwinchester@gmail.com Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-07-04  edac: avoid mce decoding crash after edac driver unloaded  Chen Gong
commit e35fca4791fcdd43dc1fd769797df40c562ab491 upstream. Some edac drivers register themselves as mce decoders via notifier_chain. But in current notifier_chain implementation logic, it doesn't accept same notifier registered twice. If so, it will be wrong when adding/removing the element from the list. For example, on one SandyBridge platform, remove module sb_edac and then trigger one error, it will hit oops because it has no mce decoder registered but related notifier_chain still points to an invalid callback function. Here is an example: Call Trace: [<ffffffff8150ef6a>] atomic_notifier_call_chain+0x1a/0x20 [<ffffffff8102b936>] mce_log+0x46/0x180 [<ffffffff8102eaea>] apei_mce_report_mem_error+0x4a/0x60 [<ffffffff812e19d2>] ghes_do_proc+0x192/0x210 [<ffffffff812e2066>] ghes_proc+0x46/0x70 [<ffffffff812e20d8>] ghes_notify_sci+0x48/0x80 [<ffffffff8150ef05>] notifier_call_chain+0x55/0x80 [<ffffffff81076f1a>] __blocking_notifier_call_chain+0x5a/0x80 [<ffffffff812aea11>] ? acpi_os_wait_events_complete+0x23/0x23 [<ffffffff81076f56>] blocking_notifier_call_chain+0x16/0x20 [<ffffffff812ddc4d>] acpi_hed_notify+0x19/0x1b [<ffffffff812b16bd>] acpi_device_notify+0x19/0x1b [<ffffffff812beb38>] acpi_ev_notify_dispatch+0x67/0x7f [<ffffffff812aea3a>] acpi_os_execute_deferred+0x29/0x36 [<ffffffff81069dc2>] process_one_work+0x132/0x450 [<ffffffff8106bbcb>] worker_thread+0x17b/0x3c0 [<ffffffff8106ba50>] ? manage_workers+0x120/0x120 [<ffffffff81070aee>] kthread+0x9e/0xb0 [<ffffffff81514724>] kernel_thread_helper+0x4/0x10 [<ffffffff81070a50>] ? kthread_freezable_should_stop+0x70/0x70 [<ffffffff81514720>] ? 
gs_change+0x13/0x13 Code: f3 49 89 d4 45 85 ed 4d 89 c6 48 8b 0f 74 48 48 85 c9 75 17 eb 41 0f 1f 80 00 00 00 00 41 83 ed 01 4c 89 f9 74 22 4d 85 ff 74 1d <4c> 8b 79 08 4c 89 e2 48 89 de 48 89 cf ff 11 4d 85 f6 74 04 41 RIP [<ffffffff8150eef6>] notifier_call_chain+0x46/0x80 RSP <ffff88042868fb20> CR2: ffffffffa01af838 ---[ end trace 0100930068e73e6f ]--- BUG: unable to handle kernel paging request at fffffffffffffff8 IP: [<ffffffff810705b0>] kthread_data+0x10/0x20 PGD 1a0d067 PUD 1a0e067 PMD 0 Oops: 0000 [#2] SMP Only i7core_edac and sb_edac have such issues because they have more than one memory controller which means they have to register mce decoder many times. Signed-off-by: Chen Gong <gong.chen@linux.intel.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com> [bwh: Backported to 3.2: drivers call atomic_notifier_chain_{,un}register() directly] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2011-11-24  drivers/edac/mpc85xx_edac.c: fix memory controller compatible for edac  Shaohui Xie
The compatible string in the dts has been changed, so the driver needs to be updated accordingly. Signed-off-by: Shaohui Xie <Shaohui.Xie@freescale.com> Cc: Grant Likely <grant.likely@secretlab.ca> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2011-11-06  Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux  Linus Torvalds
* 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits)
  Revert "tracing: Include module.h in define_trace.h"
  irq: don't put module.h into irq.h for tracking irqgen modules.
  bluetooth: macroize two small inlines to avoid module.h
  ip_vs.h: fix implicit use of module_get/module_put from module.h
  nf_conntrack.h: fix up fallout from implicit moduleparam.h presence
  include: replace linux/module.h with "struct module" wherever possible
  include: convert various register fcns to macros to avoid include chaining
  crypto.h: remove unused crypto_tfm_alg_modname() inline
  uwb.h: fix implicit use of asm/page.h for PAGE_SIZE
  pm_runtime.h: explicitly requires notifier.h
  linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h
  miscdevice.h: fix up implicit use of lists and types
  stop_machine.h: fix implicit use of smp.h for smp_processor_id
  of: fix implicit use of errno.h in include/linux/of.h
  of_platform.h: delete needless include <linux/module.h>
  acpi: remove module.h include from platform/aclinux.h
  miscdevice.h: delete unnecessary inclusion of module.h
  device_cgroup.h: delete needless include <linux/module.h>
  net: sch_generic remove redundant use of <linux/module.h>
  net: inet_timewait_sock doesnt need <linux/module.h>
  ...
Fix up trivial conflicts (other header files, and removal of the ab3550 mfd driver) in
 - drivers/media/dvb/frontends/dibx000_common.c
 - drivers/media/video/{mt9m111.c,ov6650.c}
 - drivers/mfd/ab3550-core.c
 - include/linux/dmaengine.h
2011-11-06  Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc  Linus Torvalds
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (106 commits)
  powerpc/p3060qds: Add support for P3060QDS board
  powerpc/83xx: Add shutdown request support to MCU handling on MPC8349 MITX
  powerpc/85xx: Make kexec to interate over online cpus
  powerpc/fsl_booke: Fix comment in head_fsl_booke.S
  powerpc/85xx: issue 15 EOI after core reset for FSL CoreNet devices
  powerpc/8xxx: Fix interrupt handling in MPC8xxx GPIO driver
  powerpc/85xx: Add 'fsl,pq3-gpio' compatiable for GPIO driver
  powerpc/86xx: Correct Gianfar support for GE boards
  powerpc/cpm: Clear muram before it is in use.
  drivers/virt: add ioctl for 32-bit compat on 64-bit to fsl-hv-manager
  powerpc/fsl_msi: add support for "msi-address-64" property
  powerpc/85xx: Setup secondary cores PIR with hard SMP id
  powerpc/fsl-booke: Fix settlbcam for 64-bit
  powerpc/85xx: Adding DCSR node to dtsi device trees
  powerpc/85xx: clean up FPGA device tree nodes for Freecsale QorIQ boards
  powerpc/85xx: fix PHYS_64BIT selection for P1022DS
  powerpc/fsl-booke: Fix setup_initial_memory_limit to not blindly map
  powerpc: respect mem= setting for early memory limit setup
  powerpc: Update corenet64_smp_defconfig
  powerpc: Update mpc85xx/corenet 32-bit defconfigs
  ...
Fix up trivial conflicts in:
 - arch/powerpc/configs/40x/hcu4_defconfig removed stale file, edited elsewhere
 - arch/powerpc/include/asm/udbg.h, arch/powerpc/kernel/udbg.c: added opal and gelic drivers vs added ePAPR driver
 - drivers/tty/serial/8250.c moved UPIO_TSI to powerpc vs removed UPIO_DWAPB support
2011-11-03  edac: Only build sb_edac on 64-bit  Josh Boyer
The sb_edac driver is marginally useful on a 32-bit kernel, and currently has 64-bit divide compile errors when building that config. For now, make this build only on 64-bit kernels. Signed-off-by: Josh Boyer <jwboyer@redhat.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-11-02  Merge branch 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac  Linus Torvalds
* 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac: (21 commits)
  MAINTAINERS: add an entry for Edac Sandy Bridge driver
  edac: tag sb_edac as EXPERIMENTAL, as it requires more testing
  EDAC: Fix incorrect edac mode reporting in sb_edac
  edac: sb_edac: Add it to the building system
  edac: Add an experimental new driver to support Sandy Bridge CPU's
  i7300_edac: Fix error cleanup logic
  i7core_edac: Initialize memory name with cpu, channel, bank
  i7core_edac: Fix compilation on 32 bits arch
  i7core_edac: scrubbing fixups
  EDAC: Correct Kconfig dependencies
  i7core_edac: return -ENODEV if no MC is found
  i7core_edac: use edac's own way to print errors
  MAINTAINERS: remove dropped edac_mce.* from the file
  i7core_edac: Drop the edac_mce facility
  x86, MCE: Use notifier chain only for MCE decoding
  EDAC i7core: Use mce socketid for better compatibility
  i7core_edac: Don't enable memory scrubbing for Xeon 35xx
  i7core_edac: Add scrubbing support
  edac: Move edac main structs to include/linux/edac.h
  i7core_edac: Fix oops when trying to inject errors
  ...
2011-11-01  edac: tag sb_edac as EXPERIMENTAL, as it requires more testing  Mauro Carvalho Chehab
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2011-11-01  EDAC: Fix incorrect edac mode reporting in sb_edac  Mark A. Grondona
The edac driver for Sandy Bridge was found to be reporting "FPM" for edac_mode, which clearly doesn't make sense. It was found that sb_edac.c:get_dimm_config was reusing a variable for both mem_type and edac_type, and thus was overwriting the value after setting it correctly. This patch fixes that issue.
Before the patch:
/sys/devices/system/edac/mc/mc0/csrow0/edac_mode:FPM
/sys/devices/system/edac/mc/mc0/csrow1/edac_mode:FPM
/sys/devices/system/edac/mc/mc0/csrow2/edac_mode:FPM
/sys/devices/system/edac/mc/mc0/csrow3/edac_mode:FPM
After:
/sys/devices/system/edac/mc/mc0/csrow0/edac_mode:S4ECD4ED
/sys/devices/system/edac/mc/mc0/csrow1/edac_mode:S4ECD4ED
/sys/devices/system/edac/mc/mc0/csrow2/edac_mode:S4ECD4ED
/sys/devices/system/edac/mc/mc0/csrow3/edac_mode:S4ECD4ED
Signed-off-by: Mark A. Grondona <mgrondona@llnl.gov> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2011-11-01  edac: sb_edac: Add it to the building system  Mauro Carvalho Chehab
Some changes to it were required due to changeset cd90cc84c6bf0, which changed the glue with the MCE logic. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2011-11-01  edac: Add an experimental new driver to support Sandy Bridge CPU's  Mauro Carvalho Chehab
This driver is known to work in my and Tony's test environments, using software error injection and a partial hardware/software error injection tool. There has been no broader testing yet to double-check that the error decoding logic actually points to the right DIMM, so use it with care. More tests are required to be sure that the driver will work on all different types of memory configurations. If you're willing to risk using it, I suggest you enable EDAC debugging on your test machines, as the debug logs help to track what's going on inside the driver. Please send me bug reports if you notice the driver misbehaving. Tested-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2011-11-01  i7300_edac: Fix error cleanup logic  Mauro Carvalho Chehab
The error cleanup logic was broken. Because of that, one error was generated on every error poll. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2011-11-01  i7core_edac: Initialize memory name with cpu, channel, bank  Mauro Carvalho Chehab
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2011-11-01  i7core_edac: Fix compilation on 32 bits arch  Sedat Dilek
On i386: ERROR: "__udivdi3" [drivers/edac/i7core_edac.ko] undefined! This occurs in both get_sdram_scrub_rate() and set_sdram_scrub_rate(). Reported-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2011-11-01  i7core_edac: scrubbing fixups  Nils Carlson
Get a more reliable DCLK value from DMI, name the SCRUBINTERVAL mask and guard against potential overflow in the scrub rate computations. Signed-off-by: Nils Carlson <nils.carlson@ericsson.com>
2011-11-01  EDAC: Correct Kconfig dependencies  Borislav Petkov
Both AMD and Intel i7 EDAC drivers use MCE features and thus depend on this functionality being present in the kernel. Express this in Kconfig so that randconfig builds don't break. Reported-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Acked-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2011-11-01  i7core_edac: return -ENODEV if no MC is found  Mauro Carvalho Chehab
Nehalem-EX uses a different memory controller. However, as the memory controller is not visible on some Nehalem/Nehalem-EP, we need to indirectly probe via an X58 PCI device. The same devices are found on (some) Nehalem-EX. So, on those machines, the probe routine needs to return -ENODEV, as the actual Memory Controller registers won't be detected. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2011-11-01  i7core_edac: use edac's own way to print errors  Mauro Carvalho Chehab
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2011-11-01  i7core_edac: Drop the edac_mce facility  Borislav Petkov
Remove the edac_mce pieces and use the normal MCE decoder notifier chain, retaining the same functionality with considerably less code. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2011-10-31  drivers/edac: Add module.h to mce_amd_inj.c  Paul Gortmaker
This file really needs the full module.h header file present, but was just getting it implicitly before. Fix it up in advance so we avoid build failures once the cleanup commit is present. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31  EDAC i7core: Use mce socketid for better compatibility  Thomas Renninger
mce->socketid and cpu_data(mce->cpu).phys_proc_id are the same, compare with mce_setup (in mce.c):
    m->cpu = m->extcpu = smp_processor_id();
    ...
    m->socketid = cpu_data(m->extcpu).phys_proc_id;
This makes it easier for example for XEN patches to hook into the MCE subsystem. Compile tested on x86_64. Signed-off-by: Thomas Renninger <trenn@suse.de> CC: JBeulich@novell.com CC: linux-edac@vger.kernel.org CC: Mauro Carvalho Chehab <mchehab@redhat.com>
2011-10-31  i7core_edac: Don't enable memory scrubbing for Xeon 35xx  Mauro Carvalho Chehab
The Xeon 35xx documentation doesn't mention memory scrubbing. It seems that only Xeon 55xx and above support it. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2011-10-31  i7core_edac: Add scrubbing support  Samuel Gabrielsson
Add scrubbing support to i7core_edac, tested on Intel Xeon L5638. Signed-off-by: Samuel Gabrielsson <samuel.gabrielsson@gmail.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2011-10-31  edac: Move edac main structs to include/linux/edac.h  Mauro Carvalho Chehab
As we'll need to use those structs for trace functions, they should be in a more public place. So, move struct mem_ctl_info & friends to edac.h. No functional changes in this patch. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com> Signed-off-by: Doug Thompson <dougthompson@xmission.com>
2011-10-31  i7core_edac: Fix oops when trying to inject errors  Mauro Carvalho Chehab
Error injection needs the pci device 0:0. So, we need to revert this changeset: 79daef2099a02fed35747c23bad22f30441133ea. Tests need to be made to be sure that refcount won't be wrong as noticed before. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2011-10-31  i7core_edac: fix misuse of logical operation in place of bitop  David Sterba
CC: Mauro Carvalho Chehab <mchehab@redhat.com> Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
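The commit title refers to a common C slip: using a logical operator where a bitwise one was meant. A minimal sketch follows; `DEMO_FLAG` and `demo_flag_set()` are hypothetical names and the flag value is arbitrary, since the message does not say which expression in the driver was affected.

```c
#include <stdint.h>

#define DEMO_FLAG 0x4u /* hypothetical flag bit */

/*
 * Bitwise test: true only when the flag bit itself is set.
 *
 * The logical form "value && DEMO_FLAG" is true for any non-zero value,
 * because it only asks whether both operands are non-zero. For example,
 * (0x3 && 0x4) evaluates to 1 even though bit 2 is clear in 0x3.
 */
static int demo_flag_set(uint32_t value)
{
    return (value & DEMO_FLAG) != 0;
}
```

Both forms compile cleanly on many compilers, which is why this kind of bug tends to be found by static analysis rather than by the build.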
2011-10-06  amd64_edac: Cleanup return type of amd64_determine_edac_cap()  Dan Carpenter
Sparse complains that edac_cap was declared as dev_type and we are returning edac_type. Historically, edac_type was correct but since then we have changed it to return a bit field. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Link: http://lkml.kernel.org/r/20111006063025.GA2615@mwanda Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-10-06  amd64_edac: Add a fix for Erratum 505  Borislav Petkov
When accessing the scrub rate control register (F3x58) on F15h, the DRAM controller selector (F1x10C[DctCfgSel]) has to point to DCT0 so that the scrub rate configuration can take effect. See Erratum 505 in the AMD F15h revision guide for more details. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-10-06  EDAC, MCE, AMD: Simplify NB MCE decoder interface  Borislav Petkov
Drop the third argument, nbcfg, which is a leftover and no longer required. No functionality change. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-10-06  EDAC, MCE, AMD: Drop local coreid reporting  Borislav Petkov
MCE decoding code is reporting the core which encountered the error unconditionally now so drop this piece. Besides, it reported the coreid in the local processor package which is not that valuable as a datapoint. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-10-06  EDAC, MCE, AMD: Print valid addr when reporting an error  Borislav Petkov
The MCi_STATUS bank has an AddrV bit which, when set, denotes that the corresponding MCi_ADDR MSR contains a valid address belonging to the MCE currently being reported. Dump it since it is definitely relevant information. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
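The check described above amounts to testing one bit before trusting the address register. The sketch below is user-space code with hypothetical names (`DEMO_MCI_STATUS_ADDRV`, `demo_format_addr()`); it assumes AddrV is bit 58 of the status value, and the output format is invented for the example rather than taken from the decoder.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumption: AddrV is bit 58 of the MCi_STATUS value. */
#define DEMO_MCI_STATUS_ADDRV (1ULL << 58)

/*
 * Format the address only when the status word says it is valid.
 * Returns the number of characters written (0 when AddrV is clear).
 */
static int demo_format_addr(char *buf, size_t len, uint64_t status,
                            uint64_t addr)
{
    if (!(status & DEMO_MCI_STATUS_ADDRV)) {
        if (len)
            buf[0] = '\0';
        return 0;
    }
    return snprintf(buf, len, "MC_ADDR: 0x%016llx",
                    (unsigned long long)addr);
}
```

Gating on the bit matters because the address register may hold stale contents from an earlier event when AddrV is clear.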
2011-10-06  EDAC, MCE, AMD: Print CPU number when reporting the error  Borislav Petkov
Currently, correctable ECCs go through mcelog and do not print the scary MCE banner. In that case, however, reporting the core where the CECC happened is important information so dump it along with the decoded string albeit at risk of having a minor redundancy. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-09-01  cpc925_edac: Support single-processor configurations  Dmitry Eremin-Solenikov
If the second CPU is not enabled, the CPC925 EDAC driver will spill out warnings about errors on the second Processor Interface. Support masking that out by detecting at runtime which CPUs are present in the device tree. Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com> Cc: Harry Ciao <qingtao.cao@windriver.com> Cc: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-08-30  Merge remote-tracking branch 'jwb/next' into next  Benjamin Herrenschmidt
2011-08-18  i7core_edac: fixed typo in error count calculation  Mathias Krause
Based on a patch from the PaX Team, found during a clang analysis pass. Signed-off-by: Mathias Krause <minipli@googlemail.com> Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com> Cc: PaX Team <pageexec@freemail.hu> Cc: stable@kernel.org [v2.6.35+] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-11  powerpc/4xx: edac: Add comma to fix build error  Mike Williams
Commit 4018294b53d1dae026880e45f174c1cc63b5d435 broke the ppc4xx_edac driver at line 210 where the struct member is missing a comma. Signed-off-by: Mike Williams <mike@mikebwilliams.com> Signed-off-by: Josh Boyer <jwboyer@gmail.com>
2011-08-11  Revert "EDAC: Correct Kconfig dependencies"  Linus Torvalds
This reverts commit af9d220bac41dc3201893e1601cc7c44f7da4498. It turns out that one was meant to be applied on top of the edac.git tree in -next that has more i7core_edac changes, but that wasn't clear in the original email. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Borislav Petkov <borislav.petkov@amd.com> Cc: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-10  EDAC: Correct Kconfig dependencies  Borislav Petkov
Both AMD and Intel i7 EDAC drivers use MCE features and thus depend on this functionality being present in the kernel. Express this in Kconfig so that randconfig builds don't break. Reported-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Acked-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-07-26  atomic: use <linux/atomic.h>  Arun Sharma
This allows us to move duplicated code in <asm/atomic.h> (atomic_inc_not_zero() for now) to <linux/atomic.h> Signed-off-by: Arun Sharma <asharma@fb.com> Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: David Miller <davem@davemloft.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-07-26  drivers/edac/mpc85xx_edac.c: correct offset_in_page mask bits in edac_mc_handle_ce()  Kai.Jiang
Parameter offset_in_page in edac_mc_handle_ce() should mask the higher bits above the page size, not the lower bits. The original input sometimes causes a crash. Signed-off-by: Kai.Jiang <Kai.Jiang@freescale.com> Signed-off-by: Shaohui Xie <Shaohui.Xie@freescale.com> Cc: Anton Vorontsov <avorontsov@mvista.com> Cc: Grant Likely <grant.likely@secretlab.ca> Cc: "David S. Miller" <davem@davemloft.net> Cc: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
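Splitting an address into a page frame number and an in-page offset can be sketched as below. The 4 KiB page size and all `DEMO_`/`demo_` names are assumptions for the example; the mask follows the kernel convention in which the page mask selects the frame bits, so the offset uses its complement.

```c
#include <stdint.h>

/* Assumed 4 KiB pages for illustration. */
#define DEMO_PAGE_SHIFT 12
#define DEMO_PAGE_SIZE  ((uint64_t)1 << DEMO_PAGE_SHIFT)
/* Selects the page-frame bits (everything above the offset). */
#define DEMO_PAGE_MASK  (~(DEMO_PAGE_SIZE - 1))

static uint64_t demo_page_frame(uint64_t addr)
{
    return addr >> DEMO_PAGE_SHIFT;
}

/* Offset within the page: clear the high bits, keep the low ones. */
static uint64_t demo_offset_in_page(uint64_t addr)
{
    return addr & ~DEMO_PAGE_MASK;
}
```

Using `addr & DEMO_PAGE_MASK` instead would hand a consumer an "offset" that can be far larger than the page size, which is consistent with the crash the message mentions.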
2011-06-10  treewide: Convert uses of struct resource to resource_size(ptr)  Joe Perches
Several fixes as well where the +1 was missing. Done via coccinelle scripts like:
    @@
    struct resource *ptr;
    @@
    - ptr->end - ptr->start + 1
    + resource_size(ptr)
and some grep and typing. Mostly uncompiled, no cross-compilers. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
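The "+1" matters because the end address of a resource is inclusive. A small user-space sketch, using a hypothetical `struct demo_resource` and `demo_resource_size()` that mirror the expression in the script above:

```c
#include <stdint.h>

/* Hypothetical stand-in for a resource range; "end" is inclusive. */
struct demo_resource {
    uint64_t start;
    uint64_t end;
};

/* Same arithmetic as the expression being replaced: end - start + 1. */
static uint64_t demo_resource_size(const struct demo_resource *res)
{
    return res->end - res->start + 1;
}
```

A range from 0x1000 to 0x1fff therefore has size 0x1000; dropping the +1 would under-report it by one byte, which is the class of bug the message says was fixed along the way.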
2011-05-26  edac,rcu: use synchronize_rcu() instead of call_rcu()+rcu_barrier()  Lai Jiangshan
synchronize_rcu() does everything that is needed here. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Doug Thompson <dougthompson@xmission.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Mauro Carvalho Chehab <mchehab@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>