Age | Commit message (Collapse) | Author |
|
commit 9d0e8d8348d54d60005c6c938b87b94648005d1c upstream.
Update current channel selection logic to include F15h, M30h memory
controllers.
Refer F15 M30h BKDG D18F2x110[7:6] (DRAM Controller Select Low)
(Link:http://support.amd.com/TechDocs/49125_15h_Models_30h-3Fh_BKDG.pdf)
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Link: http://lkml.kernel.org/r/1390338216-3873-1-git-send-email-Aravind.Gopalakrishnan@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit 75135da0d68419ef8a925f4c1d5f63d8046e314d upstream.
pci_get_device() decrements the reference count of "from" (last
argument) so when we break off the loop successfully we have only one
device reference - and we don't know which device we have. If we want
a reference to each device, we must take them explicitly and let
the pci_get_device() walk complete to avoid duplicate references.
This is serious, as over-putting device references will cause
the device to eventually disappear. Without this fix, the kernel
crashes after a few insmod/rmmod cycles.
Tested on an Intel S7000FC4UR system with a 7300 chipset.
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Link: http://lkml.kernel.org/r/20140224111656.09bbb7ed@endymion.delvare
Cc: Mauro Carvalho Chehab <m.chehab@samsung.com>
Cc: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit c0f5eeed0f4cef4f05b74883a7160e7edde58b6a upstream.
The reference count changes done by pci_get_device can be a little
misleading when the usage diverges from the most common scheme. The
reference count of the device passed as the last parameter is always
decreased, even if the function returns no new device. So if we are
going to try alternative device IDs, we must manually increment the
device reference count before each retry. If we don't, we end up
decreasing the reference count, and after a few modprobe/rmmod cycles
the PCI devices will vanish.
In other words and as Alan put it: without this fix the EDAC code
corrupts the PCI device list.
This fixes kernel bug #50491:
https://bugzilla.kernel.org/show_bug.cgi?id=50491
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Link: http://lkml.kernel.org/r/20140224093927.7659dd9d@endymion.delvare
Reviewed-by: Alan Cox <alan@linux.intel.com>
Cc: Mauro Carvalho Chehab <m.chehab@samsung.com>
Cc: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
|
|
commit cb6ef42e516cb8948f15e4b70dc03af8020050a2 upstream.
We're using edac_mc_workq_setup() both on the init path, when
we load an edac driver and when we change the polling period
(edac_mc_reset_delay_period) through /sys/.../edac_mc_poll_msec.
On that second path we don't need to init the workqueue which has been
initialized already.
Thanks to Tejun for workqueue insights.
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1391457913-881-1-git-send-email-prarit@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 9da21b1509d8aa7ab4846722817d16c72d656c91 upstream.
Sanitize code even more to accept unsigned longs only and to not allow
polling intervals below 1 second as this is unnecessary and doesn't make
much sense anyway for polling errors.
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1391457913-881-1-git-send-email-prarit@redhat.com
Cc: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 79040cad3f8235937e229f1b9401ba36dd5ad69b upstream.
If you do
echo 0 > /sys/module/edac_core/parameters/edac_mc_poll_msec
the following stack trace is output because the edac module is not
designed to poll with a timeout of zero.
WARNING: CPU: 12 PID: 0 at lib/list_debug.c:33 __list_add+0xac/0xc0()
list_add corruption. prev->next should be next (ffff8808291dd1b8), but was (null). (prev=ffff8808286fe3f8).
Modules linked in: sg nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache cfg80211 rfkill x86_pkg_temp_thermal coretemp kvm_intel kvm ixgbe e1000e crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt ptp sb_edac iTCO_vendor_support pps_core mdio ipmi_devintf edac_core ioatdma microcode shpchp lpc_ich pcspkr i2c_i801 dca mfd_core ipmi_si wmi ipmi_msghandler nfsd auth_rpcgss nfs_acl lockd sunrpc xfs libcrc32c sd_mod sr_mod cdrom crc_t10dif crct10dif_common mgag200 syscopyarea sysfillrect sysimgblt isci i2c_algo_bit drm_kms_helper ttm drm libsas ahci libahci scsi_transport_sas libata i2c_core dm_mirror dm_region_hash dm_log dm_mod
CPU: 12 PID: 0 Comm: swapper/12 Not tainted 3.13.0+ #1
Hardware name: Intel Corporation LH Pass ........../SVRBD-ROW_T, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013
Call Trace:
<IRQ>
__list_add+0xac/0xc0
__internal_add_timer+0xab/0x130
internal_add_timer+0x17/0x40
mod_timer_pinned+0xca/0x170
intel_pstate_timer_func+0x28a/0x380
call_timer_fn+0x36/0x100
run_timer_softirq+0x1ff/0x2f0
__do_softirq+0xf5/0x2e0
irq_exit+0x10d/0x120
smp_apic_timer_interrupt+0x45/0x60
apic_timer_interrupt+0x6d/0x80
<EOI>
cpuidle_idle_call+0xb9/0x1f0
arch_cpu_idle+0xe/0x30
cpu_startup_entry+0x9e/0x240
start_secondary+0x1e4/0x290
kernel BUG at kernel/timer.c:1084!
invalid opcode: 0000 [#1] SMP
Modules linked in: sg nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache cfg80211 rfkill x86_pkg_temp_thermal coretemp kvm_intel kvm ixgbe e1000e crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt ptp sb_edac iTCO_vendor_support pps_core mdio ipmi_devintf edac_core ioatdma microcode shpchp lpc_ich pcspkr i2c_i801 dca mfd_core ipmi_si wmi ipmi_msghandler nfsd auth_rpcgss nfs_acl lockd sunrpc xfs libcrc32c sd_mod sr_mod cdrom crc_t10dif crct10dif_common mgag200 syscopyarea sysfillrect sysimgblt isci i2c_algo_bit drm_kms_helper ttm drm libsas ahci libahci scsi_transport_sas libata i2c_core dm_mirror dm_region_hash dm_log dm_mod
CPU: 12 PID: 0 Comm: swapper/12 Tainted: G W 3.13.0+ #1
Hardware name: Intel Corporation LH Pass ........../SVRBD-ROW_T, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013
Call Trace:
<IRQ>
run_timer_softirq+0x245/0x2f0
__do_softirq+0xf5/0x2e0
irq_exit+0x10d/0x120
smp_apic_timer_interrupt+0x45/0x60
apic_timer_interrupt+0x6d/0x80
<EOI>
cpuidle_idle_call+0xb9/0x1f0
arch_cpu_idle+0xe/0x30
cpu_startup_entry+0x9e/0x240
start_secondary+0x1e4/0x290
RIP cascade+0x93/0xa0
WARNING: CPU: 36 PID: 1154 at kernel/workqueue.c:1461 __queue_delayed_work+0xed/0x1a0()
Modules linked in: sg nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache cfg80211 rfkill x86_pkg_temp_thermal coretemp kvm_intel kvm ixgbe e1000e crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt ptp sb_edac iTCO_vendor_support pps_core mdio ipmi_devintf edac_core ioatdma microcode shpchp lpc_ich pcspkr i2c_i801 dca mfd_core ipmi_si wmi ipmi_msghandler nfsd auth_rpcgss nfs_acl lockd sunrpc xfs libcrc32c sd_mod sr_mod cdrom crc_t10dif crct10dif_common mgag200 syscopyarea sysfillrect sysimgblt isci i2c_algo_bit drm_kms_helper ttm drm libsas ahci libahci scsi_transport_sas libata i2c_core dm_mirror dm_region_hash dm_log dm_mod
CPU: 36 PID: 1154 Comm: kworker/u481:3 Tainted: G W 3.13.0+ #1
Hardware name: Intel Corporation LH Pass ........../SVRBD-ROW_T, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013
Workqueue: edac-poller edac_mc_workq_function [edac_core]
Call Trace:
dump_stack+0x45/0x56
warn_slowpath_common+0x7d/0xa0
warn_slowpath_null+0x1a/0x20
__queue_delayed_work+0xed/0x1a0
queue_delayed_work_on+0x27/0x50
edac_mc_workq_function+0x72/0xa0 [edac_core]
process_one_work+0x17b/0x460
worker_thread+0x11b/0x400
kthread+0xd2/0xf0
ret_from_fork+0x7c/0xb0
This patch adds a range check in the edac_mc_poll_msec code to check for 0.
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit 90ed4988b8c030d65b41b7d13140e9376dc6ec5a upstream.
In case the device 0, function 1 is not found using pci_get_device(),
pci_scan_single_device() will be used but, differently than
pci_get_device(), it allocates a pci_dev but doesn't does bump the usage
count on the pci_dev and after few module removals and loads the pci_dev
will be freed.
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Reviewed-by: mark gross <mark.gross@intel.com>
Link: http://lkml.kernel.org/r/20131205153755.GL4545@redhat.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Jean Delvare <jdelvare@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
commit a72b8859fd3941cc1d2940d5c43026d2c6fb959e upstream.
Register and enable interrupts after the edac registration. Otherwise
incomming ecc error interrupts lead to crashes during device setup.
Fixing this in drivers for mc and l2.
Signed-off-by: Robert Richter <robert.richter@linaro.org>
Acked-by: Rob Herring <rob.herring@calxeda.com>
Signed-off-by: Robert Richter <rric@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Pull Tile arch updates from Chris Metcalf:
"These changes bring in a bunch of new functionality that has been
maintained internally at Tilera over the last year, plus other stray
bits of work that I've taken into the tile tree from other folks.
The changes include some PCI root complex work, interrupt-driven
console support, support for performing fast-path unaligned data
fixups by kernel-based JIT code generation, CONFIG_PREEMPT support,
vDSO support for gettimeofday(), a serial driver for the tilegx
on-chip UART, KGDB support, more optimized string routines, support
for ftrace and kprobes, improved ASLR, and many bug fixes.
We also remove support for the old TILE64 chip, which is no longer
buildable"
* git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile: (85 commits)
tile: refresh tile defconfig files
tile: rework <asm/cmpxchg.h>
tile PCI RC: make default consistent DMA mask 32-bit
tile: add null check for kzalloc in tile/kernel/setup.c
tile: make __write_once a synonym for __read_mostly
tile: remove support for TILE64
tile: use asm-generic/bitops/builtin-*.h
tile: eliminate no-op "noatomichash" boot argument
tile: use standard tile_bundle_bits type in traps.c
tile: simplify code referencing hypervisor API addresses
tile: change <asm/system.h> to <asm/switch_to.h> in comments
tile: mark pcibios_init() as __init
tile: check for correct compiler earlier in asm-offsets.c
tile: use standard 'generic-y' model for <asm/hw_irq.h>
tile: use asm-generic version of <asm/local64.h>
tile PCI RC: add comment about "PCI hole" problem
tile: remove DEBUG_EXTRA_FLAGS kernel config option
tile: add virt_to_kpte() API and clean up and document behavior
tile: support FRAME_POINTER
tile: support reporting Tilera hypervisor statistics
...
|
|
dct_base and dct_limit obtain 32 bit register values when they read
their respective pci config space registers. A left shift beyond 32 bits
will cause them to wrap around. Similar case for chan_addr as can be
seen from the bug report (link below). In the patch, we rectify this by
casting chan_addr to u64 and by comparing dct_base and dct_limit against
properly shifted sys_addr in order to compare the correct bits.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Link: http://lkml.kernel.org/r/20130819132302.GA12171@elgon.mountain
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
Basically we want to cover all 0x0-0xf models, i.e. Orochi and later.
Cc: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Link: http://lkml.kernel.org/r/20130819192321.GF4165@pd.tnic
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp into x86/ras
Pull RAS/EDAC updates from Boris Petkov:
"An amd64_edac fix for single channel configurations + trivial cleanups
courtesy of Jingoo Han."
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
The struct should be terminated by using empty braces in order to
fix the following sparse warning.
drivers/edac/cpc925_edac.c:792:10: warning: Using plain integer as NULL pointer
Signed-off-by: Jingoo Han <jg1.han@samsung.com>
[ drop obvious comment ]
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
Now that we cache (family, model, stepping) locally, use them instead of
boot_cpu_data.
No functionality change.
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
On newer models, support has been included for upto 4 DCT's, however,
only DCT0 and DCT3 are currently configured (cf BKDG Section 2.10).
Also, the routing DRAM Requests algorithm is different for F15h M30h.
Thus it is cleaner to use a brand new function rather than adding quirks
to the more generic f1x_match_to_this_node(). Refer to "2.10.5 DRAM
Routing Requests" in the BKDG for further info.
Tested on Fam15h M30h with ECC turned on using mce_amd_inj facility and
verified to be functionally correct.
While at it, verify if erratum workarounds for E505 and E637 still hold.
From email conversations within AMD, the current status of the errata
is:
* Erratum 505: fixed in model 0x1, stepping 0x1 and later.
* Erratum 637: not fixed.
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
[ Cleanups, corrections ]
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
Make a local function static in order to fix the following sparse
warning:
drivers/edac/x38_edac.c:252:14: warning: symbol 'x38_map_mchbar' was not declared. Should it be static?
Signed-off-by: Jingoo Han <jg1.han@samsung.com>
[ Boris: Correct commit message ]
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
This local symbol is used only in this file.
Fix the following sparse warnings:
drivers/edac/i3200_edac.c:264:14: warning: symbol 'i3200_map_mchbar' was not declared. Should it be static?
Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
It can happen that configurations are running in a single-channel mode
even with a dual-channel memory controller, by, say, putting the DIMMs
only on the one channel and leaving the other empty. This causes a
problem in init_csrows which implicitly assumes that when the second
channel is enabled, i.e. channel 1, the struct dimm hierarchy will be
present. Which is not.
So always allocate two channels unconditionally.
This provides for the nice side effect that the data structures are
initialized so some day, when memory hotplug is supported, it should
just work out of the box when all of a sudden a second channel appears.
Reported-and-tested-by: Roger Leigh <rleigh@debian.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
The usage of strict_strtol() is not preferred, because strict_strtol()
is obsolete. Thus, kstrtol() should be used.
Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
Fix the following:
BUG: key ffff88043bdd0330 not in .data!
------------[ cut here ]------------
WARNING: at kernel/lockdep.c:2987 lockdep_init_map+0x565/0x5a0()
DEBUG_LOCKS_WARN_ON(1)
Modules linked in: glue_helper sb_edac(+) edac_core snd acpi_cpufreq lrw gf128mul ablk_helper iTCO_wdt evdev i2c_i801 dcdbas button cryptd pcspkr iTCO_vendor_support usb_common lpc_ich mfd_core soundcore mperf processor microcode
CPU: 2 PID: 599 Comm: modprobe Not tainted 3.10.0 #1
Hardware name: Dell Inc. Precision T3600/0PTTT9, BIOS A08 01/24/2013
0000000000000009 ffff880439a1d920 ffffffff8160a9a9 ffff880439a1d958
ffffffff8103d9e0 ffff88043af4a510 ffffffff81a16e11 0000000000000000
ffff88043bdd0330 0000000000000000 ffff880439a1d9b8 ffffffff8103dacc
Call Trace:
dump_stack
warn_slowpath_common
warn_slowpath_fmt
lockdep_init_map
? trace_hardirqs_on_caller
? trace_hardirqs_on
debug_mutex_init
__mutex_init
bus_register
edac_create_sysfs_mci_device
edac_mc_add_mc
sbridge_probe
pci_device_probe
driver_probe_device
__driver_attach
? driver_probe_device
bus_for_each_dev
driver_attach
bus_add_driver
driver_register
__pci_register_driver
? 0xffffffffa0010fff
sbridge_init
? 0xffffffffa0010fff
do_one_initcall
load_module
? unset_module_init_ro_nx
SyS_init_module
tracesys
---[ end trace d24a70b0d3ddf733 ]---
EDAC MC0: Giving out device to 'sbridge_edac.c' 'Sandy Bridge Socket#0': DEV 0000:3f:0e.0
EDAC sbridge: Driver loaded.
What happens is that bus_register needs a statically allocated lock_key
because the last is handed in to lockdep. However, struct mem_ctl_info
embeds struct bus_type (the whole struct, not a pointer to it) and the
whole thing gets dynamically allocated.
Fix this by using a statically allocated struct bus_type for the MC bus.
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Mauro Carvalho Chehab <mchehab@infradead.org>
Cc: Markus Trippelsdorf <markus@trippelsdorf.de>
Cc: stable@kernel.org # v3.10
Signed-off-by: Tony Luck <tony.luck@intel.com>
|
|
Commit 0998d06310 (device-core: Ensure drvdata = NULL when no
driver is bound) removes the need to set driver data field to
NULL.
Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
|
|
Pull MIPS updates from Ralf Baechle:
"MIPS updates:
- All the things that didn't make 3.10.
- Removes the Windriver PPMC platform. Nobody will miss it.
- Remove a workaround from kernel/irq/irqdomain.c which was there
exclusivly for MIPS. Patch by Grant Likely.
- More small improvments for the SEAD 3 platform
- Improvments on the BMIPS / SMP support for the BCM63xx series.
- Various cleanups of dead leftovers.
- Platform support for the Cavium Octeon-based EdgeRouter Lite.
Two large KVM patchsets didn't make it for this pull request because
their respective authors are vacationing"
* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: (124 commits)
MIPS: Kconfig: Add missing MODULES dependency to VPE_LOADER
MIPS: BCM63xx: CLK: Add dummy clk_{set,round}_rate() functions
MIPS: SEAD3: Disable L2 cache on SEAD-3.
MIPS: BCM63xx: Enable second core SMP on BCM6328 if available
MIPS: BCM63xx: Add SMP support to prom.c
MIPS: define write{b,w,l,q}_relaxed
MIPS: Expose missing pci_io{map,unmap} declarations
MIPS: Malta: Update GCMP detection.
Revert "MIPS: make CAC_ADDR and UNCAC_ADDR account for PHYS_OFFSET"
MIPS: APSP: Remove <asm/kspd.h>
SSB: Kconfig: Amend SSB_EMBEDDED dependencies
MIPS: microMIPS: Fix improper definition of ISA exception bit.
MIPS: Don't try to decode microMIPS branch instructions where they cannot exist.
MIPS: Declare emulate_load_store_microMIPS as a static function.
MIPS: Fix typos and cleanup comment
MIPS: Cleanup indentation and whitespace
MIPS: BMIPS: support booting from physical CPU other than 0
MIPS: Only set cpu_has_mmips if SYS_SUPPORTS_MICROMIPS
MIPS: GIC: Fix gic_set_affinity infinite loop
MIPS: Don't save/restore OCTEON wide multiplier state on syscalls.
...
|
|
Pull AMD EDAC update from Borislav Petkov:
"Add MCE signatures for family 0x15, models 30-3f"
* tag 'edac_for_3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
EDAC, MCE, AMD: Add an MCE signature for new Fam15h models
EDAC: Replace strict_strtoul() with kstrtoul()
|
|
CAVIUM_OCTEON_SOC most place we used to use CPU_CAVIUM_OCTEON. This
allows us to CPU_CAVIUM_OCTEON in places where we have no OCTEON SOC.
Remove CAVIUM_OCTEON_SIMULATOR as it doesn't really do anything, we can
get the same configuration with CAVIUM_OCTEON_SOC.
Signed-off-by: David Daney <david.daney@cavium.com>
Cc: linux-mips@linux-mips.org
Cc: linux-ide@vger.kernel.org
Cc: linux-edac@vger.kernel.org
Cc: linux-i2c@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: spi-devel-general@lists.sourceforge.net
Cc: devel@driverdev.osuosl.org
Cc: linux-usb@vger.kernel.org
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Wolfram Sang <wsa@the-dreams.de>
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Patchwork: https://patchwork.linux-mips.org/patch/5295/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
|
|
Add a new error signature for Family 15h, models 30h-3fh. Patch has been
tested on Fam15h using mce_amd_inj facility and has been verified to
work correctly.
Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
[ cleanup commit message and error string ]
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
The usage of strict_strtoul() is not preferred, because strict_strtoul()
is obsolete. Thus, kstrtoul() should be used.
Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
Ever since commit 45f035ab9b8f ("CONFIG_HOTPLUG should be always on"),
it has been basically impossible to build a kernel with CONFIG_HOTPLUG
turned off. Remove all the remaining references to it.
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Acked-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Fix yet another issue caught by 8f46baaa7ec6c ("base: core: WARN() about
bogus permissions on device attributes").
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp
Pull two small EDAC fixes from Borislav Petkov.
* tag 'edac_fixes_for_3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
EDAC: Don't give write permission to read-only files
EDAC, mc_sysfs.c: Fix string array pointer types
|
|
I get the following warning on boot:
------------[ cut here ]------------
WARNING: at drivers/base/core.c:575 device_create_file+0x9a/0xa0()
Hardware name: -[8737R2A]-
Write permission without 'store'
...
</snip>
Drilling down, this is related to dynamic channel ce_count attribute
files sporting a S_IWUSR mode without a ->store() function. Looking
around, it appears that they aren't supposed to have a ->store()
function. So remove the bogus write permission to get rid of the
warning.
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Mauro Carvalho Chehab <mchehab@redhat.com>
Cc: <stable@vger.kernel.org> # 3.[89]
[ shorten commit message ]
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac
Pull edac fixes from Mauro Carvalho Chehab:
"Two edac fixes:
- i7300_edac currently reports a wrong number of DIMMs when the
memory controller is in single channel mode
- on some Sandy Bridge machines, the EDAC driver bails out as one of
the PCI IDs used by the driver is hidden by BIOS. As the driver
uses it only to detect the type of memory, make it optional at the
driver"
* 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac:
edac: sb_edac.c should not require prescence of IMC_DDRIO device
i7300_edac: Fix memory detection in single mode
|
|
The Sandy Bridge EDAC driver uses a register in the IMC_DDRIO CSR
space to determine the type of DIMMs (registered or unregistered).
But this device does not exist on some single socket Sandy Bridge
servers. While the type of DIMMs is nice to know, it is not essential
for this driver's other functions. So it seems harsh to have it
refuse to load at all when it cannot find this device.
Make the check for this device be optional. If it isn't present
just report the memory type as "MEM_UNKNOWN".
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
When the machine is on single mode, only branch 0 channel 0
is valid. However, the code is not honouring it:
[ 1952.639341] EDAC DEBUG: i7300_get_mc_regs: Memory controller operating on single mode
...
[ 1952.639351] EDAC DEBUG: i7300_init_csrows: AMB-present CH0 = 0x1:
[ 1952.639353] EDAC DEBUG: i7300_init_csrows: AMB-present CH1 = 0x0:
[ 1952.639355] EDAC DEBUG: i7300_init_csrows: AMB-present CH2 = 0x0:
[ 1952.639358] EDAC DEBUG: i7300_init_csrows: AMB-present CH3 = 0x0:
...
[ 1952.639360] EDAC DEBUG: decode_mtr: MTR0 CH0: DIMMs are Present (mtr)
[ 1952.639362] EDAC DEBUG: decode_mtr: WIDTH: x8
[ 1952.639363] EDAC DEBUG: decode_mtr: ELECTRICAL THROTTLING is enabled
[ 1952.639364] EDAC DEBUG: decode_mtr: NUMBANK: 4 bank(s)
[ 1952.639366] EDAC DEBUG: decode_mtr: NUMRANK: single
[ 1952.639367] EDAC DEBUG: decode_mtr: NUMROW: 16,384 - 14 rows
[ 1952.639368] EDAC DEBUG: decode_mtr: NUMCOL: 1,024 - 10 columns
[ 1952.639370] EDAC DEBUG: decode_mtr: SIZE: 512 MB
[ 1952.639371] EDAC DEBUG: decode_mtr: ECC code is 8-byte-over-32-byte SECDED+ code
[ 1952.639373] EDAC DEBUG: decode_mtr: Scrub algorithm for x8 is on enhanced mode
[ 1952.639374] EDAC DEBUG: decode_mtr: MTR0 CH1: DIMMs are Present (mtr)
[ 1952.639376] EDAC DEBUG: decode_mtr: WIDTH: x8
[ 1952.639377] EDAC DEBUG: decode_mtr: ELECTRICAL THROTTLING is enabled
[ 1952.639379] EDAC DEBUG: decode_mtr: NUMBANK: 4 bank(s)
[ 1952.639380] EDAC DEBUG: decode_mtr: NUMRANK: single
[ 1952.639381] EDAC DEBUG: decode_mtr: NUMROW: 16,384 - 14 rows
[ 1952.639383] EDAC DEBUG: decode_mtr: NUMCOL: 1,024 - 10 columns
[ 1952.639384] EDAC DEBUG: decode_mtr: SIZE: 512 MB
[ 1952.639385] EDAC DEBUG: decode_mtr: ECC code is 8-byte-over-32-byte SECDED+ code
[ 1952.639387] EDAC DEBUG: decode_mtr: Scrub algorithm for x8 is on enhanced mode
...
[ 1952.639449] EDAC DEBUG: print_dimm_size: channel 0 | channel 1 | channel 2 | channel 3 |
[ 1952.639451] EDAC DEBUG: print_dimm_size: -------------------------------------------------------------
[ 1952.639453] EDAC DEBUG: print_dimm_size: csrow/SLOT 0 512 MB | 512 MB | 0 MB | 0 MB |
[ 1952.639456] EDAC DEBUG: print_dimm_size: csrow/SLOT 1 0 MB | 0 MB | 0 MB | 0 MB |
[ 1952.639458] EDAC DEBUG: print_dimm_size: csrow/SLOT 2 0 MB | 0 MB | 0 MB | 0 MB |
[ 1952.639460] EDAC DEBUG: print_dimm_size: csrow/SLOT 3 0 MB | 0 MB | 0 MB | 0 MB |
[ 1952.639462] EDAC DEBUG: print_dimm_size: csrow/SLOT 4 0 MB | 0 MB | 0 MB | 0 MB |
[ 1952.639464] EDAC DEBUG: print_dimm_size: csrow/SLOT 5 0 MB | 0 MB | 0 MB | 0 MB |
[ 1952.639466] EDAC DEBUG: print_dimm_size: csrow/SLOT 6 0 MB | 0 MB | 0 MB | 0 MB |
[ 1952.639468] EDAC DEBUG: print_dimm_size: csrow/SLOT 7 0 MB | 0 MB | 0 MB | 0 MB |
[ 1952.639470] EDAC DEBUG: print_dimm_size: -------------------------------------------------------------
Instead of detecting a single memory at channel 0, it is showing
twice the memory.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
Add code to handle DRAM ECC errors decoding for Fam16h.
Tested on Fam16h with ECC turned on using the mce_amd_inj facility and
works fine.
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
[ Boris: cleanups and clarifications ]
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
Those should be const ptr to a const string, fix them.
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
Both mci.mem_is_per_rank and mci.csbased denote the same thing: the
memory controller is csrows based. Merge both fields into one.
There's no need for the driver to actually fill it, as the core detects
it by checking if one of the layers has the csrows type as part of the
memory hierarchy:
if (layers[i].type == EDAC_MC_LAYER_CHIP_SELECT)
per_rank = true;
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
We were filling the csrow size with a wrong value. 16a528ee3975 ("EDAC:
Fix csrow size reported in sysfs") tried to address the issue. It fixed
the report with the old API but not with the new one. Correct it for the
new API too.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
[ make it a per-csrow accounting regardless of ->channel_count ]
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
Fixes lots of sparse warnings here.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac
Pull EDAC fixes and ghes-edac from Mauro Carvalho Chehab:
"For:
- Some fixes at edac drivers (i7core_edac, sb_edac, i3200_edac);
- error injection support for i5100, when EDAC debug is enabled;
- fix edac when it is loaded builtin (early init for the subsystem);
- a "Firmware First" EDAC driver, allowing ghes to report errors via
EDAC (ghes-edac).
With regards to ghes-edac, this fixes a longstanding BZ at Red Hat
that happens with Nehalem and Sandy Bridge CPUs: when both GHES and
i7core_edac or sb_edac are running, the error reports are
unpredictable, as both BIOS and OS race to access the registers. With
ghes-edac, the EDAC core will refuse to register any other concurrent
memory error driver.
This patchset moves the ghes struct definitions to a separate header
file (include/acpi/ghes.h) and adds 3 hooks at apei/ghes.c to
register/unregister and to report errors via ghes-edac. Those changes
were acked by ghes driver maintainer (Huang)."
* 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac: (30 commits)
i5100_edac: convert to use simple_open()
ghes_edac: fix to use list_for_each_entry_safe() when delete list items
ghes_edac: Fix RAS tracing
ghes_edac: Make it compliant with UEFI spec 2.3.1
ghes_edac: Improve driver's printk messages
ghes_edac: Don't credit the same memory dimm twice
ghes_edac: do a better job of filling EDAC DIMM info
ghes_edac: add support for reporting errors via EDAC
ghes_edac: Register at EDAC core the BIOS report
ghes: add the needed hooks for EDAC error report
ghes: move structures/enum to a header file
edac: add support for error type "Info"
edac: add support for raw error reports
edac: reduce stack pressure by using a pre-allocated buffer
edac: lock module owner to avoid error report conflicts
edac: remove proc_name from mci structure
edac: add a new memory layer type
edac: initialize the core earlier
edac: better report error conditions in debug mode
i5100_edac: Remove two checkpatch warnings
...
|
|
This removes an open coded simple_open() function and
replaces file operations references to the function
with simple_open() instead.
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
Since we will remove items off the list using list_del() we need
to use a safe version of the list_for_each_entry() macro aptly named
list_for_each_entry_safe().
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
With the current version of CPER, there's no way to associate an
error with the memory error. So, the error location in EDAC
layers is unused.
As CPER has its own idea about memory architectural layers, just
output whatever is there inside the driver's detail at the RAS
tracepoint.
The EDAC location keeps untouched, in the case that, in some future,
we could actually map the error into the dimm labels.
Now, the error message:
[ 72.396625] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0
[ 72.396627] {1}[Hardware Error]: APEI generic hardware error status
[ 72.396628] {1}[Hardware Error]: severity: 2, corrected
[ 72.396630] {1}[Hardware Error]: section: 0, severity: 2, corrected
[ 72.396632] {1}[Hardware Error]: flags: 0x01
[ 72.396634] {1}[Hardware Error]: primary
[ 72.396635] {1}[Hardware Error]: section_type: memory error
[ 72.396637] {1}[Hardware Error]: error_status: 0x0000000000000400
[ 72.396638] {1}[Hardware Error]: node: 3
[ 72.396639] {1}[Hardware Error]: card: 0
[ 72.396640] {1}[Hardware Error]: module: 0
[ 72.396641] {1}[Hardware Error]: device: 0
[ 72.396643] {1}[Hardware Error]: error_type: 18, unknown
[ 72.396666] EDAC MC0: 1 CE reserved error (18) on unknown label (node:3 card:0 module:0 page:0x0 offset:0x0 grain:0 syndrome:0x0 - status(0x0000000000000400): Storage error in DRAM memory)
Is properly represented on the trace event:
kworker/0:2-584 [000] .... 72.396657: mc_event: 1 Corrected error: reserved error (18) on unknown label (mc:0 location:-1:-1:-1 address:0x00000000 grain:1 syndrome:0x00000000 APEI location: node:3 card:0 module:0 status(0x0000000000000400): Storage error in DRAM memory)
Tested on a 4 sockets E5-4650 Sandy Bridge machine.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
The UEFI spec defines the memory error types ans the bits that
validate each field on the memory error record, at
Appendix N om items N.2.5 (Memory Error Section) and
N.2.11 (Error Status). Make the error description compliant with
it, only showing the valid fields.
The EDAC error log is now properly reporting the error:
[ 281.556854] mce: [Hardware Error]: Machine check events logged
[ 281.557042] {2}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0
[ 281.557044] {2}[Hardware Error]: APEI generic hardware error status
[ 281.557046] {2}[Hardware Error]: severity: 2, corrected
[ 281.557048] {2}[Hardware Error]: section: 0, severity: 2, corrected
[ 281.557050] {2}[Hardware Error]: flags: 0x01
[ 281.557052] {2}[Hardware Error]: primary
[ 281.557053] {2}[Hardware Error]: section_type: memory error
[ 281.557055] {2}[Hardware Error]: error_status: 0x0000000000000400
[ 281.557056] {2}[Hardware Error]: node: 3
[ 281.557057] {2}[Hardware Error]: card: 0
[ 281.557058] {2}[Hardware Error]: module: 1
[ 281.557059] {2}[Hardware Error]: device: 0
[ 281.557061] {2}[Hardware Error]: error_type: 18, unknown
[ 281.557067] EDAC DEBUG: ghes_edac_report_mem_error: error validation_bits: 0x000040b9
[ 281.557084] EDAC MC0: 1 CE reserved error (18) on unknown label (node:3 card:0 module:1 page:0x0 offset:0x0 grain:0 syndrome:0x0 - status(0x0000000000000400): Storage error in DRAM memory)
Tested on a 4 CPUs E5-4650 Sandy Bridge machine.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
Provide a better infrastructure for printk's inside the driver:
- use edac_dbg() for debug messages;
- standardize the usage of pr_info();
- provide warning about the risk of relying on this
driver.
While here, changes the size of a fake memory to 1 page. This is
as good or as bad as 1000 pages, but it is easier for userspace to
detect, as I don't expect that any machine implementing GHES would
provide just 1 page available ;)
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Conflicts:
drivers/edac/ghes_edac.c
|
|
On my tests on a 4xE5-4650 CPU's system, the GHES
EDAC driver is called twice. As the SMBIOS DMI enumeration
call will seek for the entire DIMM sockets in the system, on
this machine, equipped with 128 GB of RAM, the memory is
displayed twice:
+-----------------------+
| mc0 | mc1 |
----------+-----------------------+
memory45: | 8192 MB | 8192 MB |
memory44: | 0 MB | 0 MB |
----------+-----------------------+
memory43: | 0 MB | 0 MB |
memory42: | 8192 MB | 8192 MB |
----------+-----------------------+
memory41: | 0 MB | 0 MB |
memory40: | 0 MB | 0 MB |
----------+-----------------------+
memory39: | 8192 MB | 8192 MB |
memory38: | 0 MB | 0 MB |
----------+-----------------------+
memory37: | 0 MB | 0 MB |
memory36: | 8192 MB | 8192 MB |
----------+-----------------------+
memory35: | 0 MB | 0 MB |
memory34: | 0 MB | 0 MB |
----------+-----------------------+
memory33: | 8192 MB | 8192 MB |
memory32: | 0 MB | 0 MB |
----------+-----------------------+
memory31: | 0 MB | 0 MB |
memory30: | 8192 MB | 8192 MB |
----------+-----------------------+
memory29: | 0 MB | 0 MB |
memory28: | 0 MB | 0 MB |
----------+-----------------------+
memory27: | 8192 MB | 8192 MB |
memory26: | 0 MB | 0 MB |
----------+-----------------------+
memory25: | 0 MB | 0 MB |
memory24: | 8192 MB | 8192 MB |
----------+-----------------------+
memory23: | 0 MB | 0 MB |
memory22: | 0 MB | 0 MB |
----------+-----------------------+
memory21: | 8192 MB | 8192 MB |
memory20: | 0 MB | 0 MB |
----------+-----------------------+
memory19: | 0 MB | 0 MB |
memory18: | 8192 MB | 8192 MB |
----------+-----------------------+
memory17: | 0 MB | 0 MB |
memory16: | 0 MB | 0 MB |
----------+-----------------------+
memory15: | 8192 MB | 8192 MB |
memory14: | 0 MB | 0 MB |
----------+-----------------------+
memory13: | 0 MB | 0 MB |
memory12: | 8192 MB | 8192 MB |
----------+-----------------------+
memory11: | 0 MB | 0 MB |
memory10: | 0 MB | 0 MB |
----------+-----------------------+
memory9: | 8192 MB | 8192 MB |
memory8: | 0 MB | 0 MB |
----------+-----------------------+
memory7: | 0 MB | 0 MB |
memory6: | 8192 MB | 8192 MB |
----------+-----------------------+
memory5: | 0 MB | 0 MB |
memory4: | 0 MB | 0 MB |
----------+-----------------------+
memory3: | 8192 MB | 8192 MB |
memory2: | 0 MB | 0 MB |
----------+-----------------------+
memory1: | 0 MB | 0 MB |
memory0: | 8192 MB | 8192 MB |
----------+-----------------------+
Total sum of 256 GB.
As there's no reliable way to credit DIMMS to the right memory
controller, just put everything on memory controller 0 (with should
always exist).
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
Instead of just faking a random value for the DIMM data, get
the information that it is available via DMI table.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
Now that the EDAC core is capable of just forward the errors via
the userspace API, add a report mechanism for the GHES errors.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
Register GHES at EDAC MC core, in order to avoid other
drivers to also handle errors and mangle with error data.
The edac core will warrant that just one driver will be used,
so the first one to register (BIOS first) will be the one that
will be reporting the hardware errors.
For now, the EDAC driver does nothing but to register at the
EDAC core, preventing the hardware-driven mechanism to
interfere with GHES.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core patches from Greg Kroah-Hartman:
"Here is the big driver core merge for 3.9-rc1
There are two major series here, both of which touch lots of drivers
all over the kernel, and will cause you some merge conflicts:
- add a new function called devm_ioremap_resource() to properly be
able to check return values.
- remove CONFIG_EXPERIMENTAL
Other than those patches, there's not much here, some minor fixes and
updates"
Fix up trivial conflicts
* tag 'driver-core-3.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (221 commits)
base: memory: fix soft/hard_offline_page permissions
drivercore: Fix ordering between deferred_probe and exiting initcalls
backlight: fix class_find_device() arguments
TTY: mark tty_get_device call with the proper const values
driver-core: constify data for class_find_device()
firmware: Ignore abort check when no user-helper is used
firmware: Reduce ifdef CONFIG_FW_LOADER_USER_HELPER
firmware: Make user-mode helper optional
firmware: Refactoring for splitting user-mode helper code
Driver core: treat unregistered bus_types as having no devices
watchdog: Convert to devm_ioremap_resource()
thermal: Convert to devm_ioremap_resource()
spi: Convert to devm_ioremap_resource()
power: Convert to devm_ioremap_resource()
mtd: Convert to devm_ioremap_resource()
mmc: Convert to devm_ioremap_resource()
mfd: Convert to devm_ioremap_resource()
media: Convert to devm_ioremap_resource()
iommu: Convert to devm_ioremap_resource()
drm: Convert to devm_ioremap_resource()
...
|
|
That allows APEI GHES driver to report errors directly, using
the EDAC error report API.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|