The flag is_registered is not initialized until mci_bind_devs()
is called. Refer it properly.
The mci->dev and mci->edac_check is required in edac_mc_add_mc(),
so prepare them just before the call.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
We already do 'get' for all sockets at once. So do 'put' in the
same way.
And let args of the 'get' function to void since it handles
only the single, static and known size table pci_dev_table[].
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Have a couple of method.
while here sort out lines in the i7core_register_mci() a bit.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Have a method to make a couple with alloc_i7core_dev() previously
introduced. Using in pair will help proper resource handling.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
It's nice to have a method for a single purpose.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Since we need to pass the index of the entry, pass the table itself
instead of passing individual members of the table.
While here make it static.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
commit 47251b4d960bdfa648b0d06dbc6d445f41cb3906 have changed
the logic for unexplained reasons. It looks strange that it
can release i7core_dev without calling i7core_put_devices()
that releases i7core_dev->pdev.
Fix the part.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
The legacy PCI probe sometimes cause hangs. Better to have it
disabled by default, and have a parameter to enable it.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
This is a nasty bug. Since kobject count will be reduced by zero by
edac_mc_del_mc(), and this triggers the kobj release method, the
mci memory will be freed automatically. So, all we have left is ctl_name,
as shown by enabling debug:
[ 80.822186] EDAC DEBUG: in drivers/edac/edac_mc_sysfs.c, line at 1020: edac_remove_sysfs_mci_device() remove_link
[ 80.832590] EDAC DEBUG: in drivers/edac/edac_mc_sysfs.c, line at 1024: edac_remove_sysfs_mci_device() remove_mci_instance
[ 80.843776] EDAC DEBUG: in drivers/edac/edac_mc_sysfs.c, line at 640: edac_mci_control_release() mci instance idx=0 releasing
[ 80.855163] EDAC MC: Removed device 0 for i7core_edac.c i7 core #0: DEV 0000:3f:03.0
[ 80.862936] EDAC DEBUG: in drivers/edac/i7core_edac.c, line at 2089: (null): free structs
[ 80.871134] EDAC DEBUG: in drivers/edac/edac_mc.c, line at 238: edac_mc_free()
[ 80.878379] EDAC DEBUG: in drivers/edac/edac_mc_sysfs.c, line at 726: edac_mc_unregister_sysfs_main_kobj()
[ 80.888043] EDAC DEBUG: in drivers/edac/i7core_edac.c, line at 1232: drivers/edac/i7core_edac.c: i7core_put_devices()
Also, kfree(mci) shouldn't happen at the kobj.release, as it happens
when edac_remove_sysfs_mci_device() is called, but the logic is:
edac_remove_sysfs_mci_device(mci);
edac_printk(KERN_INFO, EDAC_MC,
"Removed device %d for %s %s: DEV %s\n", mci->mc_idx,
mci->mod_name, mci->ctl_name, edac_dev_name(mci));
So, as the edac_printk() needs the mci struct, this generates an OOPS.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
A very nasty bug were happening on edac core, due to the way mci objects are
freed. mci memory is freed when kobject count reaches zero, by
edac_mci_control_release(). However, from the logs, this is clearly happening
before the final usage of mci struct:
[15799.607454] EDAC DEBUG: in drivers/edac/edac_mc_sysfs.c, line at 640: edac_mci_control_release() mci instance idx=0 releasing
[15799.618773] EDAC DEBUG: in drivers/edac/edac_mc_sysfs.c, line at 769: edac_inst_grp_release()
[15799.627326] EDAC DEBUG: in drivers/edac/edac_mc_sysfs.c, line at 894: edac_remove_mci_instance_attributes() end of seeking for group all_channel_counts
[15799.640887] EDAC DEBUG: in drivers/edac/edac_mc_sysfs.c, line at 877: edac_remove_mci_instance_attributes() sysfs_attrib = ffffffffa01d7240
[15799.653412] EDAC DEBUG: in drivers/edac/edac_mc_sysfs.c, line at 1020: edac_remove_sysfs_mci_device() remove_link
[15799.663753] EDAC DEBUG: in drivers/edac/edac_mc_sysfs.c, line at 1024: edac_remove_sysfs_mci_device() remove_mci_instance
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
There are two groups of sysfs attributes: one for rdimm and another
for udimm. Instead of changing dynamically the unique static struct
for handling udimm's, declare two vars and make them constant.
This avoids the risk of having two or more memory controllers, each
needing a different set of attributes.
While here, use const on all places where it is applicable.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
edac_core: use const for constant sysfs arguments
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
With multi-sockets, more than one edac pci handler is enabled. Be sure to
un-register all instances.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
* 'x86-amd-nb-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, amd_nb: Enable GART support for AMD family 0x15 CPUs
x86, amd: Use compute unit information to determine thread siblings
x86, amd: Extract compute unit information for AMD CPUs
x86, amd: Add support for CPUID topology extension of AMD CPUs
x86, nmi: Support NMI watchdog on newer AMD CPU families
x86, mtrr: Assume SYS_CFG[Tom2ForceMemTypeWB] exists on all future AMD CPUs
x86, k8: Rename k8.[ch] to amd_nb.[ch] and CONFIG_K8_NB to CONFIG_AMD_NB
x86, k8-gart: Decouple handling of garts and northbridges
x86, cacheinfo: Fix dependency of AMD L3 CID
x86, kvm: add new AMD SVM feature bits
x86, cpu: Fix allowed CPUID bits for KVM guests
x86, cpu: Update AMD CPUID feature bits
x86, cpu: Fix renamed, not-yet-shipping AMD CPUID feature bit
x86, AMD: Remove needless CPU family check (for L3 cache info)
x86, tsc: Remove CPU frequency calibration on AMD
Fix
drivers/edac/mce_amd.c:262: warning: left shift count >= width of type
on 32-bit builds.
Reported-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
F11h has almost the same MCE signatures as K8 except DRAM ECC and MC5
bank errors. Reuse functionality from the other families.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Now that all decoders have been taught about F14h, models < 0x10
MCEs, enable decoding on this family of CPUs. Also, issue a short
informational message upon boot that MCE decoding gets enabled.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
F14h CPUs do not generate LS MCEs so exit early and warn the user in
case this path is ever hit that something else might be going haywire.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Add support for IC MCEs for F14h CPUs. K8 and F10h are almost identical
so use one function for both.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Add a per-family data cache decoders. Since there is a certain overlap
between the different DC MCE signatures, reuse functionality between the
families as far as possible.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Drop "edac_" string from the filenames since they're prefixed with edac/
in their pathname anyway.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Add sysfs injection facilities for testing of the MCE decoding code.
Remove large parts of amd64_edac_dbg.c, as a result, which did only
NB MCE injection anyway and the new injection code supports that
functionality already.
Add an injection module so that MCE decoding code in production kernels
like those in RHEL and SLES can be tested.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Move toplevel sysfs class to the stub and make it available to
non-modularized code too. Add proper refcounting of its users and move
the registration functionality into the reference counting routines.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
... instead of the MCi_STATUS info only for improved handling of certain
types of errors later.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
.. so that the user knows what she's looking at there in dmesg. Also,
fix a minor cosmetic output inconsistency.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
The patch below updates broken web addresses in the kernel
Signed-off-by: Justin P. Mattock <justinmattock@gmail.com>
Cc: Maciej W. Rozycki <macro@linux-mips.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Finn Thain <fthain@telegraphics.com.au>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Dimitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Mike Frysinger <vapier.adi@gmail.com>
Acked-by: Ben Pfaff <blp@cs.stanford.edu>
Acked-by: Hans J. Koch <hjk@linutronix.de>
Reviewed-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
f4347553b30ec66530bfe63c84530afea3803396 removed the edac polling
mechanism in favor of using a notifier chain for conveying MCE
information to edac. However, the module removal path didn't test
whether the driver had setup the polling function workqueue at all and
the rmmod process was hanging in the kernel at try_to_del_timer_sync()
in the cancel_delayed_work() path, trying to cancel an uninitialized
work struct.
Fix that by adding a balancing check to the workqueue removal path.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Due to the current edac-core limits, we cannot represent a per-channel
memory size, for FB-DIMM drivers. So, we need to sum-up all values
for each slot, in order to properly represent the total amount of
memory found by the i7300 driver.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
It is still somewhat fake, as the pages may not be on this exact order,
and may even be used in mirror mode, but this is a best guess than the
other random fake values.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
The file names are somehow misleading as the code is not specific to
AMD K8 CPUs anymore. The files accomodate code for other AMD CPU
northbridges as well.
Same is true for the config option which is valid for AMD CPU
northbridges in general and not specific to K8.
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
LKML-Reference: <20100917160343.GD4958@loge.amd.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
So far we only provide num_k8_northbridges. This is required in
different areas (e.g. L3 cache index disable, GART). But not all AMD
CPUs provide a GART. Thus it is useful to split off the GART handling
from the generic caching of AMD northbridge misc devices.
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
LKML-Reference: <20100917160254.GC4958@loge.amd.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>