Commits · 53f2d02898755d1b24bde1975e202815d29fdb81 · Kirill Smelkov / linux

11 Jun, 2012 1 commit

RAS: Add a tracepoint for reporting memory controller events · 53f2d028

Mauro Carvalho Chehab authored Feb 23, 2012

Add a new tracepoint-based hardware events report method for
reporting Memory Controller events.

Part of the description bellow is shamelessly copied from Tony
Luck's notes about the Hardware Error BoF during LPC 2010 [1].
Tony, thanks for your notes and discussions to generate the
h/w error reporting requirements.

[1] http://lwn.net/Articles/416669/

    We have several subsystems & methods for reporting hardware errors:

    1) EDAC ("Error Detection and Correction").  In its original form
    this consisted of a platform specific driver that read topology
    information and error counts from chipset registers and reported
    the results via a sysfs interface.

    2) mcelog - x86 specific decoding of machine check bank registers
    reporting in binary form via /dev/mcelog. Recent additions make use
    of the APEI extensions that were documented in version 4.0a of the
    ACPI specification to acquire more information about errors without
    having to rely reading chipset registers directly. A user level
    programs decodes into somewhat human readable format.

    3) drivers/edac/mce_amd.c - this driver hooks into the mcelog path and
    decodes errors reported via machine check bank registers in AMD
    processors to the console log using printk();

    Each of these mechanisms has a band of followers ... and none
    of them appear to meet all the needs of all users.

As part of a RAS subsystem, let's encapsulate the memory error hardware
events into a trace facility.

The tracepoint printk will be displayed like:

mc_event: [quant] (Corrected|Uncorrected|Fatal) error:[error msg] on [label] ([location] [edac_mc detail] [driver_detail]

Where:
       	[quant] is the quantity of errors
	[error msg] is the driver-specific error message
		    (e. g. "memory read", "bus error", ...);
	[location] is the location in terms of memory controller and
		   branch/channel/slot, channel/slot or csrow/channel;
	[label] is the memory stick label;
	[edac_mc detail] describes the address location of the error
			 and the syndrome;
	[driver detail] is driver-specifig error message details,
			when needed/provided (e. g. "area:DMA", ...)

For example:

mc_event: 1 Corrected error:memory read on memory stick DIMM_1A (mc:0 location:0:0:0 page:0x586b6e offset:0xa66 grain:32 syndrome:0x0 area:DMA)

Of course, any userspace tools meant to handle errors should not parse
the above data. They should, instead, use the binary fields provided by
the tracepoint, mapping them directly into their Management Information
Base.

NOTE: The original patch was providing an additional mechanism for
MCA-based trace events that also contained MCA error register data.
However, as no agreement was reached so far for the MCA-based trace
events, for now, let's add events only for memory errors.
A latter patch is planned to change the tracepoint, for those types
of event.

Cc: Aristeu Rozanski <arozansk@redhat.com>
Cc: Doug Thompson <norsk5@yahoo.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

53f2d028

28 May, 2012 39 commits

i7core: fix ranks information at the per-channel struct · 0bf09e82

Mauro Carvalho Chehab authored Apr 26, 2012

There is a flag at the per-channel struct that indicates if there are
any 4R dimm on it. The way the presence of this flag were reported
is not ok, as it might give the false idea that the channel were filled
with 2R memories:

[  580.588701] EDAC DEBUG: get_dimm_config: Ch1 phy rd1, wr1 (0x063f7431): 2 ranks, UDIMMs
[  580.588704] EDAC DEBUG: get_dimm_config: 	dimm 0 1024 Mb offset: 0, bank: 8, rank: 1, row: 0x4000, col: 0x400

(in this case, just one 1R memory is filled on channel 1)

So, use a better way to represent the per-channel ranks information.
After the patch, it will show:

[ 2002.233978] EDAC DEBUG: get_dimm_config: Ch0 phy rd0, wr0 (0x063f7431): UDIMMs
[ 2002.233982] EDAC DEBUG: get_dimm_config: 	dimm 0 1024 Mb offset: 0, bank: 8, rank: 1, row: 0x4000, col: 0x400
[ 2002.233988] EDAC DEBUG: get_dimm_config: 	dimm 1 1024 Mb offset: 4, bank: 8, rank: 1, row: 0x4000, col: 0x400

(in this case, there isn't any 4R memories)
Reported-by: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

0bf09e82

i5000: Fix the fatal error handling · 486dfb16

Mauro Carvalho Chehab authored Apr 25, 2012

The fatal error channel bits point to a single channel, and not
to a range of channels. Fix the code to properly report it,
instead of printing messages like:
	kernel: EDAC MC0: INTERNAL ERROR: channel-b out of range (4 >= 4)
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

486dfb16

i5100_edac: Fix a warning when compiled with 32 bits · 9f70d08a

Mauro Carvalho Chehab authored Mar 29, 2012

drivers/edac/i5100_edac.c: In function ‘i5100_init_csrows’:
drivers/edac/i5100_edac.c:862:3: warning: format ‘%zd’ expects argument of type ‘signed size_t’, but argument 5 has type ‘long unsigned int’ [-Wformat]
Reviewed-by: Aristeu Rozanski <arozansk@redhat.com>
Cc: "Niklas Söderlund" <niklas.soderlund@ericsson.com>
Cc: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

9f70d08a

i82975x_edac: Test nr_pages earlier to save a few CPU cycles · 36683aab

Mauro Carvalho Chehab authored Mar 28, 2012

Avoid test nr_pages twice, and initializing some data that won't
be used.

Cleanup patch only.
Reported-by: Aristeu Rozanski Filho <arozansk@redhat.com>
Reviewed-by: Aristeu Rozanski <arozansk@redhat.com>
Cc: Ranganathan Desikan <ravi@jetztechnologies.com>
Cc: "Arvind R." <arvino55@gmail.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

36683aab

e752x_edac: provide more info about how DIMMS/ranks are mapped · 805afb69

Mauro Carvalho Chehab authored Mar 15, 2012

No funtional changes here. Only the comments got updated.
Reviewed-by: Aristeu Rozanski <arozansk@redhat.com>
Cc: Mark Gross <mark.gross@intel.com>
Cc: Doug Thompson <norsk5@yahoo.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

805afb69

i5000_edac: Fix the logic that retrieves memory information · 64e1fdaf

Mauro Carvalho Chehab authored Feb 24, 2012

The logic there is broken: it basically creates two csrows for
each DIMM and assumes that all DIMM's are dual rank. Only one of
the csrows will contain the entire DIMM size. If single rank
memories are found, they'll be marked with 0 bytes.

The check if the AMB is present were also wrong.

Yet, as the error reports don't use the memory size in order to
credit an error to the right DIMM, that part of the driver seems
to work. That's why probably nobody detected the issue yet.

After this patch, the memory layout is now properly reported,
when debug mode is enabled, and the number of ranks per dimm is
now shown:

calculate_dimm_size: ----------------------------------------------------------
calculate_dimm_size: slot 3 0 MB | 0 MB | 0 MB | 0 MB |
calculate_dimm_size: slot 2 0 MB | 0 MB | 0 MB | 0 MB |
calculate_dimm_size: ----------------------------------------------------------
calculate_dimm_size: slot 1 0 MB | 0 MB | 0 MB | 0 MB |
calculate_dimm_size: slot 0 512 MB 1R| 512 MB 1R| 512 MB 1R| 512 MB 1R|
calculate_dimm_size: ----------------------------------------------------------
calculate_dimm_size: channel 0 | channel 1 | channel 2 | channel 3 |
calculate_dimm_size: branch 0 | branch 1 |

(1R above means that all memories on my test machine are single-ranked)
Reviewed-by: Aristeu Rozanski <arozansk@redhat.com>
Cc: Doug Thompson <norsk5@yahoo.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

64e1fdaf

i5400_edac: improve debug messages to better represent the filled memory · 68d086f8

Mauro Carvalho Chehab authored Feb 12, 2012

Improves the debug output message, in order to better represent the
memory controller hierarchy, when outputing the debug messages.

No functional changes when debug is disabled.
Reviewed-by: Aristeu Rozanski <arozansk@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

68d086f8

edac: Cleanup the logs for i7core and sb edac drivers · e17a2f42

Mauro Carvalho Chehab authored May 11, 2012

Remove some information that it is duplicated at the MCE log,
and don't have much usage for the error. Those data will be
added again, when creating a trace function that outputs both
memory errors and MCE fields.

Cc: Aristeu Rozanski <arozansk@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

e17a2f42

edac: Initialize the dimm label with the known information · 5926ff50

Mauro Carvalho Chehab authored Feb 09, 2012

While userspace doesn't fill the dimm labels, add there the dimm location,
as described by the used memory model. This could eventually match what
is described at the dmidecode, making easier for people to identify the
memory.

For example, on an Intel motherboard where the DMI table is reliable,
the first memory stick is described as:

Memory Device
	Array Handle: 0x0029
	Error Information Handle: Not Provided
	Total Width: 64 bits
	Data Width: 64 bits
	Size: 2048 MB
	Form Factor: DIMM
	Set: 1
	Locator: A1_DIMM0
	Bank Locator: A1_Node0_Channel0_Dimm0
	Type: <OUT OF SPEC>
	Type Detail: Synchronous
	Speed: 800 MHz
	Manufacturer: A1_Manufacturer0
	Serial Number: A1_SerNum0
	Asset Tag: A1_AssetTagNum0
	Part Number: A1_PartNum0

The memory named as "A1_DIMM0" is physically located at the first
memory controller (node 0), at channel 0, dimm slot 0.

After this patch, the memory label will be filled with:
	/sys/devices/system/edac/mc/csrow0/ch0_dimm_label:mc#0channel#0slot#0

And (after the new EDAC API patches) as:
	/sys/devices/system/edac/mc/mc0/dimm0/dimm_label:mc#0channel#0slot#0

So, even if the memory label is not initialized on userspace, an useful
information with the error location is filled there, expecially since
several systems/motherboards are provided with enough info to map from
channel/slot (or branch/channel/slot) into the DIMM label. So, letting the
EDAC core fill it by default is a good thing.

It should noticed that, as the label filling happens at the
edac_mc_alloc(), drivers can override it to better describe the memories
(and some actually do it).

Cc: Aristeu Rozanski <arozansk@redhat.com>
Cc: Doug Thompson <norsk5@yahoo.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

5926ff50

edac: Remove the legacy EDAC ABI · ca0907b9

Mauro Carvalho Chehab authored May 02, 2012

Now that all drivers got converted to use the new ABI, we can
drop the old one.
Acked-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

ca0907b9

x38_edac: convert driver to use the new edac ABI · e2acc357

Mauro Carvalho Chehab authored Apr 16, 2012

The legacy edac ABI is going to be removed. Port the driver to use
and benefit from the new API functionality.

Cc: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

e2acc357

tile_edac: convert driver to use the new edac ABI · 40467db7

Mauro Carvalho Chehab authored Apr 16, 2012

The legacy edac ABI is going to be removed. Port the driver to use
and benefit from the new API functionality.
Acked-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

40467db7

sb_edac: convert driver to use the new edac ABI · c36e3e77

Mauro Carvalho Chehab authored Apr 16, 2012

The legacy edac ABI is going to be removed. Port the driver to use
and benefit from the new API functionality.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

c36e3e77

r82600_edac: convert driver to use the new edac ABI · 63b5d1d9

Mauro Carvalho Chehab authored Apr 16, 2012

The legacy edac ABI is going to be removed. Port the driver to use
and benefit from the new API functionality.

Cc: Tim Small <tim@buttersideup.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

63b5d1d9

ppc4xx_edac: convert driver to use the new edac ABI · 94d93374

Mauro Carvalho Chehab authored Apr 16, 2012

The legacy edac ABI is going to be removed. Port the driver to use
and benefit from the new API functionality.

Cc: Josh Boyer <jwboyer@gmail.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

94d93374

pasemi_edac: convert driver to use the new edac ABI · f34575ac

Mauro Carvalho Chehab authored Apr 16, 2012

The legacy edac ABI is going to be removed. Port the driver to use
and benefit from the new API functionality.

Cc: Olof Johansson <olof@lixom.net>
Cc: Egor Martovetsky <egor@pasemi.com>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

f34575ac

mv64x60_edac: convert driver to use the new edac ABI · a583ac6c

Mauro Carvalho Chehab authored Apr 16, 2012

The legacy edac ABI is going to be removed. Port the driver to use
and benefit from the new API functionality.

Cc: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

a583ac6c

mpc85xx_edac: convert driver to use the new edac ABI · ad4d6e23

Mauro Carvalho Chehab authored Apr 16, 2012

The legacy edac ABI is going to be removed. Port the driver to use
and benefit from the new API functionality.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Shaohui Xie <Shaohui.Xie@freescale.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

ad4d6e23

i82975x_edac: convert driver to use the new edac ABI · 70521358

Mauro Carvalho Chehab authored Apr 16, 2012

The legacy edac ABI is going to be removed. Port the driver to use
and benefit from the new API functionality.

Cc: Ranganathan Desikan <ravi@jetztechnologies.com>
Cc: "Arvind R." <arvino55@gmail.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

70521358

i82875p_edac: convert driver to use the new edac ABI · 0a8a9ac9

Mauro Carvalho Chehab authored Apr 16, 2012

The legacy edac ABI is going to be removed. Port the driver to use
and benefit from the new API functionality.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

0a8a9ac9

i82860_edac: convert driver to use the new edac ABI · 84c3a684

Mauro Carvalho Chehab authored Apr 16, 2012

The legacy edac ABI is going to be removed. Port the driver to use
and benefit from the new API functionality.

Cc: Michal Marek <mmarek@suse.cz>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

84c3a684

i82443bxgx_edac: convert driver to use the new edac ABI · 40f562b1

Mauro Carvalho Chehab authored Apr 16, 2012

The legacy edac ABI is going to be removed. Port the driver to use
and benefit from the new API functionality.

Cc: Tim Small <tim@buttersideup.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

40f562b1

i7core_edac: convert driver to use the new edac ABI · 0975c16f

Mauro Carvalho Chehab authored Apr 16, 2012

The legacy edac ABI is going to be removed. Port the driver to use
and benefit from the new API functionality.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

0975c16f

i7300_edac: convert driver to use the new edac ABI · 70e2a837

Mauro Carvalho Chehab authored Apr 16, 2012

The legacy edac ABI is going to be removed. Port the driver to use
and benefit from the new API functionality.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

70e2a837

i5400_edac: convert driver to use the new edac ABI · 296da591

Mauro Carvalho Chehab authored Apr 16, 2012

The legacy edac ABI is going to be removed. Port the driver to use
and benefit from the new API functionality.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

296da591

i5100_edac: convert driver to use the new edac ABI · d1afaa0a

Mauro Carvalho Chehab authored Apr 16, 2012

The legacy edac ABI is going to be removed. Port the driver to use
and benefit from the new API functionality.

Cc: "Niklas Söderlund" <niklas.soderlund@ericsson.com>
Cc: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

d1afaa0a

i5000_edac: convert driver to use the new edac ABI · 702df640

Mauro Carvalho Chehab authored Apr 16, 2012

The legacy edac ABI is going to be removed. Port the driver to use
and benefit from the new API functionality.

Cc: Doug Thompson <norsk5@yahoo.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

702df640

i3200_edac: convert driver to use the new edac ABI · 95b93287

Mauro Carvalho Chehab authored Apr 16, 2012

The legacy edac ABI is going to be removed. Port the driver to use
and benefit from the new API functionality.

Cc: Hitoshi Mitake <h.mitake@gmail.com>
Cc: Borislav Petkov <borislav.petkov@amd.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

95b93287

i3000_edac: convert driver to use the new edac ABI · 884906f1

Mauro Carvalho Chehab authored Apr 16, 2012

The legacy edac ABI is going to be removed. Port the driver to use
and benefit from the new API functionality.

Cc: Jason Uhlenkott <juhlenko@akamai.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

884906f1

e7xxx_edac: convert driver to use the new edac ABI · 30ac4406

Mauro Carvalho Chehab authored Apr 16, 2012

The legacy edac ABI is going to be removed. Port the driver to use
and benefit from the new API functionality.

Cc: Doug Thompson <norsk5@yahoo.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

30ac4406

e752x_edac: convert driver to use the new edac ABI · ce11ce17

Mauro Carvalho Chehab authored Apr 16, 2012

The legacy edac ABI is going to be removed. Port the driver to use
and benefit from the new API functionality.

Cc: Mark Gross <mark.gross@intel.com>
Cc: Doug Thompson <norsk5@yahoo.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

ce11ce17

cpc925_edac: convert driver to use the new edac ABI · df62b1e6

Mauro Carvalho Chehab authored Apr 16, 2012

The legacy edac ABI is going to be removed. Port the driver to use
and benefit from the new API functionality.

Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michal Marek <mmarek@suse.cz>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

df62b1e6

cell_edac: convert driver to use the new edac ABI · 6458fc08

Mauro Carvalho Chehab authored Apr 16, 2012

The legacy edac ABI is going to be removed. Port the driver to use
and benefit from the new API functionality.

Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

6458fc08

amd76x_edac: convert driver to use the new edac ABI · d8c34af4

Mauro Carvalho Chehab authored Apr 16, 2012

The legacy edac ABI is going to be removed. Port the driver to use
and benefit from the new API functionality.

Cc: Michal Marek <mmarek@suse.cz>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

d8c34af4

amd64_edac: convert driver to use the new edac ABI · ab5a503c

Mauro Carvalho Chehab authored Apr 16, 2012

The legacy edac ABI is going to be removed. Port the driver to use
and benefit from the new API functionality.

Cc: Doug Thompson <norsk5@yahoo.com>
Cc: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

ab5a503c

edac: Change internal representation to work with layers · 4275be63

Mauro Carvalho Chehab authored Apr 18, 2012

Change the EDAC internal representation to work with non-csrow
based memory controllers.

There are lots of those memory controllers nowadays, and more
are coming. So, the EDAC internal representation needs to be
changed, in order to work with those memory controllers, while
preserving backward compatibility with the old ones.

The edac core was written with the idea that memory controllers
are able to directly access csrows.

This is not true for FB-DIMM and RAMBUS memory controllers.

Also, some recent advanced memory controllers don't present a per-csrows
view. Instead, they view memories as DIMMs, instead of ranks.

So, change the allocation and error report routines to allow
them to work with all types of architectures.

This will allow the removal of several hacks with FB-DIMM and RAMBUS
memory controllers.

Also, several tests were done on different platforms using different
x86 drivers.

TODO: a multi-rank DIMMs are currently represented by multiple DIMM
entries in struct dimm_info. That means that changing a label for one
rank won't change the same label for the other ranks at the same DIMM.
This bug is present since the beginning of the EDAC, so it is not a big
deal. However, on several drivers, it is possible to fix this issue, but
it should be a per-driver fix, as the csrow => DIMM arrangement may not
be equal for all. So, don't try to fix it here yet.

I tried to make this patch as short as possible, preceding it with
several other patches that simplified the logic here. Yet, as the
internal API changes, all drivers need changes. The changes are
generally bigger in the drivers for FB-DIMMs.

Cc: Aristeu Rozanski <arozansk@redhat.com>
Cc: Doug Thompson <norsk5@yahoo.com>
Cc: Borislav Petkov <borislav.petkov@amd.com>
Cc: Mark Gross <mark.gross@intel.com>
Cc: Jason Uhlenkott <juhlenko@akamai.com>
Cc: Tim Small <tim@buttersideup.com>
Cc: Ranganathan Desikan <ravi@jetztechnologies.com>
Cc: "Arvind R." <arvino55@gmail.com>
Cc: Olof Johansson <olof@lixom.net>
Cc: Egor Martovetsky <egor@pasemi.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Joe Perches <joe@perches.com>
Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Hitoshi Mitake <h.mitake@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Niklas Söderlund" <niklas.soderlund@ericsson.com>
Cc: Shaohui Xie <Shaohui.Xie@freescale.com>
Cc: Josh Boyer <jwboyer@gmail.com>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

4275be63

edac.h: Add generic layers for describing a memory location · 982216a4

Mauro Carvalho Chehab authored Apr 16, 2012

The edac core were written with the idea that memory controllers
are able to directly access csrows, and that the channels are
used inside a csrows select.

This is not true for FB-DIMM and RAMBUS memory controllers.

Also, some recent advanced memory controllers don't present a per-csrows
view. Instead, they view memories as DIMMs, instead of ranks, accessed
via csrow/channel.

So, changes are needed in order to allow the EDAC core to
work with all types of architectures.

In preparation for handling non-csrows based memory controllers,
add some memory structs and a macro:

enum hw_event_mc_err_type: describes the type of error
			   (corrected, uncorrected, fatal)

To be used by the new edac_mc_handle_error function;

enum edac_mc_layer: describes the type of a given memory
architecture layer (branch, channel, slot, csrow).

struct edac_mc_layer: describes the properties of a memory
		      layer (type, size, and if the layer
		      will be used on a virtual csrow.

EDAC_DIMM_PTR() - as the number of layers can vary from 1 to 3,
this macro converts from an address with up to 3 layers into
a linear address.
Reviewed-by: Borislav Petkov <bp@amd64.org>
Cc: Doug Thompson <norsk5@yahoo.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

982216a4

edac: rewrite edac_align_ptr() · 93e4fe64

Mauro Carvalho Chehab authored Apr 16, 2012

The edac_align_ptr() function is used to prepare data for a single
memory allocation kzalloc() call. It counts how many bytes are needed
by some data structure.

Using it as-is is not that trivial, as the quantity of memory elements
reserved is not there, but, instead, it is on a next call.

In order to avoid mistakes when using it, move the number of allocated
elements into it, making easier to use it.
Reviewed-by: Borislav Petkov <bp@amd64.org>
Cc: Aristeu Rozanski <arozansk@redhat.com>
Cc: Doug Thompson <norsk5@yahoo.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

93e4fe64

edac: move nr_pages to dimm struct · a895bf8b

Mauro Carvalho Chehab authored Jan 28, 2012

The number of pages is a dimm property. Move it to the dimm struct.

After this change, it is possible to add sysfs nodes for the DIMM's that
will properly represent the DIMM stick properties, including its size.

A TODO fix here is to properly represent dual-rank/quad-rank DIMMs when
the memory controller represents the memory via chip select rows.
Reviewed-by: Aristeu Rozanski <arozansk@redhat.com>
Acked-by: Borislav Petkov <borislav.petkov@amd.com>
Acked-by: Chris Metcalf <cmetcalf@tilera.com>
Cc: Doug Thompson <norsk5@yahoo.com>
Cc: Mark Gross <mark.gross@intel.com>
Cc: Jason Uhlenkott <juhlenko@akamai.com>
Cc: Tim Small <tim@buttersideup.com>
Cc: Ranganathan Desikan <ravi@jetztechnologies.com>
Cc: "Arvind R." <arvino55@gmail.com>
Cc: Olof Johansson <olof@lixom.net>
Cc: Egor Martovetsky <egor@pasemi.com>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Joe Perches <joe@perches.com>
Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Hitoshi Mitake <h.mitake@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Niklas Söderlund" <niklas.soderlund@ericsson.com>
Cc: Shaohui Xie <Shaohui.Xie@freescale.com>
Cc: Josh Boyer <jwboyer@gmail.com>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

a895bf8b