EDAC Error "kernel: EDAC MC1: 0 CE memory read error on CPU_SrcID#0_MC#1_Chan#2_DIMM#0"
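Issue
The kernel logs a corrected single-bit ECC memory error (CE) through APEI and EDAC. Soft offlining of the affected page fails, and an application thread is then killed due to a hardware memory corruption fault: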
Feb 8 08:45:20 abcxyz kernel: {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 4
Feb 8 08:45:20 abcxyz kernel: {1}[Hardware Error]: It has been corrected by h/w and requires no further action
Feb 8 08:45:20 abcxyz kernel: {1}[Hardware Error]: event severity: corrected
Feb 8 08:45:20 abcxyz kernel: {1}[Hardware Error]: Error 0, type: corrected
Feb 8 08:45:20 abcxyz kernel: {1}[Hardware Error]: fru_text: A6
Feb 8 08:45:20 abcxyz kernel: {1}[Hardware Error]: section_type: memory error
Feb 8 08:45:20 abcxyz kernel: {1}[Hardware Error]: error_status: 0x0000000000000400
Feb 8 08:45:20 abcxyz kernel: {1}[Hardware Error]: physical_address: 0x0000002870e08b40
Feb 8 08:45:20 abcxyz kernel: {1}[Hardware Error]: node: 1 card: 2 module: 0 rank: 1 bank: 3 device: 0 row: 41696 column: 592
Feb 8 08:45:20 abcxyz kernel: {1}[Hardware Error]: error_type: 2, single-bit ECC
Feb 8 08:45:20 abcxyz kernel: {1}[Hardware Error]: DIMM location: not present. DMI handle: 0x0000
Feb 8 08:45:20 abcxyz kernel: {2}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 65534
Feb 8 08:45:20 abcxyz kernel: {2}[Hardware Error]: It has been corrected by h/w and requires no further action
Feb 8 08:45:20 abcxyz kernel: {2}[Hardware Error]: event severity: corrected
Feb 8 08:45:20 abcxyz kernel: {2}[Hardware Error]: Error 0, type: corrected
Feb 8 08:45:20 abcxyz kernel: {2}[Hardware Error]: section type: unknown, 330f1140-72a5-11df-9690-0002a5d5c51b
Feb 8 08:45:20 abcxyz kernel: EDAC MC1: 0 CE memory read error on CPU_SrcID#0_MC#1_Chan#2_DIMM#0 (channel:2 slot:0 page:0x2870e08 offset:0xb40 grain:32 syndrome:0x0 - err_code:0x0000:0x009f socket:0 imc:1 rank:0 bg:1 ba:2 row:0x1a970 col:0x170)
Feb 8 08:45:20 abcxyz kernel: soft offline: 0x2870e08: migration failed 1, type 2fffff00008000
Feb 8 08:45:21 abcxyz kernel: MCE: Killing SomeAppThread:159778 due to hardware memory corruption fault at 2aab464098c8
Feb 8 08:46:49 abcxyz kernel: mce: [Hardware Error]: Machine check events logged
Feb 8 08:53:31 abcxyz kernel: SomeAppThread[189423]: segfault at 2e0 ip 00007f941f2e3990 sp 00007f938e55ec78 error 4 in some_shared_library.so[7f941f1c7000+352000]
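The EDAC line and the APEI record above describe the same location: the EDAC `page` and `offset` fields combine into the `physical_address` reported by APEI (0x2870e08 and 0xb40 give 0x2870e08b40). The following Python sketch shows one way to extract those fields and check that correspondence. The function name, the regular expression, and the 4 KiB page size (`PAGE_SHIFT = 12`, the usual value on x86_64) are assumptions made for illustration, not part of any Red Hat or kernel tooling.

```python
import re

# Assumption: 4 KiB pages, as is typical on x86_64.
PAGE_SHIFT = 12

# Hypothetical pattern, written against the EDAC line format shown in the log above.
EDAC_RE = re.compile(
    r"EDAC MC(?P<mc>\d+): (?P<count>\d+) (?P<kind>CE|UE) .*? on (?P<label>\S+) "
    r"\(channel:(?P<channel>\d+) slot:(?P<slot>\d+) "
    r"page:0x(?P<page>[0-9a-fA-F]+) offset:0x(?P<offset>[0-9a-fA-F]+)"
)

def parse_edac_line(line):
    """Return the main fields of an EDAC memory error line, or None if it does not match."""
    m = EDAC_RE.search(line)
    if m is None:
        return None
    page = int(m.group("page"), 16)
    offset = int(m.group("offset"), 16)
    return {
        "mc": int(m.group("mc")),
        "kind": m.group("kind"),            # "CE" (corrected) or "UE" (uncorrected)
        "label": m.group("label"),          # DIMM label as printed by EDAC
        "channel": int(m.group("channel")),
        "slot": int(m.group("slot")),
        # Page frame number shifted by the page size, plus the in-page offset.
        "phys_addr": (page << PAGE_SHIFT) | offset,
    }

# The EDAC line from the log above, truncated after the offset field.
SAMPLE = (
    "Feb 8 08:45:20 abcxyz kernel: EDAC MC1: 0 CE memory read error on "
    "CPU_SrcID#0_MC#1_Chan#2_DIMM#0 (channel:2 slot:0 page:0x2870e08 offset:0xb40"
)

info = parse_edac_line(SAMPLE)
print(info["label"], hex(info["phys_addr"]))
```

Comparing the computed address with the `physical_address` field of the APEI record is a quick way to confirm that both messages refer to the same event. The same page number also appears in the `soft offline: 0x2870e08` message.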
Environment
- Red Hat Enterprise Linux 7.9
- kernel 3.10.0-1160.76.1.el7.x86_64
- Dell PowerEdge series
  - PowerEdge R740