EDAC Error "kernel: EDAC MC1: 0 CE memory read error on CPU_SrcID#0_MC#1_Chan#2_DIMM#0"

Solution In Progress

Issue

Feb  8 08:45:20 abcxyz kernel: {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 4
Feb  8 08:45:20 abcxyz kernel: {1}[Hardware Error]: It has been corrected by h/w and requires no further action
Feb  8 08:45:20 abcxyz kernel: {1}[Hardware Error]: event severity: corrected
Feb  8 08:45:20 abcxyz kernel: {1}[Hardware Error]:  Error 0, type: corrected
Feb  8 08:45:20 abcxyz kernel: {1}[Hardware Error]:  fru_text: A6
Feb  8 08:45:20 abcxyz kernel: {1}[Hardware Error]:   section_type: memory error
Feb  8 08:45:20 abcxyz kernel: {1}[Hardware Error]:   error_status: 0x0000000000000400
Feb  8 08:45:20 abcxyz kernel: {1}[Hardware Error]:   physical_address: 0x0000002870e08b40
Feb  8 08:45:20 abcxyz kernel: {1}[Hardware Error]:   node: 1 card: 2 module: 0 rank: 1 bank: 3 device: 0 row: 41696 column: 592 
Feb  8 08:45:20 abcxyz kernel: {1}[Hardware Error]:   error_type: 2, single-bit ECC
Feb  8 08:45:20 abcxyz kernel: {1}[Hardware Error]:   DIMM location: not present. DMI handle: 0x0000 
Feb  8 08:45:20 abcxyz kernel: {2}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 65534
Feb  8 08:45:20 abcxyz kernel: {2}[Hardware Error]: It has been corrected by h/w and requires no further action
Feb  8 08:45:20 abcxyz kernel: {2}[Hardware Error]: event severity: corrected
Feb  8 08:45:20 abcxyz kernel: {2}[Hardware Error]:  Error 0, type: corrected
Feb  8 08:45:20 abcxyz kernel: {2}[Hardware Error]:   section type: unknown, 330f1140-72a5-11df-9690-0002a5d5c51b
Feb  8 08:45:20 abcxyz kernel: EDAC MC1: 0 CE memory read error on CPU_SrcID#0_MC#1_Chan#2_DIMM#0 (channel:2 slot:0 page:0x2870e08 offset:0xb40 grain:32 syndrome:0x0 -  err_code:0x0000:0x009f socket:0 imc:1 rank:0 bg:1 ba:2 row:0x1a970 col:0x170)
Feb  8 08:45:20 abcxyz kernel: soft offline: 0x2870e08: migration failed 1, type 2fffff00008000
Feb  8 08:45:21 abcxyz kernel: MCE: Killing SomeAppThread:159778 due to hardware memory corruption fault at 2aab464098c8
Feb  8 08:46:49 abcxyz kernel: mce: [Hardware Error]: Machine check events logged
Feb  8 08:53:31 abcxyz kernel: SomeAppThread[189423]: segfault at 2e0 ip 00007f941f2e3990 sp 00007f938e55ec78 error 4 in some_shared_library.so[7f941f1c7000+352000]
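
The GHES record, the EDAC line, and the soft offline message above appear to describe the same memory location. The following is a minimal sketch, not a vendor tool, that checks this by recombining the EDAC `page` and `offset` fields and comparing the result with the GHES `physical_address`. It assumes 4 KiB pages (the usual x86_64 default) and that the value in the soft offline message is a page frame number. The helper names `hex_field` and `correlate` are hypothetical and exist only for this illustration.

```python
import re

# Sample fields copied from the log excerpt above.
GHES_LINE = "{1}[Hardware Error]:   physical_address: 0x0000002870e08b40"
EDAC_LINE = (
    "EDAC MC1: 0 CE memory read error on CPU_SrcID#0_MC#1_Chan#2_DIMM#0 "
    "(channel:2 slot:0 page:0x2870e08 offset:0xb40 grain:32 syndrome:0x0 -  "
    "err_code:0x0000:0x009f socket:0 imc:1 rank:0 bg:1 ba:2 row:0x1a970 col:0x170)"
)
SOFT_OFFLINE_LINE = "soft offline: 0x2870e08: migration failed 1, type 2fffff00008000"

PAGE_SHIFT = 12  # assumption: 4 KiB pages


def hex_field(pattern, line):
    """Return the first captured hexadecimal field in line as an int."""
    match = re.search(pattern, line)
    if match is None:
        raise ValueError(f"pattern {pattern!r} not found in line")
    return int(match.group(1), 16)


def correlate(ghes_line, edac_line, soft_offline_line):
    """Extract the address fields from the three messages for comparison."""
    page = hex_field(r"page:0x([0-9a-fA-F]+)", edac_line)
    offset = hex_field(r"offset:0x([0-9a-fA-F]+)", edac_line)
    return {
        "physical_address": hex_field(
            r"physical_address:\s*0x([0-9a-fA-F]+)", ghes_line
        ),
        # Page frame number shifted up, plus the offset within the page.
        "edac_address": (page << PAGE_SHIFT) | offset,
        "edac_page": page,
        "soft_offline_pfn": hex_field(
            r"soft offline:\s*0x([0-9a-fA-F]+)", soft_offline_line
        ),
    }


if __name__ == "__main__":
    for name, value in correlate(GHES_LINE, EDAC_LINE, SOFT_OFFLINE_LINE).items():
        print(f"{name}: {value:#x}")
```

For this excerpt, the recombined EDAC address equals the GHES physical address, and the page that failed soft offline is the same page EDAC reported. That suggests all three messages refer to a single corrected error, and that the subsequent process kill relates to the failed migration of that page.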

Environment

  • Red Hat Enterprise Linux 7.9
    • kernel 3.10.0-1160.76.1.el7.x86_64
  • Dell PowerEdge series
    • PowerEdge R740
