What does the message "HARDWARE ERROR. This is *NOT* a software problem!" indicate?
Issue
- The following messages are logged in /var/log/messages:
kernel:Machine check events logged
mcelog:MCE 0
mcelog:HARDWARE ERROR.This is *NOT* a software problem!
mcelog:Please contact your hardware vendor
mcelog:Unknown Intel CPU type family 6 model 2c
mcelog:CPU 0 BANK 8 TSC a66b05434fcf4 [at 2668 Mhz 12 days 16:48:42 uptime (unreliable)]
mcelog:MISC 5522140800080282 ADDR 4f83b8dc0
mcelog:MCG status:
mcelog:MCi status:
mcelog:MCi_MISC register valid
mcelog:MCi_ADDR register valid
mcelog:MCA:MEMORY CONTROLLER RD_CHANNELunspecified_ERR
mcelog:Transaction:Memory read error
mcelog:STATUS 8c0000400001009f MCGSTATUS 0
kernel:BUG: soft lockup - CPU#10 stuck for 10s![mcelog:6356]
- Similar errors:
Hardware event.This is not a software error.
Corrected error
Transaction:Memory scrubbing error
Memory ECC error occurred during scrub
Memory corrected error count (CORE_ERR_CNT):1
Memory DIMM ID of error:1
Memory channel ID of error:2
Hardware event.This is not a software error.
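The first excerpt reports a raw TSC value together with a clock speed and an uptime that mcelog itself marks as unreliable. As a rough cross-check, the reported uptime is consistent with dividing the TSC tick count by the clock frequency. The sketch below shows that arithmetic only; the function name `tsc_to_uptime` is hypothetical, and it assumes the TSC counts at the reported MHz rate from boot, which does not hold on every system.

```python
def tsc_to_uptime(tsc_hex, mhz):
    """Convert a hexadecimal TSC tick count to an uptime string.

    Assumes the TSC ticks at a constant `mhz` million ticks per second
    since boot (an assumption; mcelog flags the result as unreliable).
    """
    seconds = int(tsc_hex, 16) // (mhz * 1_000_000)
    days, rem = divmod(seconds, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{days} days {hours:02d}:{minutes:02d}:{secs:02d}"


# Values taken from the "CPU 0 BANK 8 TSC ..." line in the first excerpt.
print(tsc_to_uptime("a66b05434fcf4", 2668))  # prints "12 days 16:48:42"
```

The result matches the uptime printed in the log line, which suggests the value is derived from the TSC rather than from the system clock.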
In some cases, call trace error messages such as the following are also logged in /var/log/messages.
Jan 8 08:30:27 Hostname kernel:Pid:30350, comm: rgmanager Tainted:G W --------------- 2.6.32-358.el6.x86_64 #1 Dell Inc. PowerEdge R910/0NCWG9
Jan 8 08:30:27 Hostname kernel:RIP:0010:[<ffffffff8150ffce>] [<ffffffff8150ffce>] _spin_lock+0x1e/0x30
Jan 8 08:30:27 Hostname kernel:RSP:0018:ffff8820c05cdd10 EFLAGS:00000283
Jan 8 08:30:27 Hostname kernel:RAX:0000000000003964 RBX: ffff8820c05cdd10 RCX:0000000000000000
Jan 8 08:30:27 Hostname kernel:RDX:000000000000395f RSI:000000000000001b RDI: ffffffff81e227e8
Jan 8 08:30:27 Hostname kernel:RBP: ffffffff8100bb8e R08:0000000000000000 R09:0000000000000000
Jan 8 08:30:27 Hostname kernel:R10:0000000000000000 R11:0000000000000000 R12: ffff8810685602d8
Jan 8 08:30:27 Hostname kernel:R13:0000000000000000 R14: ffff883080010e40 R15:0000000000000000
Jan 8 08:30:27 Hostname kernel:FS:00007f3e81a20700(0000) GS:ffff8830b8880000(0000) knlGS:0000000000000000
Jan 8 08:30:27 Hostname kernel:CS:0010 DS:0000 ES:0000 CR0:000000008005003b
Jan 8 08:30:27 Hostname kernel:CR2:00000000027477b0 CR3:00000010671a6000 CR4:00000000000007e0
Jan 8 08:30:27 Hostname kernel:DR0:0000000000000000 DR1:0000000000000000 DR2:0000000000000000
Jan 8 08:30:27 Hostname kernel:DR3:0000000000000000 DR6:00000000ffff0ff0 DR7:0000000000000400
Jan 8 08:30:27 Hostname kernel:Process rgmanager (pid:30350, threadinfo ffff8820c05cc000, task ffff8820be965540)
Jan 8 08:30:27 Hostname kernel:Stack:
Jan 8 08:30:27 Hostname kernel: ffff8820c05cdd40 ffffffff8104b8d0 ffff8820c05cdd60 ffff884068122400
Jan 8 08:30:27 Hostname kernel:<d> ffff883a408fc040 ffff883a408fc040 ffff8820c05cdd60 ffffffff8106b179
Jan 8 08:30:27 Hostname kernel:<d> ffff884068122400 ffff881066d31440 ffff8820c05cdde0 ffffffff8106b879
Jan 8 08:30:27 Hostname kernel:Call Trace:
Jan 8 08:30:27 Hostname kernel:[<ffffffff8104b8d0>] ? pgd_alloc+0x50/0x130
Jan 8 08:30:27 Hostname kernel:[<ffffffff8106b179>] ? mm_init+0x139/0x180
Jan 8 08:30:27 Hostname kernel:[<ffffffff8106b879>] ? dup_mm+0xa9/0x520
Jan 8 08:30:27 Hostname kernel:[<ffffffff81061d03>] ? sched_autogroup_fork+0x63/0xa0
Jan 8 08:30:27 Hostname kernel:[<ffffffff8106cb6f>] ? copy_process+0xd5f/0x1450
Jan 8 08:30:27 Hostname kernel:[<ffffffff8106d2f4>] ? do_fork+0x94/0x460
Jan 8 08:30:27 Hostname kernel:[<ffffffff8109bfb4>] ? hrtimer_nanosleep+0xc4/0x180
Jan 8 08:30:27 Hostname kernel:[<ffffffff8109ae00>] ? hrtimer_wakeup+0x0/0x30
Jan 8 08:30:27 Hostname kernel:[<ffffffff81009598>] ? sys_clone+0x28/0x30
Jan 8 08:30:27 Hostname kernel:[<ffffffff8100b393>] ? stub_clone+0x13/0x20
Jan 8 08:30:27 Hostname kernel:[<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b
Jan 8 08:30:27 Hostname kernel:Code:00 00 00 01 74 05 e8 b2 33 d7 ff c9 c3 55 48 89 e5 0f 1f 44 00 00 b8 00 00 01 00 f0 0f c1 07 0f b7 d0 c1 e8 10 39 c2 74 0e f3 90 <0f> b7 17 eb f5 83 3f 00 75 f4 eb df c9 c3 0f 1f 40 00 55 48 89
Jan 8 08:30:27 Hostname kernel:Call Trace:
Jan 8 08:30:27 Hostname kernel:[<ffffffff8104b8d0>] ? pgd_alloc+0x50/0x130
Jan 8 08:30:27 Hostname kernel:[<ffffffff8106b179>] ? mm_init+0x139/0x180
Jan 8 08:30:27 Hostname kernel:[<ffffffff8106b879>] ? dup_mm+0xa9/0x520
Jan 8 08:30:27 Hostname kernel:[<ffffffff81061d03>] ? sched_autogroup_fork+0x63/0xa0
Jan 8 08:30:27 Hostname kernel:[<ffffffff8106cb6f>] ? copy_process+0xd5f/0x1450
Jan 8 08:30:27 Hostname kernel:[<ffffffff8106d2f4>] ? do_fork+0x94/0x460
Jan 8 08:30:27 Hostname kernel:[<ffffffff8109bfb4>] ? hrtimer_nanosleep+0xc4/0x180
Jan 8 08:30:27 Hostname kernel:[<ffffffff8109ae00>] ? hrtimer_wakeup+0x0/0x30
Jan 8 08:30:27 Hostname kernel:[<ffffffff81009598>] ? sys_clone+0x28/0x30
Jan 8 08:30:27 Hostname kernel:[<ffffffff8100b393>] ? stub_clone+0x13/0x20
Jan 8 08:30:27 Hostname kernel:[<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b
Jan 8 08:30:39 Hostname kernel:BUG: soft lockup - CPU#3 stuck for 67s![sshd:4711]
......
Errors such as the following may also be logged in /var/mcelog.
MCE 0
CPU 2 BANK 9
TIME 1388666356 Thu Jan 2 20:39:16 2014
MCG status:
MCi status:
Uncorrected error
Error enabled
MCA:MEMORY CONTROLLER RD_CHANNELunspecified_ERR
Transaction:Memory read error
STATUS b00000000800009f MCGSTATUS 0
MCGCAP 1000c18 APICID 80 SOCKETID 2
CPUID Vendor Intel Family 6 Model 47
Hardware event.This is not a software error.
.....
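The STATUS values in the excerpts above can be decoded by hand to see why mcelog reports one event with valid MISC and ADDR registers and the other as an uncorrected error. The following is a minimal sketch, assuming the IA32_MCi_STATUS bit layout described in the Intel Software Developer's Manual (flag bits 63 to 57, corrected error count in bits 52:38, MCA error code in bits 15:0, and the memory controller code format 0000 0000 1MMM CCCC). The names `decode_mci_status`, `FLAGS`, and `MEM_TRANSACTIONS` are hypothetical and not part of mcelog. It is an illustration, not a replacement for mcelog or for the hardware vendor's diagnosis.

```python
# Flag bits of IA32_MCi_STATUS (positions assumed from the Intel SDM).
FLAGS = {
    63: "VAL",    # register contains a valid error
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # uncorrected error
    60: "EN",     # error reporting enabled
    59: "MISCV",  # MCi_MISC register valid
    58: "ADDRV",  # MCi_ADDR register valid
    57: "PCC",    # processor context corrupt
}

# MMM field of a memory controller error code (0000 0000 1MMM CCCC).
MEM_TRANSACTIONS = {
    0b000: "generic undefined request",
    0b001: "memory read error",
    0b010: "memory write error",
    0b011: "address/command error",
    0b100: "memory scrubbing error",
}


def decode_mci_status(status_hex):
    """Decode selected fields of a hexadecimal MCi_STATUS value."""
    status = int(status_hex, 16)
    mca_code = status & 0xFFFF
    result = {
        "flags": [name for bit, name in FLAGS.items() if (status >> bit) & 1],
        "mca_error_code": mca_code,
        # Only meaningful for corrected errors on CPUs that implement it.
        "corrected_error_count": (status >> 38) & 0x7FFF,
    }
    # Memory controller errors have bit 7 set and bits 15:8 clear.
    if (mca_code & 0xFF80) == 0x0080:
        channel = mca_code & 0xF
        result["transaction"] = MEM_TRANSACTIONS.get((mca_code >> 4) & 0x7, "reserved")
        result["channel"] = None if channel == 0xF else channel  # 0xF: unspecified
    return result


# STATUS values copied from the two excerpts above.
for value in ("8c0000400001009f", "b00000000800009f"):
    print(value, decode_mci_status(value))
```

Under these assumptions, the first value decodes to VAL, MISCV, and ADDRV with a corrected error count of 1, and the second to VAL, UC, and EN. Both carry a memory read error on an unspecified channel, which agrees with the text mcelog printed.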
Environment
- Red Hat Enterprise Linux 5
- Red Hat Enterprise Linux 6