Kernel panic, soft lockup, or hung task in netlink_compare in Red Hat Enterprise Linux 7
Issue
- Kernel panic in netlink_compare with a backtrace similar to the following:
crash> bt
PID: 12924 TASK: ffff8801c4348b80 CPU: 12 COMMAND: "crond"
#0 [ffff88070d253928] machine_kexec at ffffffff81051e9b
#1 [ffff88070d253988] crash_kexec at ffffffff810f27a2
#2 [ffff88070d253a58] oops_end at ffffffff8163f448
#3 [ffff88070d253a80] die at ffffffff8101859b
#4 [ffff88070d253ab0] do_general_protection at ffffffff8163ed3e
#5 [ffff88070d253ae0] general_protection at ffffffff8163e5e8
[exception RIP: netlink_compare+11]
RIP: ffffffff8155654b RSP: ffff88070d253b90 RFLAGS: 00010246
RAX: 0000000000000000 RBX: ff04000000000006 RCX: 000000002ce2b4ee
RDX: 0000000000000000 RSI: ffff88070d253be0 RDI: ff03fffffffffb7e
RBP: ffff88070d253bc8 R8: ffff88070d253bdc R9: 752f223d65786520
R10: 2f6e6962732f7273 R11: 682022646e6f7263 R12: ffff8808140d2678
R13: ffff88070d253be0 R14: ffffffff81556540 R15: ffff8808091a6c00
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#6 [ffff88070d253b98] rhashtable_lookup_compare at ffffffff813086f0
#7 [ffff88070d253bd0] netlink_lookup at ffffffff81556e7e
#8 [ffff88070d253c00] netlink_getsockbyportid at ffffffff8155821f
#9 [ffff88070d253c18] netlink_unicast at ffffffff8155a479
#10 [ffff88070d253c60] netlink_sendmsg at ffffffff8155a8b0
#11 [ffff88070d253cf8] sock_sendmsg at ffffffff815117a0
#12 [ffff88070d253e58] SYSC_sendto at ffffffff81511d11
#13 [ffff88070d253f70] sys_sendto at ffffffff8151279e
#14 [ffff88070d253f80] system_call_fastpath at ffffffff81646b49
- Hung task warnings in netlink_insert, similar to the following:
kernel: INFO: task crond:96675 blocked for more than 120 seconds.
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kernel: crond D ffffffff81a2ad08 0 96675 3155 0x00000000
kernel: ffff8820283f3b30 0000000000000082 ffff88354e061700 ffff8820283f3fd8
kernel: ffff8820283f3fd8 ffff8820283f3fd8 ffff88354e061700 ffffffff81a2ad00
kernel: ffffffff81a2ad04 ffff88354e061700 00000000ffffffff ffffffff81a2ad08
kernel: Call Trace:
kernel: [<ffffffff8163d309>] schedule_preempt_disabled+0x29/0x70
kernel: [<ffffffff8163b005>] __mutex_lock_slowpath+0xc5/0x1c0
kernel: [<ffffffff8163a46f>] mutex_lock+0x1f/0x2f
kernel: [<ffffffff815574ae>] netlink_insert+0x4e/0x100
kernel: [<ffffffff81308470>] ? rhashtable_lookup_compare+0x30/0x90
kernel: [<ffffffff81557b30>] netlink_autobind.isra.37+0xc0/0x100
kernel: [<ffffffff8155a59a>] netlink_sendmsg+0x22a/0x770
kernel: [<ffffffff81189bba>] ? __dec_zone_page_state+0x2a/0x30
kernel: [<ffffffff815116a0>] sock_sendmsg+0xb0/0xf0
kernel: [<ffffffff81511c11>] SYSC_sendto+0x121/0x1c0
kernel: [<ffffffff8164271d>] ? __do_page_fault+0x16d/0x450
kernel: [<ffffffff81642a23>] ? do_page_fault+0x23/0x80
kernel: [<ffffffff811f1b10>] ? SyS_fcntl+0x4d0/0x5d0
kernel: [<ffffffff8151269e>] SyS_sendto+0xe/0x10
kernel: [<ffffffff81647209>] system_call_fastpath+0x16/0x1b
- Kernel panic in netlink_compare reached through netlink_autobind, with a backtrace similar to the following:
crash> bt
PID: 29345 TASK: ffff884081ac0000 CPU: 14 COMMAND: "crond"
#0 [ffff88097cb0f958] machine_kexec at ffffffff81051e9b
#1 [ffff88097cb0f9b8] crash_kexec at ffffffff810f27e2
#2 [ffff88097cb0fa88] oops_end at ffffffff8163f208
#3 [ffff88097cb0fab0] die at ffffffff8101859b
#4 [ffff88097cb0fae0] do_general_protection at ffffffff8163eafe
#5 [ffff88097cb0fb10] general_protection at ffffffff8163e3a8
[exception RIP: netlink_compare+11]
RIP: ffffffff81555f0b RSP: ffff88097cb0fbc8 RFLAGS: 00010246
RAX: 0000000000000000 RBX: 0033003600300038 RCX: 00000000c9851744
RDX: 00000000000072a1 RSI: ffff88097cb0fc18 RDI: 00330036002ffbb0
RBP: ffff88097cb0fc00 R8: ffff88097cb0fc14 R9: 000000000000000c
R10: 0000000000000000 R11: 0000000000000246 R12: ffff882278212678
R13: ffff88097cb0fc18 R14: ffffffff81555f00 R15: ffff882598f68fc0
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#6 [ffff88097cb0fbd0] rhashtable_lookup_compare at ffffffff813080a0
#7 [ffff88097cb0fc08] netlink_autobind at ffffffff815576ee
#8 [ffff88097cb0fc60] netlink_sendmsg at ffffffff8155a16a
#9 [ffff88097cb0fcf8] sock_sendmsg at ffffffff815112a0
#10 [ffff88097cb0fe58] SYSC_sendto at ffffffff81511811
#11 [ffff88097cb0ff70] sys_sendto at ffffffff8151229e
#12 [ffff88097cb0ff80] system_call_fastpath at ffffffff81646909
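Both x86_64 panics above are general protection faults rather than page faults. One way to read that is to check whether the pointer held in RDI is a canonical address. The following is a minimal sketch, assuming 48-bit virtual addressing (4-level paging) and assuming RDI is the pointer being dereferenced at netlink_compare+11; neither is stated in the traces. The helper name `is_canonical_x86_64` is hypothetical.

```python
def is_canonical_x86_64(addr: int, va_bits: int = 48) -> bool:
    """Return True if addr is a canonical x86_64 virtual address.

    With va_bits of virtual address space, bits 63..(va_bits - 1) must all
    be equal (all zeros or all ones). va_bits=48 is an assumption here.
    """
    top = addr >> (va_bits - 1)                # bits 63..47 when va_bits == 48
    all_ones = (1 << (64 - va_bits + 1)) - 1   # 17 one-bits when va_bits == 48
    return top == 0 or top == all_ones


# Register values copied from the two x86_64 panic backtraces above.
samples = {
    "first panic RDI": 0xFF03FFFFFFFFFB7E,
    "second panic RDI": 0x00330036002FFBB0,
    "first panic RSP": 0xFFFF88070D253B90,  # a valid kernel stack address, for contrast
}

for name, value in samples.items():
    print(f"{name}: {value:#018x} canonical={is_canonical_x86_64(value)}")
```

Under these assumptions both RDI values are non-canonical while the stack pointer is canonical, which is consistent with a corrupted or stale hash table entry being passed to netlink_compare.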
- Another example of a kernel panic in .netlink_compare, with a backtrace similar to the following on ppc64:
crash> bt
PID: 58490 TASK: c000000fa85dc1f0 CPU: 22 COMMAND: "httpd"
#0 [c000000c327cae90] .crash_kexec at c000000000183db0
#1 [c000000c327cb090] .die at c000000000020888
#2 [c000000c327cb140] .bad_page_fault at c0000000000567b8
#3 [c000000c327cb1c0] handle_page_fault at c000000000009588
Data Access [300] exception frame:
R0: c0000000007efe4c R1: c000000c327cb4b0 R2: c0000000013823e8
R3: 00000198da5f7380 R4: c000000c327cb5c0 R5: 00000000000000ac
R6: 000000007f612171 R7: 00000000e6b1d61c R8: 000000004637f299
R9: c0000000007eac90 R10: 0000000000000000 R11: 0000000000000000
R12: 0000000028042828 R13: c000000007b3c600 R14: 0000000000000000
R15: 00003fff8cbfe1e0 R16: 00003fff8cbfe1d8 R17: 00003fff4c01e35e
R18: 00003fff4c01e468 R19: 0000000000000000 R20: 000001003575c480
R21: 0000000000000000 R22: 0000000000000001 R23: 7fffffffffffffff
R24: fffffffffffff000 R25: c00000000136ea40 R26: c00000000130fe00
R27: c000000fe3fca8e0 R28: c000000c327cb5c0 R29: c000000001370a40
R30: c000000fed184000 R31: 00000198da5f7800
NIP: c0000000007eac90 MSR: 8000000000009032 OR3: c0000000004affd8
CTR: c0000000007eac90 LR: c0000000004affdc XER: 0000000000000000
CCR: 0000000028042824 MQ: 0000000000000001 DAR: 00000198da5f7678
DSISR: 0000000040000000 Syscall Result: 0000000000000000
#4 [c000000c327cb4b0] .netlink_compare at c0000000007eac90
[Link Register] [c000000c327cb4b0] .rhashtable_lookup_compare at c0000000004affdc
#5 [c000000c327cb550] .__netlink_dump_start at c0000000007efe4c
#6 [c000000c327cb610] .rtnetlink_rcv_msg at c0000000007c46c8
#7 [c000000c327cb6f0] .netlink_rcv_skb at c0000000007f094c
#8 [c000000c327cb780] .rtnetlink_rcv at c0000000007c4514
#9 [c000000c327cb800] .netlink_unicast at c0000000007f050c
#10 [c000000c327cb900] .netlink_sendmsg at c0000000007f19e0
#11 [c000000c327cba40] .sock_sendmsg at c00000000077b2d0
#12 [c000000c327cbc00] .sys_sendto at c00000000078143c
#13 [c000000c327cbd80] .sys_socketcall at c000000000782a18
#14 [c000000c327cbe30] system_call at c00000000000a17c
System Call [c00] exception frame:
R0: 0000000000000066 R1: 00003fff8cbed750 R2: 00003fffa12e74f8
R3: 000000000000000b R4: 00003fff8cbed800 R5: 0000000000000014
R6: 0000000000000000 R7: 0000000000000002 R8: 0000000000000000
R9: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000
R12: 0000000000000000 R13: 00003fff8cc06900 R14: 0000000000000000
R15: 00003fff8cbfe1e0 R16: 00003fff8cbfe1d8 R17: 00003fff4c01e35e
R18: 00003fff4c01e468 R19: 0000000000000000 R20: 000001003575c480
R21: 00003fff8cbfdf91 R22: 0000000000000020 R23: 00003fffa0fc6cd8
R24: 00003fff8cbed840 R25: 00003fff8cbfd8c0 R26: 00003fff8cbfdbc0
R27: 000000000000e42f R28: 00003fff8cbff910 R29: 00003fff8cbfd890
R30: 000000000000000f R31: 00003fff8cbfd7e0
NIP: 00003fffa122a850 MSR: 800000010000d032 OR3: 000000000000000b
CTR: 0000000000000000 LR: 00003fffa122a83c XER: 0000000000000000
CCR: 0000000044042848 MQ: 0000000000000001 DAR: 00003fff0c0e4c18
DSISR: 0000000042000000 Syscall Result: 0000000000000000
- We upgraded to the 3.10.0-327.22.2.el7 (RHEL 7.2.z) kernel, and the following messages appeared in /var/log/messages:
kernel: audit: netlink_unicast sending to audit_pid=1234 returned error: -111
kernel: audit: audit_lost=1 audit_rate_limit=0 audit_backlog_limit=320
kernel: audit: audit_pid=1234 reset
After that, audit records were written to the messages file instead of the audit.log file. An strace of the auditd process showed that it was still alive and waiting in epoll_wait(), so the root cause appears to be in the netlink connection rather than in auditd itself.
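The error code in the audit message can be decoded to support that reading. The following is a minimal sketch, assuming the log line format shown above; the function name `parse_audit_error` and the regular expression are illustrative and not part of any audit tooling. On Linux x86_64 and ppc64, errno 111 is ECONNREFUSED, which would mean the kernel could not find a listening netlink socket for the recorded audit_pid.

```python
import errno
import re

# Pattern matching the kernel audit message quoted above (format assumed from that sample).
AUDIT_RE = re.compile(
    r"netlink_unicast sending to audit_pid=(\d+) returned error: (-?\d+)"
)


def parse_audit_error(line):
    """Return (audit_pid, errno_value, errno_name), or None if the line does not match."""
    match = AUDIT_RE.search(line)
    if match is None:
        return None
    pid = int(match.group(1))
    code = abs(int(match.group(2)))  # the kernel logs the negative errno value
    # errno.errorcode maps numbers to names for the platform Python runs on.
    name = errno.errorcode.get(code, "UNKNOWN")
    return pid, code, name


line = "kernel: audit: netlink_unicast sending to audit_pid=1234 returned error: -111"
print(parse_audit_error(line))
```

The errno name is resolved on the machine running the script, so run it on a Linux host of the same architecture as the affected system.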
Environment
- Red Hat Enterprise Linux 7.2
- Any kernel earlier than kernel-3.10.0-327.54.1.el7
- Any kernel earlier than