Kernel panic - not syncing: Attempted to kill init!
Environment
- Red Hat Enterprise Linux 6
- SIGBUS (7) signal
Issue
- The server rebooted automatically with a kernel panic and the following call trace:
Kernel panic - not syncing: Attempted to kill init!
Pid: 1, comm: init Not tainted 2.6.32-504.12.2.el6.x86_64 #1
Call Trace:
[<ffffffff8152933c>] ? panic+0xa7/0x16f
[<ffffffff8132f4f6>] ? get_current_tty+0x66/0x70
[<ffffffff8107a5f2>] ? do_exit+0x862/0x870
[<ffffffff8108c51d>] ? __sigqueue_free+0x3d/0x50
[<ffffffff8107a658>] ? do_group_exit+0x58/0xd0
[<ffffffff81090306>] ? get_signal_to_deliver+0x1f6/0x460
[<ffffffff8100a265>] ? do_signal+0x75/0x800
[<ffffffff8109eefc>] ? remove_wait_queue+0x3c/0x50
[<ffffffff81010000>] ? show_registers+0x220/0x280
[<ffffffff8104c527>] ? is_prefetch+0xb7/0x230
[<ffffffff810796c8>] ? sys_waitid+0xa8/0x1f0
[<ffffffff8100aa80>] ? do_notify_resume+0x90/0xc0
[<ffffffff8100badc>] ? retint_signal+0x48/0x8c
- The file "/lib64/ld-linux-x86-64.so.2" is missing.
Resolution
- Investigate why the swap device was deleted while it was in use.
In some cases the underlying devices used for swap are not deleted from the system but report a large number of I/O errors. Read I/O on the swap device then fails and results in the same panic as described here. In such cases, investigate why I/O on the underlying disk devices was failing.
- Verify the glibc package.
# rpm -V glibc
missing /lib64/ld-linux-x86-64.so.2
- Boot the system into rescue mode and install the glibc package.
# rpm -ivh --root=/mnt/sysimage /path/to/glibc-<version-release.arch> --replacepkgs --replacefiles
NOTE: Check the installed version of glibc and glibc-common, then download and install the matching package.
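As a rough illustration of the verification step, the sketch below extracts the paths that "rpm -V" reports as missing. The function name missing_files is hypothetical, the sample text is the output shown above, and the parser assumes that paths contain no spaces.

```python
def missing_files(rpm_verify_output):
    """Return the paths that `rpm -V` reported as missing."""
    paths = []
    for line in rpm_verify_output.splitlines():
        fields = line.split()
        # A missing file is reported as: missing [attribute marker] /path
        # Assumption: the path itself contains no whitespace.
        if fields and fields[0] == "missing":
            paths.append(fields[-1])
    return paths


# Sample taken from the `rpm -V glibc` output in this article.
sample = "missing     /lib64/ld-linux-x86-64.so.2\n"
print(missing_files(sample))
```

An empty list would suggest that no packaged file is missing, although "rpm -V" can still report other differences such as size or checksum changes.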
Root Cause
- The kernel invoked the panic() function because the "init" task with PID (1) received a "SIGBUS" (7) signal with the code "BUS_ADRERR".
- A "SIGBUS" can be caused by any general device fault that the computer detects, though a bus error rarely means that the computer hardware is physically broken. A bus error may also be raised for certain paging errors.
- In this example, the "init" task with PID (1) received a "SIGBUS" (7) signal because the swap device was deleted while it was in use.
Diagnostic Steps
System Information:
crash> sys | grep -e UPTIME -e RELEASE -e MACHINE -e PANIC
UPTIME: 17 days, 19:45:20
RELEASE: 2.6.32-504.12.2.el6.x86_64
MACHINE: x86_64 (2400 Mhz)
PANIC: "Kernel panic - not syncing: Attempted to kill init!"
crash> rd -a ffffffff8202b000 160
ffffffff8202b000: Phoenix Technologies LTD
ffffffff8202b01c: 6.00
ffffffff8202b024: 2.4
ffffffff8202b02c: 10/22/2013
ffffffff8202b038: VMware, Inc.
ffffffff8202b048: VMware Virtual Platform
ffffffff8202b060: None
ffffffff8202b068: VMware-42 2e 37 a2 37 de c5 c4-b7 d2 17 c0 66 c5 5c 04
Kernel Ring Buffer:
crash> log
Kernel panic - not syncing: Attempted to kill init!
Pid: 1, comm: init Not tainted 2.6.32-504.12.2.el6.x86_64 #1
Call Trace:
[<ffffffff8152933c>] ? panic+0xa7/0x16f
[<ffffffff8132f4f6>] ? get_current_tty+0x66/0x70
[<ffffffff8107a5f2>] ? do_exit+0x862/0x870
[<ffffffff8108c51d>] ? __sigqueue_free+0x3d/0x50
[<ffffffff8107a658>] ? do_group_exit+0x58/0xd0
[<ffffffff81090306>] ? get_signal_to_deliver+0x1f6/0x460
[<ffffffff8100a265>] ? do_signal+0x75/0x800
[<ffffffff8109eefc>] ? remove_wait_queue+0x3c/0x50
[<ffffffff81010000>] ? show_registers+0x220/0x280
[<ffffffff8104c527>] ? is_prefetch+0xb7/0x230
[<ffffffff810796c8>] ? sys_waitid+0xa8/0x1f0
[<ffffffff8100aa80>] ? do_notify_resume+0x90/0xc0
[<ffffffff8100badc>] ? retint_signal+0x48/0x8c
Backtrace of panic task:
crash> set -p
PID: 1
COMMAND: "init"
TASK: ffff88023ce17500 [THREAD_INFO: ffff88023ce18000]
CPU: 2
STATE: (PANIC)
crash> bt
PID: 1 TASK: ffff88023ce17500 CPU: 2 COMMAND: "init"
#0 [ffff88023ce19b30] machine_kexec at ffffffff8103b5bb
#1 [ffff88023ce19b90] crash_kexec at ffffffff810c9852
#2 [ffff88023ce19c60] panic at ffffffff81529343
#3 [ffff88023ce19ce0] do_exit at ffffffff8107a5f2
#4 [ffff88023ce19d60] do_group_exit at ffffffff8107a658
#5 [ffff88023ce19d90] get_signal_to_deliver at ffffffff81090306
#6 [ffff88023ce19e30] do_signal at ffffffff8100a265
#7 [ffff88023ce19f30] do_notify_resume at ffffffff8100aa80
#8 [ffff88023ce19f50] retint_signal at ffffffff8100badc
RIP: 00007f74c34f0dbb RSP: 00007fff4be7dce0 RFLAGS: 00010206
RAX: 00007f74c5e92230 RBX: 00007f74c5e92230 RCX: 0000000000000042
RDX: 0000000000000000 RSI: 00007f74c5e7ffa0 RDI: 00007f74c5e8f2b8
RBP: 0000000000000003 R8: 000000000000ffff R9: 0000000000000004
R10: 00007f74c34d4110 R11: 00007f74c34d8065 R12: 00007f74c5e7ffa0
R13: 00007f74c5e8f2b8 R14: 0000000000000000 R15: 0000000000000000
ORIG_RAX: ffffffffffffffff CS: 0033 SS: 002b
- Determine the values of "signal" and "exit_signal" from the "task_struct" structure of the panic task.
crash> task_struct.signal,exit_signal ffff88023ce17500
signal = 0xffff88023ce1da00
exit_signal = 0
- Determine the value of "flags" from the "signal_struct" structure.
crash> signal_struct.flags 0xffff88023ce1da00
flags = 8
- The value "8" of the "flags" field corresponds to SIGNAL_GROUP_EXIT (i.e., a group exit is in progress).
crash> eval -b 8
hexadecimal: 8
decimal: 8
octal: 10
binary: 0000000000000000000000000000000000000000000000000000000000001000
bits set: 3
Kernel Source: include/linux/sched.h
/*
* Bits in flags field of signal_struct.
*/
#define SIGNAL_STOP_STOPPED 0x00000001 /* job control stop in effect */
#define SIGNAL_STOP_DEQUEUED 0x00000002 /* stop signal dequeued */
#define SIGNAL_STOP_CONTINUED 0x00000004 /* SIGCONT since WCONTINUED reap */
#define SIGNAL_GROUP_EXIT 0x00000008 /* group exit in progress */
#define SIGNAL_GROUP_COREDUMP 0x00000080 /* coredump in progress */
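The bit test can also be reproduced outside of crash. The following is a minimal sketch that maps a "flags" value to the flag names, using only the constants from the header excerpt above; the function name decode_signal_flags is hypothetical.

```python
# Bit values copied from the include/linux/sched.h excerpt above.
SIGNAL_FLAGS = {
    0x00000001: "SIGNAL_STOP_STOPPED",
    0x00000002: "SIGNAL_STOP_DEQUEUED",
    0x00000004: "SIGNAL_STOP_CONTINUED",
    0x00000008: "SIGNAL_GROUP_EXIT",
    0x00000080: "SIGNAL_GROUP_COREDUMP",
}


def decode_signal_flags(flags):
    """Return the names of all flag bits set in signal_struct.flags."""
    return [name for bit, name in SIGNAL_FLAGS.items() if flags & bit]


print(decode_signal_flags(8))  # ['SIGNAL_GROUP_EXIT']
```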
- Determine the values of "si_signo" and "si_code" from the "siginfo_t" structure.
- "si_signo" is the number of the signal being delivered.
- "si_code" is a reason code, a numerical value (not a bit mask) indicating why this signal was sent.
crash> struct siginfo_t -o
typedef struct siginfo {
[0] int si_signo;
[4] int si_errno;
[8] int si_code;
union {
int _pad[28];
struct {...} _kill;
struct {...} _timer;
struct {...} _rt;
struct {...} _sigchld;
struct {...} _sigfault;
struct {...} _sigpoll;
[16] } _sifields;
} siginfo_t;
SIZE: 128
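To make the layout concrete, the sketch below decodes a raw 128-byte siginfo buffer using the offsets printed above (0, 4, 8, and 16 for the union). It assumes a little-endian x86_64 layout in which "_sigfault._addr" is the first 8 bytes of "_sifields". The function name parse_siginfo is hypothetical, and the buffer is built by hand from the values read later in this analysis rather than taken from the vmcore.

```python
import struct

SIGINFO_SIZE = 128     # SIZE reported by crash
SIFIELDS_OFFSET = 16   # offset of the _sifields union


def parse_siginfo(raw):
    """Decode the header fields and the fault address from a siginfo buffer."""
    si_signo, si_errno, si_code = struct.unpack_from("<iii", raw, 0)
    # Assumption: _sigfault._addr occupies the first 8 bytes of the union.
    (fault_addr,) = struct.unpack_from("<Q", raw, SIFIELDS_OFFSET)
    return {
        "si_signo": si_signo,
        "si_errno": si_errno,
        "si_code": si_code,
        "fault_addr": fault_addr,
    }


# Hand-built buffer: three ints, 4 bytes of padding, then the address.
raw = struct.pack("<iii4xQ", 7, 0, 196610, 0x7F74C5E92238)
raw = raw.ljust(SIGINFO_SIZE, b"\0")
info = parse_siginfo(raw)
print(info["si_signo"], info["si_code"], hex(info["fault_addr"]))
```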
- "siginfo_t *" is the first argument of get_signal_to_deliver() function.
crash> px get_signal_to_deliver
get_signal_to_deliver = $1 =
{int (siginfo_t *, struct k_sigaction *, struct pt_regs *, void *)} 0xffffffff81090110 <get_signal_to_deliver>
crash> dis -r ffffffff8100a265| tail -n 5
0xffffffff8100a257 <do_signal+103>: mov %rbx,%rdx
0xffffffff8100a25a <do_signal+106>: mov %r13,%rsi
0xffffffff8100a25d <do_signal+109>: mov %r14,%rdi
0xffffffff8100a260 <do_signal+112>: callq 0xffffffff81090110 <get_signal_to_deliver>
0xffffffff8100a265 <do_signal+117>: test %eax,%eax
crash> dis -r ffffffff81090306| head -n 5
0xffffffff81090110 <get_signal_to_deliver>: push %rbp
0xffffffff81090111 <get_signal_to_deliver+1>: mov %rsp,%rbp
0xffffffff81090114 <get_signal_to_deliver+4>: push %r15
0xffffffff81090116 <get_signal_to_deliver+6>: push %r14
0xffffffff81090118 <get_signal_to_deliver+8>: push %r13
crash> bt -f | grep -e get_signal_to_deliver -A 11
#5 [ffff88023ce19d90] get_signal_to_deliver at ffffffff81090306
ffff88023ce19d98: ffff88023ce17500 0000000000000004
ffff88023ce19da8: 0000000000000002 ffff88023ce17b78
ffff88023ce19db8: ffff88023ce19ed8 ffff88023ce19f58
ffff88023ce19dc8: ffff88023ce17500 ffff88023ce17500
ffff88023ce19dd8: ffff88023ce17500 ffff880237803948
ffff88023ce19de8: ffff880237803140 ffff88023ce17c78
ffff88023ce19df8: ffff88023ce19f18 ffff88023ce19f58
ffff88023ce19e08: ffff88023ce19f58 ffff88023ce19ed8
ffff88023ce19e18: ffff88023ce19e58 ffff88023ce17c78
ffff88023ce19e28: ffff88023ce19f28 ffffffff8100a265
#6 [ffff88023ce19e30] do_signal at ffffffff8100a265
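The "siginfo_t *" can be recovered from the stack because do_signal() copies %r14 into %rdi right before the call, and get_signal_to_deliver() then saves %rbp, %r15, %r14, and %r13 in that order. Each push moves the stack pointer down by 8 bytes from the return address slot. The sketch below works through that arithmetic with the addresses from the "bt -f" output above; the helper name saved_register_slot is hypothetical.

```python
# Slot that holds the return address ffffffff8100a265 (do_signal+117).
RETURN_ADDR_SLOT = 0xFFFF88023CE19E30

# Order of the prologue pushes shown by `dis` for get_signal_to_deliver.
PUSH_ORDER = ["rbp", "r15", "r14", "r13"]

# Stack words copied from the `bt -f` output of frame #5.
STACK = {
    0xFFFF88023CE19E18: 0xFFFF88023CE19E58,
    0xFFFF88023CE19E20: 0xFFFF88023CE17C78,
    0xFFFF88023CE19E28: 0xFFFF88023CE19F28,
}


def saved_register_slot(return_addr_slot, push_order, reg):
    """Return the stack address where the callee saved `reg`."""
    # The first push lands 8 bytes below the return address, and so on.
    return return_addr_slot - 8 * (push_order.index(reg) + 1)


slot = saved_register_slot(RETURN_ADDR_SLOT, PUSH_ORDER, "r14")
print(hex(slot), hex(STACK[slot]))
```

The saved %r14 value, ffff88023ce19e58, is the address passed to the "siginfo_t" commands that follow.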
- The value of "si_signo" is set to (7).
crash> siginfo_t.si_signo ffff88023ce19e58
si_signo = 7 /* Signal number */
- It means "init" task with PID (1) received a "SIGBUS" (7) signal.
#define SIGBUS 7
- The value of "si_code" is set to (196610).
crash> siginfo_t.si_code ffff88023ce19e58
si_code = 196610
crash> pd (3 << 16|2)
$2 = 196610
- It means the "SIGBUS" (7) signal was sent to the "init" task because of a non-existent physical address (BUS_ADRERR).
#define __SI_FAULT (3 << 16)
...
/*
* SIGBUS si_codes
*/
#define BUS_ADRALN (__SI_FAULT|1) /* invalid address alignment */
#define BUS_ADRERR (__SI_FAULT|2) /* non-existant physical address */ <<<< (the comment has a misspelling)
#define BUS_OBJERR (__SI_FAULT|3) /* object specific hardware error */
#define NSIGBUS 3
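The decoding of "si_code" can be sketched as follows: the upper 16 bits carry the __SI_FAULT class (3) and the lower 16 bits select the SIGBUS reason. The constants come from the header excerpt above; the function name decode_sigbus_code is hypothetical.

```python
SI_FAULT_CLASS = 3  # __SI_FAULT is (3 << 16)

# Low 16 bits of si_code for SIGBUS, from the header excerpt above.
BUS_CODES = {
    1: "BUS_ADRALN",  # invalid address alignment
    2: "BUS_ADRERR",  # non-existent physical address
    3: "BUS_OBJERR",  # object specific hardware error
}


def decode_sigbus_code(si_code):
    """Map a kernel-internal SIGBUS si_code to its symbolic name."""
    if si_code >> 16 != SI_FAULT_CLASS:
        raise ValueError("si_code is not in the __SI_FAULT class")
    return BUS_CODES.get(si_code & 0xFFFF, "unknown")


print(decode_sigbus_code(196610))  # BUS_ADRERR
```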
- For a "SIGBUS" (7) signal, "si_addr" is filled in with the address of the fault.
crash> siginfo_t ffff88023ce19e58 | tail
_sigfault = {
_addr = 0x7f74c5e92238, <<<<
_addr_lsb = 0
},
_sigpoll = {
_band = 140139513324088,
_fd = -2130640896
}
}
}
- Verify the faulting memory reference.
crash> vtop 0x7f74c5e92238
VIRTUAL PHYSICAL
7f74c5e92238 (not accessible)
- The physical address is not accessible.
- Verify the kernel ring buffer.
- The kernel ring buffer is full of "Read-error" and "Write-error" messages for swap-device (8:80).
crash> log | grep "Read-error" |wc -l
61
crash> log | grep "Read-error"|tail -n 5
Read-error on swap-device (8:80:88)
Read-error on swap-device (8:80:96)
Read-error on swap-device (8:80:104)
Read-error on swap-device (8:80:112)
Read-error on swap-device (8:80:120)
crash> log | grep "Write-error" |wc -l
11840
crash> log| grep "Write" | tail -n 5
Write-error on swap-device (8:80:755248)
Write-error on swap-device (8:80:755256)
Write-error on swap-device (8:80:755264)
Write-error on swap-device (8:80:755272)
Write-error on swap-device (8:80:755280)
                            ^ ^
                            | '.....[ Minor Number = 80 ]
                            '........[ Major Number = 8 ]
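When the log is saved to a file, the same counts can be gathered per device with a short script. This is a minimal sketch that assumes the message format shown above; the third field is captured by the pattern but not interpreted, and the function name count_swap_errors is hypothetical.

```python
import re
from collections import Counter

# Matches lines such as: Write-error on swap-device (8:80:755280)
PATTERN = re.compile(
    r"^(Read|Write)-error on swap-device \((\d+):(\d+):(\d+)\)$"
)


def count_swap_errors(log_lines):
    """Count swap errors keyed by (kind, major, minor)."""
    counts = Counter()
    for line in log_lines:
        match = PATTERN.match(line.strip())
        if match:
            kind, major, minor, _third = match.groups()
            counts[(kind, int(major), int(minor))] += 1
    return counts


# Sample lines copied from the ring buffer output above.
sample = [
    "Read-error on swap-device (8:80:112)",
    "Read-error on swap-device (8:80:120)",
    "Write-error on swap-device (8:80:755280)",
    "Kernel panic - not syncing: Attempted to kill init!",
]
print(count_swap_errors(sample))
```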
- Verify the swap devices on the system.
- There are two swap devices on this system:
crash> swap
SWAP_INFO_STRUCT TYPE SIZE USED PCT PRI FILENAME
ffff88023a88d7c0 PARTITION 5242876k 393616k 7% 0 /dev/sda2 <<<--{ 07% usage }
ffff880236991540 PARTITION 3144700k 393680k 12% 0 /dev/sdf1 <<<--{ 12% usage }
- Determine the problematic swap device;
crash> swap_info_struct -ox | grep bdev
[0x18] struct block_device *bdev;
crash> swap_info_struct.bdev ffff880236991540
bdev = 0xffff88023a6d17c0
crash> block_device -ox | grep gendisk
[0x90] struct gendisk *bd_disk;
crash> block_device.bd_disk 0xffff88023a6d17c0
bd_disk = 0xffff8802379e2400
crash> gendisk 0xffff8802379e2400
struct gendisk {
major = 8, <<<---{ Major number of disk sdf, which holds /dev/sdf1, is 8 }
first_minor = 80, <<<---{ First minor number of disk sdf is 80 }
minors = 16,
disk_name = "sdf\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000f",
- The problematic swap device is /dev/sdf1.
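The device numbers can be cross-checked against the SCSI disk naming convention. For major 8, each disk owns a block of 16 minors (matching "minors = 16" above): the block index selects the letter and the remainder is the partition number. The sketch below applies that rule; the function name sd_name is hypothetical and only major 8 (sda through sdp) is handled.

```python
def sd_name(major, minor, minors_per_disk=16):
    """Translate a major 8 device number into an sd* device name."""
    if major != 8 or not 0 <= minor <= 255:
        raise ValueError("only major 8 with minors 0-255 is handled")
    disk_index, partition = divmod(minor, minors_per_disk)
    name = "sd" + chr(ord("a") + disk_index)
    # Partition 0 refers to the whole disk and has no numeric suffix.
    return name + (str(partition) if partition else "")


print(sd_name(8, 80))  # sdf
print(sd_name(8, 81))  # sdf1
```

Under this convention the (8:80) pair in the swap error messages refers to the disk sdf that contains the swap partition /dev/sdf1.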
- The problematic swap device /dev/sdf1 is not listed in the "dev -d" output.
crash> dev -d
MAJOR GENDISK NAME REQUEST_QUEUE TOTAL ASYNC SYNC DRV
11 ffff88023783b000 sr0 ffff880237984e68 0 0 0 0
8 ffff880237aaa000 sda ffff880237984338 0 0 0 0
253 ffff880237a8ec00 dm-0 ffff88023699ef68 0 0 0 0
253 ffff880237aeb000 dm-1 ffff88023699e438 0 0 0 0
253 ffff88023a809c00 dm-2 ffff88023697e3f8 0 0 0 0
253 ffff880237b6b800 dm-3 ffff8802373d4fa8 0 0 0 0
253 ffff880237850000 dm-4 ffff8802372b6fe8 0 0 0 0
253 ffff8802369dc000 dm-5 ffff88023717d028 0 0 0 0
253 ffff880237b98800 dm-6 ffff8802372b64b8 0 0 0 0
253 ffff8802379d6800 dm-7 ffff88023717c4f8 0 0 0 0
253 ffff8802369dc400 dm-8 ffff8802373d4478 0 0 0 0
253 ffff880237b80800 dm-9 ffff88023b6ad068 0 0 0 0
253 ffff88023786d400 dm-10 ffff880236ad70a8 0 0 0 0
253 ffff8802373f7800 dm-11 ffff88023b6ac538 0 0 0 0
253 ffff8802370d0c00 dm-12 ffff8802373330e8 0 0 0 0
253 ffff880237b98400 dm-13 ffff8802372c9128 0 0 0 0
253 ffff880237e4a000 dm-14 ffff8802373325b8 0 0 0 0
253 ffff880237b9dc00 dm-15 ffff880236ad6578 0 0 0 0
8 ffff88011f290800 sdg ffff8802372c85f8 0 0 0 0
- Determine why the device is missing from the list.
crash> gendisk.private_data 0xffff8802379e2400
private_data = 0xffff8802379f0800
crash> scsi_disk -ox | grep device
[0x8] struct scsi_device *device;
[0x10] struct device dev;
crash> scsi_disk.device,dev 0xffff8802379f0800
device = 0xffff880237b7c800
dev = {
parent = 0xffff880237b7c938,
p = 0xffff88023699c9c0,
kobj = {
name = 0xffff880236a8c480 "2:0:5:0",
crash> scsi_device -ox | grep sdev_state
[0x5a0] enum scsi_device_state sdev_state;
crash> scsi_device.sdev_state 0xffff880237b7c800
sdev_state = SDEV_DEL <<<---{ /dev/sdf1 is deleted from the system }
....
enum scsi_device_state {
SDEV_CREATED = 1, /* device created but not added to sysfs
* Only internal commands allowed (for inq) */
SDEV_RUNNING, /* device properly configured
* All commands allowed */
SDEV_CANCEL, /* beginning to delete device
* Only error handler commands allowed */
SDEV_DEL, /* device deleted
* no commands allowed */
SDEV_QUIESCE, /* Device quiescent. No block commands
* will be accepted, only specials (which
* originate in the mid-layer) */
SDEV_OFFLINE, /* Device offlined (by error handling or
* user request */
SDEV_BLOCK, /* Device blocked by scsi lld. No
* scsi commands from user or midlayer
* should be issued to the scsi
* lld. */
SDEV_CREATED_BLOCK, /* same as above but for created devices */
};
....
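crash prints the state symbolically, but if the field is ever read as a raw integer, the enum above can be mirrored to translate it. This is a sketch that assumes the enumerator order shown in the listing, with values counting up from SDEV_CREATED = 1; the class and helper names are hypothetical.

```python
from enum import IntEnum


class ScsiDeviceState(IntEnum):
    """Mirror of enum scsi_device_state as listed above."""
    SDEV_CREATED = 1
    SDEV_RUNNING = 2
    SDEV_CANCEL = 3
    SDEV_DEL = 4
    SDEV_QUIESCE = 5
    SDEV_OFFLINE = 6
    SDEV_BLOCK = 7
    SDEV_CREATED_BLOCK = 8


def all_commands_allowed(state):
    """Per the enum comments, only SDEV_RUNNING allows all commands."""
    return ScsiDeviceState(state) is ScsiDeviceState.SDEV_RUNNING


print(ScsiDeviceState(4).name)   # SDEV_DEL
print(all_commands_allowed(4))   # False
```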
crash> shost -d | tail -n 5
Ndx Device scsi_device vendor model rev. iorq-cnt done-cnt err-cnt timeout state
--- -------------------- ------------------ ------------ ---------------- -------- -------- -------- ------ ------- -------- ------------
0 2:0:0:0 sda 0xFFFF880237223000 VMware Virtual disk 1.0 7422716 7422716 ( 0) 158 -- RUNNING
1 2:0:5:0 Disk 0xFFFF880237B7C800 VMware Virtual disk 1.0 4674 4674 ( 0) 102 -- DELETED
2 2:0:6:0 sdg 0xFFFF880108762800 VMware Virtual disk 1.0 3629313 3629313 ( 0) 14 -- RUNNING
- /dev/sdf1 was deleted from the system while it was in use.