Kernel panic - not syncing: Attempted to kill init!

Solution Verified - Updated -

Environment

  • Red Hat Enterprise Linux 6
  • SIGBUS (7) signal

Issue

  • The server rebooted automatically with a kernel panic and the following call trace:
Kernel panic - not syncing: Attempted to kill init!
Pid: 1, comm: init Not tainted 2.6.32-504.12.2.el6.x86_64 #1
Call Trace:
 [<ffffffff8152933c>] ? panic+0xa7/0x16f
 [<ffffffff8132f4f6>] ? get_current_tty+0x66/0x70
 [<ffffffff8107a5f2>] ? do_exit+0x862/0x870
 [<ffffffff8108c51d>] ? __sigqueue_free+0x3d/0x50
 [<ffffffff8107a658>] ? do_group_exit+0x58/0xd0
 [<ffffffff81090306>] ? get_signal_to_deliver+0x1f6/0x460
 [<ffffffff8100a265>] ? do_signal+0x75/0x800
 [<ffffffff8109eefc>] ? remove_wait_queue+0x3c/0x50
 [<ffffffff81010000>] ? show_registers+0x220/0x280
 [<ffffffff8104c527>] ? is_prefetch+0xb7/0x230
 [<ffffffff810796c8>] ? sys_waitid+0xa8/0x1f0
 [<ffffffff8100aa80>] ? do_notify_resume+0x90/0xc0
 [<ffffffff8100badc>] ? retint_signal+0x48/0x8c
  • The file "/lib64/ld-linux-x86-64.so.2" is missing.

Resolution

  • Investigate why the swap device was deleted while it was in use.

In some cases the devices backing the swap area are not deleted from the system but report a large number of I/O errors. Reads from swap then fail and result in the same panic as described here. In such cases, investigate why I/O to the underlying disk devices was failing.
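
As a rough way to spot the first condition on a running system, the sketch below parses text in the format of /proc/swaps and lists active swap areas whose device node no longer exists. The function names are hypothetical, and the assumption that a deleted device loses its /dev node (normally handled by udev) is not taken from this article.

```python
import os


def parse_proc_swaps(text):
    """Parse /proc/swaps-style text into (filename, type, size_kb, used_kb) tuples."""
    entries = []
    for line in text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        entries.append((fields[0], fields[1], int(fields[2]), int(fields[3])))
    return entries


def missing_swap_devices(entries, exists=os.path.exists):
    """Return the filenames of swap areas whose path is no longer present."""
    return [name for name, _type, _size, _used in entries if not exists(name)]


if __name__ == "__main__":
    try:
        with open("/proc/swaps") as f:
            active = parse_proc_swaps(f.read())
    except OSError:
        active = []  # /proc/swaps is not available on this system
    for name in missing_swap_devices(active):
        print("active swap area has no device node:", name)
```

The `exists` parameter is only there so the check can be exercised against sample text without touching /dev.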

  • Verify the glibc package.
# rpm -V glibc 
missing   /lib64/ld-linux-x86-64.so.2
  • Boot the system into rescue mode and reinstall the glibc package.
# rpm -ivh --root=/mnt/sysimage /path/to/glibc-<version-release.arch> --replacepkgs --replacefiles

NOTE: Check which glibc and glibc-common versions are installed, then download and install the matching packages.
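
After the reinstall, the presence of the loader can be confirmed before rebooting. The following is a minimal sketch, assuming the installed system is mounted at /mnt/sysimage as in the rpm command above; the helper name `loader_present` is hypothetical.

```python
import os

# Path reported as missing by "rpm -V glibc", relative to the system root.
LOADER = "lib64/ld-linux-x86-64.so.2"


def loader_present(root="/"):
    """Return True if the dynamic loader exists under the given root.

    os.path.exists() follows symbolic links, so a dangling link counts as missing.
    """
    return os.path.exists(os.path.join(root, LOADER))


if __name__ == "__main__":
    for root in ("/", "/mnt/sysimage"):
        print(root, "loader present:", loader_present(root))
```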

Root Cause

  • The kernel invoked the panic() function because the "init" task (PID 1) received a "SIGBUS" (7) signal with the code "BUS_ADRERR".
  • A "SIGBUS" can be caused by any general device fault that the computer detects, though a bus error rarely means that the hardware is physically broken. A bus error may also be raised for certain paging errors.
  • In this example, the "init" task (PID 1) received the "SIGBUS" (7) signal because the swap device was deleted while it was in use.

Diagnostic Steps

System Information:

crash> sys | grep -e UPTIME -e RELEASE -e MACHINE -e PANIC
      UPTIME: 17 days, 19:45:20
     RELEASE: 2.6.32-504.12.2.el6.x86_64
     MACHINE: x86_64  (2400 Mhz)
       PANIC: "Kernel panic - not syncing: Attempted to kill init!"

crash> rd -a ffffffff8202b000 160
ffffffff8202b000:  Phoenix Technologies LTD
ffffffff8202b01c:  6.00
ffffffff8202b024:  2.4
ffffffff8202b02c:  10/22/2013
ffffffff8202b038:  VMware, Inc.
ffffffff8202b048:  VMware Virtual Platform
ffffffff8202b060:  None
ffffffff8202b068:  VMware-42 2e 37 a2 37 de c5 c4-b7 d2 17 c0 66 c5 5c 04

Kernel Ring Buffer:

crash> log
Kernel panic - not syncing: Attempted to kill init!
Pid: 1, comm: init Not tainted 2.6.32-504.12.2.el6.x86_64 #1
Call Trace:
 [<ffffffff8152933c>] ? panic+0xa7/0x16f
 [<ffffffff8132f4f6>] ? get_current_tty+0x66/0x70
 [<ffffffff8107a5f2>] ? do_exit+0x862/0x870
 [<ffffffff8108c51d>] ? __sigqueue_free+0x3d/0x50
 [<ffffffff8107a658>] ? do_group_exit+0x58/0xd0
 [<ffffffff81090306>] ? get_signal_to_deliver+0x1f6/0x460
 [<ffffffff8100a265>] ? do_signal+0x75/0x800
 [<ffffffff8109eefc>] ? remove_wait_queue+0x3c/0x50
 [<ffffffff81010000>] ? show_registers+0x220/0x280
 [<ffffffff8104c527>] ? is_prefetch+0xb7/0x230
 [<ffffffff810796c8>] ? sys_waitid+0xa8/0x1f0
 [<ffffffff8100aa80>] ? do_notify_resume+0x90/0xc0
 [<ffffffff8100badc>] ? retint_signal+0x48/0x8c

Backtrace of panic task:

crash> set -p
    PID: 1
COMMAND: "init"
   TASK: ffff88023ce17500  [THREAD_INFO: ffff88023ce18000]
    CPU: 2
  STATE:  (PANIC)

crash> bt
PID: 1      TASK: ffff88023ce17500  CPU: 2   COMMAND: "init"
 #0 [ffff88023ce19b30] machine_kexec at ffffffff8103b5bb
 #1 [ffff88023ce19b90] crash_kexec at ffffffff810c9852
 #2 [ffff88023ce19c60] panic at ffffffff81529343
 #3 [ffff88023ce19ce0] do_exit at ffffffff8107a5f2
 #4 [ffff88023ce19d60] do_group_exit at ffffffff8107a658
 #5 [ffff88023ce19d90] get_signal_to_deliver at ffffffff81090306
 #6 [ffff88023ce19e30] do_signal at ffffffff8100a265
 #7 [ffff88023ce19f30] do_notify_resume at ffffffff8100aa80
 #8 [ffff88023ce19f50] retint_signal at ffffffff8100badc
    RIP: 00007f74c34f0dbb  RSP: 00007fff4be7dce0  RFLAGS: 00010206
    RAX: 00007f74c5e92230  RBX: 00007f74c5e92230  RCX: 0000000000000042
    RDX: 0000000000000000  RSI: 00007f74c5e7ffa0  RDI: 00007f74c5e8f2b8
    RBP: 0000000000000003   R8: 000000000000ffff   R9: 0000000000000004
    R10: 00007f74c34d4110  R11: 00007f74c34d8065  R12: 00007f74c5e7ffa0
    R13: 00007f74c5e8f2b8  R14: 0000000000000000  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0033  SS: 002b
  • Determine the value of "signal" and "exit_signal" from "task_struct" structure of panic task.
crash> task_struct.signal,exit_signal ffff88023ce17500
  signal = 0xffff88023ce1da00
  exit_signal = 0
  • Determine the value of "flags" from "signal_struct" structure.
crash> signal_struct.flags 0xffff88023ce1da00
  flags = 8
  • The value "8" of the "flags" field means SIGNAL_GROUP_EXIT (i.e., a group exit is in progress).
crash> eval -b 8
hexadecimal: 8  
    decimal: 8  
      octal: 10
     binary: 0000000000000000000000000000000000000000000000000000000000001000
   bits set: 3 

Kernel Source: include/linux/sched.h

/*
 * Bits in flags field of signal_struct.
 */
#define SIGNAL_STOP_STOPPED     0x00000001 /* job control stop in effect */
#define SIGNAL_STOP_DEQUEUED    0x00000002 /* stop signal dequeued */
#define SIGNAL_STOP_CONTINUED   0x00000004 /* SIGCONT since WCONTINUED reap */
#define SIGNAL_GROUP_EXIT       0x00000008 /* group exit in progress */
#define SIGNAL_GROUP_COREDUMP   0x00000080 /* coredump in progress */
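
The same lookup can be scripted. This is a small sketch that decodes a "flags" value using only the bits quoted in the header excerpt above; any other bit that happens to be set is reported as raw hex rather than guessed at. The function name is hypothetical.

```python
# Bit definitions copied from the include/linux/sched.h excerpt above.
SIGNAL_FLAGS = {
    0x00000001: "SIGNAL_STOP_STOPPED",
    0x00000002: "SIGNAL_STOP_DEQUEUED",
    0x00000004: "SIGNAL_STOP_CONTINUED",
    0x00000008: "SIGNAL_GROUP_EXIT",
    0x00000080: "SIGNAL_GROUP_COREDUMP",
}


def decode_signal_flags(flags):
    """Return the names of the known bits set in flags, plus leftover bits as hex."""
    names = [name for bit, name in sorted(SIGNAL_FLAGS.items()) if flags & bit]
    known = sum(bit for bit in SIGNAL_FLAGS if flags & bit)
    leftover = flags & ~known
    if leftover:
        names.append(hex(leftover))  # bits not covered by the excerpt
    return names


print(decode_signal_flags(8))  # ['SIGNAL_GROUP_EXIT']
```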
  • Determine the value of "si_signo" and "si_code" from "siginfo_t" structure.
    • "si_signo" is a signal number being delivered.
    • "si_code" is a reason code, a numerical value (not a bit mask) indicating why this signal was sent.
crash> struct siginfo_t -o
typedef struct siginfo {
    [0] int si_signo;
    [4] int si_errno;
    [8] int si_code;
        union {
            int _pad[28];
            struct {...} _kill;
            struct {...} _timer;
            struct {...} _rt;
            struct {...} _sigchld;
            struct {...} _sigfault;
            struct {...} _sigpoll;
   [16] } _sifields;
} siginfo_t;
SIZE: 128
  • "siginfo_t *" is the first argument of get_signal_to_deliver() function.
crash> px get_signal_to_deliver
get_signal_to_deliver = $1 = 
 {int (siginfo_t *, struct k_sigaction *, struct pt_regs *, void *)} 0xffffffff81090110 <get_signal_to_deliver>

crash> dis -r ffffffff8100a265| tail -n 5
0xffffffff8100a257 <do_signal+103>: mov    %rbx,%rdx
0xffffffff8100a25a <do_signal+106>: mov    %r13,%rsi
0xffffffff8100a25d <do_signal+109>: mov    %r14,%rdi
0xffffffff8100a260 <do_signal+112>: callq  0xffffffff81090110 <get_signal_to_deliver>
0xffffffff8100a265 <do_signal+117>: test   %eax,%eax

crash> dis -r ffffffff81090306| head -n 5
0xffffffff81090110 <get_signal_to_deliver>: push   %rbp
0xffffffff81090111 <get_signal_to_deliver+1>:   mov    %rsp,%rbp
0xffffffff81090114 <get_signal_to_deliver+4>:   push   %r15
0xffffffff81090116 <get_signal_to_deliver+6>:   push   %r14
0xffffffff81090118 <get_signal_to_deliver+8>:   push   %r13

crash> bt -f | grep -e get_signal_to_deliver -A 11
 #5 [ffff88023ce19d90] get_signal_to_deliver at ffffffff81090306
    ffff88023ce19d98: ffff88023ce17500 0000000000000004 
    ffff88023ce19da8: 0000000000000002 ffff88023ce17b78 
    ffff88023ce19db8: ffff88023ce19ed8 ffff88023ce19f58 
    ffff88023ce19dc8: ffff88023ce17500 ffff88023ce17500 
    ffff88023ce19dd8: ffff88023ce17500 ffff880237803948 
    ffff88023ce19de8: ffff880237803140 ffff88023ce17c78 
    ffff88023ce19df8: ffff88023ce19f18 ffff88023ce19f58 
    ffff88023ce19e08: ffff88023ce19f58 ffff88023ce19ed8 
    ffff88023ce19e18: ffff88023ce19e58 ffff88023ce17c78 
    ffff88023ce19e28: ffff88023ce19f28 ffffffff8100a265 
 #6 [ffff88023ce19e30] do_signal at ffffffff8100a265
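
The article does not state how the "siginfo_t" address used in the next steps was located. One reading consistent with the output above: do_signal() copies %r14 into %rdi (the first argument) before the call, and get_signal_to_deliver() then pushes %rbp, %r15, %r14 and %r13 in that order, so the caller's %r14 is saved three slots below the return address. The sketch below only does that arithmetic; the helper name is hypothetical and the interpretation is an assumption.

```python
WORD = 8  # bytes per stack slot on x86_64


def saved_register_slot(return_addr_slot, push_index):
    """Address of the slot written by the Nth push (1-based) after the call."""
    return return_addr_slot - push_index * WORD


# Slot holding the return address ffffffff8100a265 in the "bt -f" output.
RET_SLOT = 0xFFFF88023CE19E30
# Push order taken from the get_signal_to_deliver prologue disassembly.
PROLOGUE = ["rbp", "r15", "r14", "r13"]

for index, reg in enumerate(PROLOGUE, start=1):
    print(reg, hex(saved_register_slot(RET_SLOT, index)))
```

The slot computed for r14 is ffff88023ce19e18, which in the stack dump holds ffff88023ce19e58, the address passed to the siginfo_t commands that follow.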
  • The value of "si_signo" is set to (7).
crash> siginfo_t.si_signo ffff88023ce19e58
  si_signo = 7    /* Signal number */
  • It means "init" task with PID (1) received a "SIGBUS" (7) signal.
#define SIGBUS           7
  • The value of "si_code" is set to (196610).
crash> siginfo_t.si_code ffff88023ce19e58
  si_code = 196610

crash> pd (3 << 16|2)  
$2 = 196610
  • It means the "SIGBUS" (7) signal was sent to the "init" task because of a non-existent physical address (BUS_ADRERR).
#define __SI_FAULT      (3 << 16)
...
/*
 * SIGBUS si_codes
 */
#define BUS_ADRALN      (__SI_FAULT|1)  /* invalid address alignment */
#define BUS_ADRERR      (__SI_FAULT|2)  /* non-existant physical address */    <<<< (the comment has a misspelling)
#define BUS_OBJERR      (__SI_FAULT|3)  /* object specific hardware error */
#define NSIGBUS         3
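
The decoding of "si_code" can be written out as a short sketch. It uses only the constants quoted above and splits the value into its upper class bits and lower 16-bit reason; the function name is hypothetical.

```python
SI_FAULT = 3 << 16  # __SI_FAULT from the excerpt above

# Low 16 bits of si_code for SIGBUS, as listed in the excerpt above.
BUS_CODES = {1: "BUS_ADRALN", 2: "BUS_ADRERR", 3: "BUS_OBJERR"}


def decode_sigbus_code(si_code):
    """Return the BUS_* name for a SIGBUS si_code, or None if it is not recognised."""
    si_class = si_code & ~0xFFFF  # upper bits select the siginfo class
    reason = si_code & 0xFFFF     # lower bits carry the reason
    if si_class != SI_FAULT:
        return None
    return BUS_CODES.get(reason)


print(decode_sigbus_code(196610))  # BUS_ADRERR
```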
  • For a "SIGBUS" (7) signal, the kernel fills in "si_addr" with the address of the fault.
crash> siginfo_t ffff88023ce19e58 | tail
    _sigfault = {
      _addr = 0x7f74c5e92238,  <<<<
      _addr_lsb = 0
    }, 
    _sigpoll = {
      _band = 140139513324088, 
      _fd = -2130640896
    }
  }
}
  • Verify the faulting memory reference.
crash> vtop 0x7f74c5e92238
VIRTUAL     PHYSICAL        
7f74c5e92238  (not accessible)
  • The physical address is not accessible.

  • Verify the kernel ring buffer.

  • The kernel ring buffer is full of "Read-error" and "Write-error" messages for swap-device (8:80).

crash> log | grep "Read-error" |wc -l
61

crash> log | grep "Read-error"|tail -n 5
Read-error on swap-device (8:80:88)
Read-error on swap-device (8:80:96)
Read-error on swap-device (8:80:104)
Read-error on swap-device (8:80:112)
Read-error on swap-device (8:80:120)

crash> log | grep "Write-error" |wc -l
11840

crash> log| grep "Write" | tail -n 5
Write-error on swap-device (8:80:755248)
Write-error on swap-device (8:80:755256)
Write-error on swap-device (8:80:755264)
Write-error on swap-device (8:80:755272)
Write-error on swap-device (8:80:755280)
                            ^  ^
                            |  '.....[ Minor Number = 80 ]
                            '........[ Major Number = 8  ]
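
When the log has been saved to a file, the grep and wc steps above can be combined into one pass. The sketch below counts messages per error type and device number; the regular expression is modelled on the lines shown above, the third field of each message is ignored, and the function name is hypothetical.

```python
import re
from collections import Counter

# Matches lines such as "Read-error on swap-device (8:80:88)".
PATTERN = re.compile(r"(Read|Write)-error on swap-device \((\d+):(\d+):(\d+)\)")


def count_swap_errors(log_text):
    """Count swap error messages keyed by (kind, major, minor)."""
    counts = Counter()
    for match in PATTERN.finditer(log_text):
        kind = match.group(1)
        major, minor = int(match.group(2)), int(match.group(3))
        counts[(kind, major, minor)] += 1
    return counts


sample = """Read-error on swap-device (8:80:112)
Read-error on swap-device (8:80:120)
Write-error on swap-device (8:80:755280)
"""
for key, total in sorted(count_swap_errors(sample).items()):
    print(key, total)
```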
  • Verify the swap devices on the system.

  • There are two swap devices on this system:

crash> swap
SWAP_INFO_STRUCT    TYPE       SIZE       USED     PCT  PRI  FILENAME
ffff88023a88d7c0  PARTITION  5242876k    393616k    7%    0  /dev/sda2  <<<--{ 07% usage }
ffff880236991540  PARTITION  3144700k    393680k   12%    0  /dev/sdf1  <<<--{ 12% usage }
  • Determine the problematic swap device:
crash> swap_info_struct -ox | grep bdev
  [0x18] struct block_device *bdev;

crash> swap_info_struct.bdev ffff880236991540
  bdev = 0xffff88023a6d17c0

crash> block_device -ox | grep gendisk
  [0x90] struct gendisk *bd_disk;

crash> block_device.bd_disk 0xffff88023a6d17c0
  bd_disk = 0xffff8802379e2400

crash> gendisk 0xffff8802379e2400
struct gendisk {
  major = 8,                <<<---{ Major number of /dev/sdf1 is 8  }
  first_minor = 80,         <<<---{ Minor number of /dev/sdf1 is 80 }
  minors = 16, 
  disk_name = "sdf\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000f", 
  • The problematic swap device is /dev/sdf1.
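
The device number in the log messages can also be mapped to a name directly. This sketch assumes the conventional layout for SCSI disk major 8, where each disk owns 16 minors (matching "minors = 16" in the gendisk output above), so it only covers sda through sdp. The function name is hypothetical.

```python
import string

SD_MAJOR = 8          # first SCSI disk major
MINORS_PER_DISK = 16  # "minors = 16" in the gendisk output


def sd_name(major, minor):
    """Map a (major, minor) pair on major 8 to the conventional sd device name."""
    if major != SD_MAJOR or not 0 <= minor < 256:
        raise ValueError("only major 8 with minors 0-255 is handled")
    disk, partition = divmod(minor, MINORS_PER_DISK)
    name = "sd" + string.ascii_lowercase[disk]
    return name if partition == 0 else name + str(partition)


print(sd_name(8, 80))  # sdf
print(sd_name(8, 81))  # sdf1
```

Minor 80 is the whole disk sdf, which carries the swap partition sdf1.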

  • The problematic swap device /dev/sdf1 is not listed in the output of "dev -d".

crash> dev -d
MAJOR GENDISK            NAME       REQUEST_QUEUE      TOTAL ASYNC  SYNC   DRV
   11 ffff88023783b000   sr0        ffff880237984e68       0     0     0     0
    8 ffff880237aaa000   sda        ffff880237984338       0     0     0     0
  253 ffff880237a8ec00   dm-0       ffff88023699ef68       0     0     0     0
  253 ffff880237aeb000   dm-1       ffff88023699e438       0     0     0     0
  253 ffff88023a809c00   dm-2       ffff88023697e3f8       0     0     0     0
  253 ffff880237b6b800   dm-3       ffff8802373d4fa8       0     0     0     0
  253 ffff880237850000   dm-4       ffff8802372b6fe8       0     0     0     0
  253 ffff8802369dc000   dm-5       ffff88023717d028       0     0     0     0
  253 ffff880237b98800   dm-6       ffff8802372b64b8       0     0     0     0
  253 ffff8802379d6800   dm-7       ffff88023717c4f8       0     0     0     0
  253 ffff8802369dc400   dm-8       ffff8802373d4478       0     0     0     0
  253 ffff880237b80800   dm-9       ffff88023b6ad068       0     0     0     0
  253 ffff88023786d400   dm-10      ffff880236ad70a8       0     0     0     0
  253 ffff8802373f7800   dm-11      ffff88023b6ac538       0     0     0     0
  253 ffff8802370d0c00   dm-12      ffff8802373330e8       0     0     0     0
  253 ffff880237b98400   dm-13      ffff8802372c9128       0     0     0     0
  253 ffff880237e4a000   dm-14      ffff8802373325b8       0     0     0     0
  253 ffff880237b9dc00   dm-15      ffff880236ad6578       0     0     0     0
    8 ffff88011f290800   sdg        ffff8802372c85f8       0     0     0     0
  • Determine why the device is missing.
crash> gendisk.private_data 0xffff8802379e2400
  private_data = 0xffff8802379f0800

crash> scsi_disk -ox | grep device
    [0x8] struct scsi_device *device;
   [0x10] struct device dev;

crash> scsi_disk.device,dev 0xffff8802379f0800
  device = 0xffff880237b7c800
  dev = {
    parent = 0xffff880237b7c938, 
    p = 0xffff88023699c9c0, 
    kobj = {
      name = 0xffff880236a8c480 "2:0:5:0", 

crash> scsi_device -ox | grep sdev_state
  [0x5a0] enum scsi_device_state sdev_state;

crash> scsi_device.sdev_state 0xffff880237b7c800
  sdev_state = SDEV_DEL      <<<---{ /dev/sdf1 is deleted from the system }
....
enum scsi_device_state {
        SDEV_CREATED = 1,       /* device created but not added to sysfs
                                 * Only internal commands allowed (for inq) */
        SDEV_RUNNING,           /* device properly configured
                                 * All commands allowed */
        SDEV_CANCEL,            /* beginning to delete device
                                 * Only error handler commands allowed */
        SDEV_DEL,               /* device deleted
                                 * no commands allowed */
        SDEV_QUIESCE,           /* Device quiescent.  No block commands
                                 * will be accepted, only specials (which
                                 * originate in the mid-layer) */
        SDEV_OFFLINE,           /* Device offlined (by error handling or
                                 * user request */
        SDEV_BLOCK,             /* Device blocked by scsi lld.  No
                                 * scsi commands from user or midlayer
                                 * should be issued to the scsi
                                 * lld. */
        SDEV_CREATED_BLOCK,     /* same as above but for created devices */
};
....
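
For scripts that work with raw state values rather than the symbolic names crash prints, the enum above can be mirrored as follows. The numeric values are derived from the excerpt, where SDEV_CREATED = 1 and each later enumerator increments by one under the C enumeration rules; the helper name is hypothetical.

```python
from enum import IntEnum


class ScsiDeviceState(IntEnum):
    """Mirror of enum scsi_device_state as quoted above."""
    SDEV_CREATED = 1
    SDEV_RUNNING = 2
    SDEV_CANCEL = 3
    SDEV_DEL = 4
    SDEV_QUIESCE = 5
    SDEV_OFFLINE = 6
    SDEV_BLOCK = 7
    SDEV_CREATED_BLOCK = 8


def state_name(raw):
    """Return the symbolic name for a raw sdev_state value, or the number as text."""
    try:
        return ScsiDeviceState(raw).name
    except ValueError:
        return str(raw)


print(state_name(4))  # SDEV_DEL
```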

crash> shost -d | tail -n 5
Ndx   Device               scsi_device        vendor       model            rev.     iorq-cnt done-cnt        err-cnt timeout  state
--- -------------------- ------------------ ------------ ---------------- -------- -------- -------- ------ ------- -------- ------------
  0 2:0:0:0    sda       0xFFFF880237223000 VMware       Virtual disk     1.0        7422716   7422716 (  0)      158       -- RUNNING
  1 2:0:5:0    Disk      0xFFFF880237B7C800 VMware       Virtual disk     1.0           4674      4674 (  0)      102       -- DELETED
  2 2:0:6:0    sdg       0xFFFF880108762800 VMware       Virtual disk     1.0        3629313   3629313 (  0)       14       -- RUNNING
  • /dev/sdf1 was deleted from the system while it was in use.

