Kernel panic at elv_drain_elevator+0x27/0x80

Solution Unverified - Updated

Environment

  • Red Hat Enterprise Linux 6.3
  • SCSI

Issue

  • Getting a kernel panic with the following call trace:
Stack:
 ffff8802f5861d10 ffff8804778b2c60 ffff8802f5861cf0 ffffffff812559da
<d> ffff8804778b2c60 ffff8804778b3518 ffff8804778b2f78 0000000000000000
<d> ffff8802f5861d20 ffffffff81257786 0000000000000000 ffff88047400b800
Call Trace:
 [<ffffffff812559da>] blk_drain_queue+0x7a/0x100
 [<ffffffff81257786>] blk_cleanup_queue+0xc6/0x150
 [<ffffffff8137136d>] __scsi_remove_device+0x7d/0xc0
 [<ffffffff813713e0>] scsi_remove_device+0x30/0x50
 [<ffffffff813714cd>] __scsi_remove_target+0xad/0xe0
 [<ffffffff81371570>] ? __remove_child+0x0/0x30
 [<ffffffff81371593>] __remove_child+0x23/0x30
 [<ffffffff8134c60c>] device_for_each_child+0x4c/0x80
 [<ffffffffa0068010>] ? fc_starget_delete+0x0/0x30 [scsi_transport_fc]
 [<ffffffff8137153e>] scsi_remove_target+0x3e/0x70
 [<ffffffffa0068036>] fc_starget_delete+0x26/0x30 [scsi_transport_fc]
 [<ffffffff8108c760>] worker_thread+0x170/0x2a0
 [<ffffffff810920d0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff8108c5f0>] ? worker_thread+0x0/0x2a0
 [<ffffffff81091d66>] kthread+0x96/0xa0
 [<ffffffff8100c14a>] child_rip+0xa/0x20
 [<ffffffff81091cd0>] ? kthread+0x0/0xa0
 [<ffffffff8100c140>] ? child_rip+0x0/0x20
Code: 2a ff ff ff 55 48 89 e5 53 48 83 ec 08 0f 1f 44 00 00 48 89 fb 0f 1f 80 00 00 00 00 48 8b 43 18 be 01 00 00 00 48 89 df 48 8b 00 <ff> 50 28 85 c0 75 ea 8b 93 b0 03 00 00 85 d2 74 14 8b 05 ae ec 
RIP  [<ffffffff8124eb67>] elv_drain_elevator+0x27/0x80
 RSP <ffff8802f5861cb0>
CR2: 0000000000000028

Resolution

  • The issue is fixed in RHEL 6.4 running kernel 2.6.32-358.18.1.el6, as indicated in RHSA-2013-1173. Upgrade to RHEL 6.4 with that kernel version or later, or to RHEL 6.5 or later.

Root Cause

  • The crash occurred while the underlying SCSI device was being removed. As part of the device removal, the st driver is detached from the SCSI device, reducing the reference count. Because the reference count is wrong, the request queue is freed prematurely. The system crashes when it tries to use a pointer that was set to NULL as part of freeing the request queue.

  • This issue is tracked in bug 927988.

Diagnostic Steps

crash> bt -fl
...
    [exception RIP: elv_drain_elevator+39]
    RIP: ffffffff8124eb67  RSP: ffff8802f5861cb0  RFLAGS: 00010082
    RAX: 0000000000000000  RBX: ffff8804778b2c60  RCX: 0000000000000003
    RDX: 000000000000aedd  RSI: 0000000000000001  RDI: ffff8804778b2c60
    RBP: ffff8802f5861cc0   R8: 000000000000000c   R9: 0000000000000173
    R10: 0000000000000000  R11: 0000000000000000  R12: 0000000000000001
    R13: ffff8804778b3590  R14: ffff8804778b3580  R15: ffffe8fd22600648
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
    ffff8802f5861c08: ffffe8fd22600648 ffff8804778b3580 
    ffff8802f5861c18: ffff8804778b3590 0000000000000001 
    ffff8802f5861c28: ffff8802f5861cc0 ffff8804778b2c60 
    ffff8802f5861c38: 0000000000000000 0000000000000000 
    ffff8802f5861c48: 0000000000000173 000000000000000c 
    ffff8802f5861c58: 0000000000000000 0000000000000003 
    ffff8802f5861c68: 000000000000aedd 0000000000000001 
    ffff8802f5861c78: ffff8804778b2c60 ffffffffffffffff 
    ffff8802f5861c88: ffffffff8124eb67 0000000000000010 
    ffff8802f5861c98: 0000000000010082 ffff8802f5861cb0 
    ffff8802f5861ca8: 0000000000000018 ffff8802f5861d10 
    ffff8802f5861cb8: ffff8804778b2c60 ffff8802f5861cf0 
    ffff8802f5861cc8: ffffffff812559da 
 #9 [ffff8802f5861cc8] blk_drain_queue at ffffffff812559da
    /usr/src/debug/kernel-2.6.32-279.5.2.el6/linux-2.6.32-279.5.2.el6.x86_64/block/blk-core.c: 275
    ffff8802f5861cd0: ffff8804778b2c60 ffff8804778b3518 
    ffff8802f5861ce0: ffff8804778b2f78 0000000000000000 
    ffff8802f5861cf0: ffff8802f5861d20 ffffffff81257786 

The path followed before the panic is blk_drain_queue()->elv_drain_elevator(). Let's check what blk_drain_queue() looks like:

     266 void blk_drain_queue(struct request_queue *q, bool drain_all)
     267 {
     268         while (true) {
     269                 bool drain = false;
     270                 int i;
     271 
     272                 spin_lock_irq(q->queue_lock);
     273 
     274                 elv_drain_elevator(q);
     275                 if (drain_all)
     276                         blk_throtl_drain(q);
     277 

It seems that we pass a request_queue q to elv_drain_elevator(). Let's now search for the address of the request_queue we are passing to elv_drain_elevator():

        [exception RIP: elv_drain_elevator+39]
        RIP: ffffffff8124eb67  RSP: ffff8802f5861cb0  RFLAGS: 00010082
        RAX: 0000000000000000  RBX: ffff8804778b2c60  RCX: 0000000000000003
        RDX: 000000000000aedd  RSI: 0000000000000001  RDI: ffff8804778b2c60

Let's take a quick look at the elv_drain_elevator() function as well:

     564 void elv_drain_elevator(struct request_queue *q)
     565 {
     566         static int printed;
     567 
     568         lockdep_assert_held(q->queue_lock);
     569 
     570         while (q->elevator->ops->elevator_dispatch_fn(q, 1))
     571                 ;
     572         if (q->nr_sorted && printed++ < 10) {
     573                 printk(KERN_ERR "%s: forced dispatching is broken "
     574                        "(nr_sorted=%u), please report this\n",
     575                        q->elevator->elevator_type->elevator_name, q->nr_sorted);
     576         }
     577 }

The first and only argument passed to elv_drain_elevator() is the request_queue q, as can be seen below. On x86_64, the first argument is passed in %rdi, and above we can see that %rdi is ffff8804778b2c60. Let's check whether this value changes or whether we can trust it. This is the disassembled code for elv_drain_elevator():

    crash> dis -rl ffffffff8124eb67
    /usr/src/debug/kernel-2.6.32-279.5.2.el6/linux-2.6.32-279.5.2.el6.x86_64/block/elevator.c: 565
    0xffffffff8124eb40 <elv_drain_elevator>:        push   %rbp
    0xffffffff8124eb41 <elv_drain_elevator+1>:      mov    %rsp,%rbp
    0xffffffff8124eb44 <elv_drain_elevator+4>:      push   %rbx
    0xffffffff8124eb45 <elv_drain_elevator+5>:      sub    $0x8,%rsp
    0xffffffff8124eb49 <elv_drain_elevator+9>:      nopl   0x0(%rax,%rax,1)
    0xffffffff8124eb4e <elv_drain_elevator+14>:     mov    %rdi,%rbx
    0xffffffff8124eb51 <elv_drain_elevator+17>:     nopl   0x0(%rax)
    /usr/src/debug/kernel-2.6.32-279.5.2.el6/linux-2.6.32-279.5.2.el6.x86_64/block/elevator.c: 570
    0xffffffff8124eb58 <elv_drain_elevator+24>:     mov    0x18(%rbx),%rax
    0xffffffff8124eb5c <elv_drain_elevator+28>:     mov    $0x1,%esi
    0xffffffff8124eb61 <elv_drain_elevator+33>:     mov    %rbx,%rdi
    0xffffffff8124eb64 <elv_drain_elevator+36>:     mov    (%rax),%rax
    0xffffffff8124eb67 <elv_drain_elevator+39>:     callq  *0x28(%rax)

Going back to the elv_drain_elevator() function above, let's see the value we get in the while() at line 570. First, the address of the elevator:

    crash> px ((struct request_queue *)0xffff8804778b2c60)->elevator
    $5 = (struct elevator_queue *) 0xffff880479e90440

It seems we were inside the while statement at line 570 when we hit the NULL pointer. The while loop calls "q->elevator->ops->elevator_dispatch_fn(q, 1)", a function implemented by every I/O scheduler:

    0 block/as-iosched.c       1467 .elevator_dispatch_fn = as_dispatch_request,
    1 block/cfq-iosched.c      4142 .elevator_dispatch_fn = cfq_dispatch_requests,
    2 block/deadline-iosched.c  444 .elevator_dispatch_fn = deadline_dispatch_requests,
    3 block/noop-iosched.c       90 .elevator_dispatch_fn = noop_dispatch,
    4 include/linux/elevator.h   20 typedef int (elevator_dispatch_fn) (struct request_queue *, int );

The 1 we pass in the call means that we are forcing the dispatch, and q is the queue being drained. We know that the request_queue pointer is 0xffff8804778b2c60 and the elevator_queue is at 0xffff880479e90440 (see above for how I found them). So the next element I am looking for is q->elevator->ops, which is:

    crash> px ((struct elevator_queue *) 0xffff880479e90440)->ops
    $7 = (struct elevator_ops *) 0x0

NULL?

Either I have the wrong address for the elevator_queue, or there is something strange going on with this elevator_queue. A NULL ops also matches the hardware fault: with 0 in %rax, the final callq *0x28(%rax) dereferences virtual address 0x28, which is exactly the CR2 value reported in the panic. There are other fields zeroed out as well:

    crash> struct elevator_queue 0xffff880479e90440
    struct elevator_queue {
      ops = 0x0, 
      elevator_data = 0xffff88047767b000, 
      kobj = {
        name = 0x0, 
        entry = {
          next = 0xffff880479e90458, 
          prev = 0xffff880479e90458
        }, 
        parent = 0x0, 
        kset = 0x0, 
        ktype = 0xffffffff81aeafb0, 
        sd = 0x0, 
        kref = {
          refcount = {
        counter = 0
          }
        }, 
        state_initialized = 1, 
        state_in_sysfs = 0, 
        state_add_uevent_sent = 0, 
        state_remove_uevent_sent = 0, 
        uevent_suppress = 0
      }, 
      elevator_type = 0xffffffff81aed420, 
      sysfs_lock = {
        count = {
          counter = 1
        }, 
        wait_lock = {
          raw_lock = {
        slock = 0
          }
        }, 
        wait_list = {
          next = 0xffff880479e904a0, 
          prev = 0xffff880479e904a0
        }, 
        owner = 0x0
      }, 
      hash = 0xffff8803519d9e00, 
      registered = 0
    }

It seems that this elevator_queue may belong to a device that is no longer present. There are a number of deleted and blocked devices, but they appear to be only tapes/changers. All the disks report the "running" state. The changers are the only ones listing any outstanding requests against them (a difference between the I/O request count and the I/O done count).

host     driver                              (device)
name     name                             Scsi_Host          .shost_classdev      .shost_data        &.hostdata[0]      .hostdata[0]
-------- -------------------------------- ------------------ -------------------  ------------------ ------------------ ------------------
host0    LSI SAS based MegaRAID driver    0xFFFF88047C3DC000 0xFFFF88047C3DC3A0 - 0x0000000000000000 0xFFFF88047C3DC5E0 0xFFFF880079C1F000
host1    qla2xxx                          0xFFFF8802F5D69800 0xFFFF8802F5D69BA0 - 0xFFFF8802F5CEF000 0xFFFF8802F5D69DE0 0xFFFF8802F6BDC5A8
host2    qla2xxx                          0xFFFF8802F5CED000 0xFFFF8802F5CED3A0 - 0xFFFF8802F5A42800 0xFFFF8802F5CED5E0 0xFFFF8802F72DBDA8
host3    qla2xxx                          0xFFFF8802F5A5D000 0xFFFF8802F5A5D3A0 - 0xFFFF8802F543E800 0xFFFF8802F5A5D5E0 0xFFFF8802F5A5DDA8
host4    qla2xxx                          0xFFFF8802F5441000 0xFFFF8802F54413A0 - 0xFFFF8802F5639800 0xFFFF8802F54415E0 0xFFFF8802F5441DA8
host5    ata_piix                         0xFFFF8802F7323000 0xFFFF8802F73233A0 - 0x0000000000000000 0xFFFF8802F73235E0 0xFFFF8802F6B04000
host6    ata_piix                         0xFFFF8802F6AC0800 0xFFFF8802F6AC0BA0 - 0x0000000000000000 0xFFFF8802F6AC0DE0 0xFFFF8802F6B08000
host7    ata_piix                         0xFFFF8802F6AC0000 0xFFFF8802F6AC03A0 - 0x0000000000000000 0xFFFF8802F6AC05E0 0xFFFF8802F7310000
host8    ata_piix                         0xFFFF8802F69E5800 0xFFFF8802F69E5BA0 - 0x0000000000000000 0xFFFF8802F69E5DE0 0xFFFF8802F566C000


====================================================================================================================================
host     driver
name     name                             Scsi_Host
-------- -------------------------------- ------------------
host1    qla2xxx                          0xFFFF8802F5D69800

Ndx   Device         scsi_device        vendor       model            rev.     iorq-cnt done-cnt        err-cnt timeout  state
----- -------------- ------------------ ------------ ---------------- -------- -------- -------- ------ ------- -------- ------------
[  0] 1:0:0:0        0xFFFF88047400B800 HP      Ult~ Ultrium 5-SCSI ~ I57H     89964042 89964042 (  0)       63       -- DELETED
[  1] 1:0:1:0        0xFFFF8802F1799800 HP      Ult~ Ultrium 5-SCSI ~ I57H     54800275 54800275 (  0)       43       -- BLOCKED

Let's take a look at the DELETED and BLOCKED tape devices:

crash> px *(struct scsi_device *)0xFFFF88047400B800
$8 = {
  host = 0xffff8802f5d69800, 
  request_queue =                  0xffff8804778b2c60,   << hello!?
crash> px ((struct request_queue *)0xffff8804778b2c60)->elevator  << check..

This is the request_queue we were looking at previously. It belongs to this deleted device.

Ndx   Device         scsi_device        vendor       model            rev.     iorq-cnt done-cnt        err-cnt timeout  state
----- -------------- ------------------ ------------ ---------------- -------- -------- -------- ------ ------- -------- ------------
[  0] 1:0:0:0        0xFFFF88047400B800 HP      Ult~ Ultrium 5-SCSI ~ I57H     89964042 89964042 (  0)       63       -- DELETED

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.