RHEL6.3.z: kernel panic in lpfc driver due to corrupt stack, RIP list_del, called from scsi_error_handler
Issue
- The kernel crashes in the Emulex lpfc driver during a SCSI abort / recovery scenario
- The system panics with the following messages:
Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffffffa0079f8d
lpfc 0000:02:00.3: 1:(0):0748 abort handler timed out waiting for abort to complete: ret 0x2003, ID 0, LUN 0
lpfc 0000:02:00.3: 1:(0):0748 abort handler timed out waiting for abort to complete: ret 0x2003, ID 0, LUN 0
lpfc 0000:02:00.3: 1:(0):0748 abort handler timed out waiting for abort to complete: ret 0x2003, ID 0, LUN 0
lpfc 0000:02:00.3: 1:(0):0748 abort handler timed out waiting for abort to complete: ret 0x2003, ID 0, LUN 1
lpfc 0000:02:00.3: 1:(0):0748 abort handler timed out waiting for abort to complete: ret 0x2003, ID 0, LUN 0
lpfc 0000:02:00.3: 1:(0):0748 abort handler timed out waiting for abort to complete: ret 0x2003, ID 0, LUN 1
lpfc 0000:02:00.3: 1:(0):0748 abort handler timed out waiting for abort to complete: ret 0x2003, ID 0, LUN 0
lpfc 0000:02:00.3: 1:(0):0748 abort handler timed out waiting for abort to complete: ret 0x2003, ID 0, LUN 1
lpfc 0000:02:00.3: 1:(0):0748 abort handler timed out waiting for abort to complete: ret 0x2003, ID 0, LUN 2
[various blocked task messages...]
lpfc 0000:02:00.3: 1:0338 IOCB wait timeout error - no wake response Data x3c
lpfc 0000:02:00.3: 1:(0):0727 TMF FCP_LUN_RESET to TGT 0 LUN 0 failed (0, 0) iocb_flag x204
lpfc 0000:02:00.3: 1:(0):0713 SCSI layer issued Device Reset (0, 0) return x2007
lpfc 0000:02:00.3: 1:(0):0724 I/O flush failure for context LUN : cnt x5
lpfc 0000:02:00.3: 1:0338 IOCB wait timeout error - no wake response Data x3c
lpfc 0000:02:00.3: 1:(0):0727 TMF FCP_LUN_RESET to TGT 0 LUN 1 failed (0, 0) iocb_flag x204
lpfc 0000:02:00.3: 1:(0):0713 SCSI layer issued Device Reset (0, 1) return x2007
lpfc 0000:02:00.3: 1:(0):0724 I/O flush failure for context LUN : cnt x3
lpfc 0000:02:00.3: 1:0338 IOCB wait timeout error - no wake response Data x3c
lpfc 0000:02:00.3: 1:(0):0727 TMF FCP_LUN_RESET to TGT 0 LUN 2 failed (0, 0) iocb_flag x204
lpfc 0000:02:00.3: 1:(0):0713 SCSI layer issued Device Reset (0, 2) return x2007
lpfc 0000:02:00.3: 1:(0):0724 I/O flush failure for context LUN : cnt x1
lpfc 0000:02:00.3: 1:0338 IOCB wait timeout error - no wake response Data x3c
lpfc 0000:02:00.3: 1:(0):0727 TMF FCP_TARGET_RESET to TGT 0 LUN 0 failed (0, 0) iocb_flag x204
lpfc 0000:02:00.3: 1:(0):0723 SCSI layer issued Target Reset (0, 0) return x2007
lpfc 0000:02:00.3: 1:(0):0724 I/O flush failure for context TGT : cnt x9
lpfc 0000:02:00.3: 1:0338 IOCB wait timeout error - no wake response Data x3c
lpfc 0000:02:00.3: 1:(0):0727 TMF FCP_TARGET_RESET to TGT 0 LUN 0 failed (0, 0) iocb_flag x204
lpfc 0000:02:00.3: 1:(0):0700 Bus Reset on target 0 failed
lpfc 0000:02:00.3: 1:(0):0724 I/O flush failure for context HOST : cnt x9
lpfc 0000:02:00.3: 1:(0):0714 SCSI layer issued Bus Reset Data: x2003
general protection fault: 0000 [#1] SMP
last sysfs file: /sys/devices/pci0000:00/0000:00:07.0/0000:06:00.3/host4/rport-4:0-2/target4:0:0/4:0:0:2/timeout
CPU 16
Modules linked in: iptable_filter ip_tables mptctl mptbase bonding 8021q garp stp llc ipv6 microcode power_meter sg be2net(U) serio_raw iTCO_wdt iTCO_vendor_support hpilo hpwdt i7core_edac edac_core shpchp ext4 mbcache jbd2 dm_round_robin sd_mod crc_t10dif lpfc scsi_transport_fc scsi_tgt hpsa(U) dm_multipath dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
Pid: 668, comm: scsi_eh_2 Not tainted 2.6.32-279.22.1.el6.x86_64 #1 HP ProLiant BL460c G7
RIP: 0010:[<ffffffff81279e90>] [<ffffffff81279e90>] list_del+0x10/0xa0
RSP: 0018:ffff8817e8c07ad0 EFLAGS: 00010282
RAX: dead000000200200 RBX: ffff8817e9b1de00 RCX: 0000000000000035
RDX: 000000000000000d RSI: ffff8817e9666200 RDI: ffff8817e9b1de00
RBP: ffff8817e8c07ae0 R08: ffff8817e8c07b00 R09: 0000000000000000
R10: ffff880028404180 R11: 0000000000000000 R12: ffff8817e8c07b00
R13: 0000000000000000 R14: 000000000000000d R15: 000000000000000e
FS: 0000000000000000(0000) GS:ffff880c36700000(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00007fc347a8d600 CR3: 0000000bec06e000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process scsi_eh_2 (pid: 668, threadinfo ffff8817e8c06000, task ffff8817ea430080)
Stack:
ffff8817e8c07b00 ffff880be88b8000 ffff8817e8c07b40 ffffffffa00c3616
<d> ffff8817e9b1de00 0000000000000400 ffff8817e96e4800 ffff8817e9666000
<d> ffff8817e8c07b40 ffff880be88b8000 ffff8817e8e40c00 0000000000000000
Call Trace:
[<ffffffffa00c3616>] lpfc_sli4_repost_scsi_sgl_list+0x66/0x160 [lpfc]
[<ffffffffa008cfe1>] lpfc_sli4_hba_setup+0xdd1/0x1d70 [lpfc]
[<ffffffffa00780e3>] ? lpfc_sli_release_iocbq+0x53/0x70 [lpfc]
[<ffffffffa0094887>] ? lpfc_fabric_abort_hba+0x97/0xb0 [lpfc]
[<ffffffffa00809b0>] ? lpfc_sli_abort_iocb_ring+0xd0/0xf0 [lpfc]
[<ffffffffa00b2c13>] ? lpfc_hba_down_post_s4+0x1b3/0x1c0 [lpfc]
[<ffffffffa00afed8>] lpfc_online+0x178/0x1f0 [lpfc]
[<ffffffffa00c154b>] lpfc_host_reset_handler+0x4b/0xb0 [lpfc]
[<ffffffff81359952>] scsi_try_host_reset+0x42/0x120
[<ffffffff8135b30e>] scsi_eh_ready_devs+0x57e/0x840
[<ffffffff8135bce3>] scsi_error_handler+0x503/0x6e0
[<ffffffff8135b7e0>] ? scsi_error_handler+0x0/0x6e0
[<ffffffff81090876>] kthread+0x96/0xa0
[<ffffffff8100c0ca>] child_rip+0xa/0x20
[<ffffffff810907e0>] ? kthread+0x0/0xa0
[<ffffffff8100c0c0>] ? child_rip+0x0/0x20
Code: 89 95 fc fe ff ff e9 ab fd ff ff 4c 8b ad e8 fe ff ff e9 db fd ff ff 90 90 90 90 55 48 89 e5 53 48 89 fb 48 83 ec 08 48 8b 47 08 <4c> 8b 00 4c 39 c7 75 39 48 8b 03 4c 8b 40 08 4c 39 c3 75 4c 48
RIP [<ffffffff81279e90>] list_del+0x10/0xa0
RSP <ffff8817e8c07ad0>
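One detail in the register dump above is worth noting: RAX holds dead000000200200. On x86_64 kernels that use the 0xdead000000000000 poison offset, this matches LIST_POISON2, the value list_del() stores in an entry's prev pointer after unlinking it. This suggests, but does not prove, that list_del() was called on an entry that had already been removed from its list. The following is a minimal Python sketch for spotting such values in an oops register dump. The function names are hypothetical and the poison constants are an assumption about the running kernel's configuration.

```python
import re

# Assumed values: LIST_POISON1/2 on x86_64 kernels whose poison pointer
# offset is 0xdead000000000000. Other builds may use different constants.
LIST_POISON1 = 0xdead000000100100
LIST_POISON2 = 0xdead000000200200


def parse_registers(oops_text):
    """Return {register: value} for 'RAX: <16 hex digits>' style fields.

    RSP and RIP are skipped because they carry a segment prefix
    (for example '0018:ffff...'), and CRn/DRn are skipped because the
    pattern requires the name to start with 'R'.
    """
    pairs = re.findall(r"\b(R[A-Z0-9]{1,2}):\s*([0-9a-f]{16})\b", oops_text)
    return {name: int(value, 16) for name, value in pairs}


def poisoned_registers(regs):
    """Return {register: poison_name} for registers holding a list poison."""
    names = {LIST_POISON1: "LIST_POISON1", LIST_POISON2: "LIST_POISON2"}
    return {reg: names[val] for reg, val in regs.items() if val in names}


# Two register lines copied from the panic output in this article.
sample = """\
RAX: dead000000200200 RBX: ffff8817e9b1de00 RCX: 0000000000000035
RDX: 000000000000000d RSI: ffff8817e9666200 RDI: ffff8817e9b1de00
"""

print(poisoned_registers(parse_registers(sample)))
```

A poison value in a register is only a hint. Confirming the cause still requires examining the vmcore.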
Environment
- Red Hat Enterprise Linux 6.3.z kernels
- seen on kernel 2.6.32-279.22.1.el6
- other kernels 2.6.32-279.14.1.el6 or above may be affected
- 6.3 GA kernel is not affected
- Emulex lpfc driver
- Neither RHEL6.4 nor RHEL5.6 kernels are believed to be vulnerable to this panic. In testing, the panic was reliably reproduced with the RHEL6.3.z kernel 2.6.32-279.22.1.el6, but could not be reproduced with a RHEL5.6 or RHEL6.4 kernel.
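The kernel ranges listed above can be checked against a running system's release string. The following is a minimal Python sketch, not an official tool. The function names are hypothetical, and it assumes that the 2.6.32-279 build series corresponds to RHEL 6.3 and its z-stream, so that any other build series is treated as outside the affected range.

```python
import re


def parse_el6_release(release):
    """Split e.g. '2.6.32-279.22.1.el6.x86_64' into (279, 22, 1).

    Returns None if the string is not a 2.6.32 el6 kernel release.
    """
    m = re.match(r"2\.6\.32-(\d+(?:\.\d+)*)\.el6", release)
    if not m:
        return None
    return tuple(int(part) for part in m.group(1).split("."))


def may_be_affected(release):
    """True if the release falls in the 6.3.z range named in this article.

    Assumption: only the 279 build series (RHEL 6.3) is in scope, and
    z-stream builds 279.14.1 and later may be affected. The 6.3 GA
    kernel (2.6.32-279.el6) has no z-stream suffix and is excluded.
    """
    build = parse_el6_release(release)
    if build is None or build[0] != 279:
        return False
    return build[1:] >= (14, 1)


for rel in ("2.6.32-279.el6.x86_64",
            "2.6.32-279.14.1.el6.x86_64",
            "2.6.32-279.22.1.el6.x86_64",
            "2.6.32-358.el6.x86_64"):
    print(rel, may_be_affected(rel))
```

On a live system, the release string can be obtained with `platform.release()` from the Python standard library or from `uname -r`. A positive result only means the kernel is in the range that may be affected. The panic also requires the Emulex lpfc driver to be in use.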
