GPFS deadman timeout triggered panic after FC transport disruption
Issue
The system panics after an FC transport disruption, followed by QLogic (qla2xxx) abort commands and finally
a GPFS deadman timeout while I/O is still in flight:
Jan 13 11:44:02 somehost kernel: qla2xxx 0000:0e:00.1: scsi(1:0:3): Abort command issued -- 1 de805db 2002.
Jan 13 11:44:03 somehost kernel: qla2xxx 0000:0e:00.1: scsi(1:0:3): Abort command issued -- 1 de805dc 2002.
Jan 13 11:44:47 somehost kernel: GPFS Deadman Switch timer [0] has expired; IOs in progress: 1
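To confirm the same sequence on another node, the messages above can be correlated with a short script. The following is only a minimal sketch: the function names (`parse_time`, `summarize`) are hypothetical, the patterns assume the exact message formats shown in the excerpt above, and the year must be supplied by the caller because syslog timestamps omit it (2012 here, taken from the vmcore date).

```python
import re
from datetime import datetime

# Patterns assume the message formats shown in the log excerpt above.
ABORT_RE = re.compile(r"qla2xxx .*Abort command issued")
DEADMAN_RE = re.compile(
    r"GPFS Deadman Switch timer \[\d+\] has expired; IOs in progress: (\d+)"
)


def parse_time(line, year):
    # The first 15 characters of a syslog line are "Mon DD HH:MM:SS".
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def summarize(lines, year=2012):
    """Return (abort_count, seconds_from_last_abort_to_expiry, ios_in_progress).

    The last two values are None if no deadman expiry message is found.
    """
    aborts = []
    for line in lines:
        if ABORT_RE.search(line):
            aborts.append(parse_time(line, year))
            continue
        match = DEADMAN_RE.search(line)
        if match:
            expired = parse_time(line, year)
            gap = (expired - aborts[-1]).total_seconds() if aborts else None
            return len(aborts), gap, int(match.group(1))
    return len(aborts), None, None


if __name__ == "__main__":
    sample = [
        "Jan 13 11:44:02 somehost kernel: qla2xxx 0000:0e:00.1: scsi(1:0:3): Abort command issued -- 1 de805db 2002.",
        "Jan 13 11:44:03 somehost kernel: qla2xxx 0000:0e:00.1: scsi(1:0:3): Abort command issued -- 1 de805dc 2002.",
        "Jan 13 11:44:47 somehost kernel: GPFS Deadman Switch timer [0] has expired; IOs in progress: 1",
    ]
    print(summarize(sample))
```

For the three messages above, this reports two aborts, a 44 second gap between the last abort and the deadman expiry, and one I/O still in progress.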
The vmcore analysis shows that the GPFS deadman timer induced the panic:
KERNEL: usr/lib/debug/lib/modules/2.6.18-194.17.1.el5/vmlinux
DUMPFILE: some_vmcore [PARTIAL DUMP]
CPUS: 16
DATE: Fri Jan 13 05:44:12 2012
UPTIME: 92 days, 03:48:27
LOAD AVERAGE: 20.35, 12.70, 10.75
TASKS: 1748
NODENAME: somehost
RELEASE: 2.6.18-194.17.1.el5
VERSION: #1 SMP Mon Sep 20 07:12:06 EDT 2010
MACHINE: x86_64 (2399 Mhz)
MEMORY: 47.3 GB
PANIC: "Kernel panic - not syncing: GPFS Deadman Switch timer has expired, and there are still 1 outstanding I/O requests"
PID: 0
COMMAND: "swapper"
TASK: ffff810c1ff387a0 (1 of 16) [THREAD_INFO: ffff81061fc16000]
CPU: 3
STATE: TASK_RUNNING (PANIC)
crash> bt
PID: 0 TASK: ffff810c1ff387a0 CPU: 3 COMMAND: "swapper"
#0 [ffff81061fc1fd38] crash_kexec at ffffffff800ad9ce
#1 [ffff81061fc1fdf8] panic at ffffffff80091a58
#2 [ffff81061fc1fee8] cxiDMSExpired at ffffffff8859dfdf [mmfslinux]
#3 [ffff81061fc1ff08] run_timer_softirq at ffffffff80097aa6
#4 [ffff81061fc1ff58] __do_softirq at ffffffff80012443
#5 [ffff81061fc1ff88] call_softirq at ffffffff8005e2fc
#6 [ffff81061fc1ffa0] do_softirq at ffffffff8006cb8a
#7 [ffff81061fc1ffb0] apic_timer_interrupt at ffffffff8005dc8e
--- <IRQ stack> ---
#8 [ffff81061fc17e08] apic_timer_interrupt at ffffffff8005dc8e
[exception RIP: acpi_processor_idle_simple+0x17d]
RIP: ffffffff8019d581 RSP: ffff81061fc17eb8 RFLAGS: 00000287
RAX: ffff81061fc17fd8 RBX: 00000000005b6338 RCX: 0000000000000908
RDX: 0000000000000908 RSI: 0000000000000003 RDI: 0000000000000000
RBP: 0000000300000000 R8: ffff81061fc16000 R9: 0000000000000034
R10: ffff8106350ead90 R11: ffff810700ee5c80 R12: ffffffff80062ff8
R13: ffff81061fc17ee8 R14: ffff81052c0c5500 R15: 0000000010008040
ORIG_RAX: ffffffffffffff10 CS: 0010 SS: 0018
#9 [ffff81061fc17eb0] acpi_processor_idle_simple at ffffffff8019d470
#10 [ffff81061fc17ef0] cpu_idle at ffffffff8004923a
Environment
- Red Hat Enterprise Linux 5
- Red Hat Enterprise Linux 6
- Red Hat Enterprise Linux 7
- GPFS (third party)
- QLogic HBAs (qla2xxx driver) with external storage
- Emulex HBAs (lpfc driver)