GPFS deadman timeout triggered panic after FC transport disruption
Issue
The system panics after an FC transport disruption, followed by QLogic (qla2xxx) abort commands and finally a GPFS deadman timeout
while I/O is still in flight:
Jan 13 11:44:02 somehost kernel: qla2xxx 0000:0e:00.1: scsi(1:0:3): Abort command issued -- 1 de805db 2002.
Jan 13 11:44:03 somehost kernel: qla2xxx 0000:0e:00.1: scsi(1:0:3): Abort command issued -- 1 de805dc 2002.
Jan 13 11:44:47 somehost kernel: GPFS Deadman Switch timer [0] has expired; IOs in progress: 1
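To gauge how long the outstanding I/O stayed stuck, the gap between the last qla2xxx abort and the deadman expiry can be measured from the syslog lines above. The following is a minimal Python sketch, not a supported tool: the function names are hypothetical, and the year 2012 is an assumption taken from the DATE field of the vmcore, since syslog timestamps carry no year.

```python
import re
from datetime import datetime

# Log excerpt copied from the messages file shown above.
LOG = """\
Jan 13 11:44:02 somehost kernel: qla2xxx 0000:0e:00.1: scsi(1:0:3): Abort command issued -- 1 de805db 2002.
Jan 13 11:44:03 somehost kernel: qla2xxx 0000:0e:00.1: scsi(1:0:3): Abort command issued -- 1 de805dc 2002.
Jan 13 11:44:47 somehost kernel: GPFS Deadman Switch timer [0] has expired; IOs in progress: 1
"""

# Classic syslog prefix: "Mon DD HH:MM:SS " (no year field).
STAMP = re.compile(r"^(\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) ")


def parse_time(line, year=2012):
    """Return the timestamp of a syslog line, or None if it has none.

    The year is an assumption supplied by the caller.
    """
    m = STAMP.match(line)
    if m is None:
        return None
    return datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")


def abort_to_deadman_seconds(log_text, year=2012):
    """Seconds between the last abort seen and the first deadman expiry."""
    last_abort = None
    for line in log_text.splitlines():
        ts = parse_time(line, year)
        if ts is None:
            continue
        if "Abort command issued" in line:
            last_abort = ts
        elif "GPFS Deadman Switch timer" in line and last_abort is not None:
            return (ts - last_abort).total_seconds()
    return None


print(abort_to_deadman_seconds(LOG))  # 44.0
```

For this excerpt the deadman timer expired 44 seconds after the last abort was issued.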
The vmcore analysis shows that the GPFS deadman timer induced the panic:
KERNEL: usr/lib/debug/lib/modules/2.6.18-194.17.1.el5/vmlinux
DUMPFILE: some_vmcore [PARTIAL DUMP]
CPUS: 16
DATE: Fri Jan 13 05:44:12 2012
UPTIME: 92 days, 03:48:27
LOAD AVERAGE: 20.35, 12.70, 10.75
TASKS: 1748
NODENAME: somehost
RELEASE: 2.6.18-194.17.1.el5
VERSION: #1 SMP Mon Sep 20 07:12:06 EDT 2010
MACHINE: x86_64 (2399 Mhz)
MEMORY: 47.3 GB
PANIC: "Kernel panic - not syncing: GPFS Deadman Switch timer has expired, and there are still 1 outstanding I/O requests"
PID: 0
COMMAND: "swapper"
TASK: ffff810c1ff387a0 (1 of 16) [THREAD_INFO: ffff81061fc16000]
CPU: 3
STATE: TASK_RUNNING (PANIC)
crash> bt
PID: 0 TASK: ffff810c1ff387a0 CPU: 3 COMMAND: "swapper"
#0 [ffff81061fc1fd38] crash_kexec at ffffffff800ad9ce
#1 [ffff81061fc1fdf8] panic at ffffffff80091a58
#2 [ffff81061fc1fee8] cxiDMSExpired at ffffffff8859dfdf [mmfslinux]
#3 [ffff81061fc1ff08] run_timer_softirq at ffffffff80097aa6
#4 [ffff81061fc1ff58] __do_softirq at ffffffff80012443
#5 [ffff81061fc1ff88] call_softirq at ffffffff8005e2fc
#6 [ffff81061fc1ffa0] do_softirq at ffffffff8006cb8a
#7 [ffff81061fc1ffb0] apic_timer_interrupt at ffffffff8005dc8e
--- <IRQ stack> ---
#8 [ffff81061fc17e08] apic_timer_interrupt at ffffffff8005dc8e
[exception RIP: acpi_processor_idle_simple+0x17d]
RIP: ffffffff8019d581 RSP: ffff81061fc17eb8 RFLAGS: 00000287
RAX: ffff81061fc17fd8 RBX: 00000000005b6338 RCX: 0000000000000908
RDX: 0000000000000908 RSI: 0000000000000003 RDI: 0000000000000000
RBP: 0000000300000000 R8: ffff81061fc16000 R9: 0000000000000034
R10: ffff8106350ead90 R11: ffff810700ee5c80 R12: ffffffff80062ff8
R13: ffff81061fc17ee8 R14: ffff81052c0c5500 R15: 0000000010008040
ORIG_RAX: ffffffffffffff10 CS: 0010 SS: 0018
#9 [ffff81061fc17eb0] acpi_processor_idle_simple at ffffffff8019d470
#10 [ffff81061fc17ef0] cpu_idle at ffffffff8004923a
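In the backtrace, panic is called directly from cxiDMSExpired in the mmfslinux module, running in timer softirq context. When many vmcores need the same check, the caller of panic can be pulled out of saved "bt" output with a short script. This is a rough Python sketch only: the function names and the regular expression are illustrative assumptions and cover just the plain frame lines shown above, not register dumps or exception lines.

```python
import re

# First frames of the "crash> bt" output shown above.
BT = """\
#0 [ffff81061fc1fd38] crash_kexec at ffffffff800ad9ce
#1 [ffff81061fc1fdf8] panic at ffffffff80091a58
#2 [ffff81061fc1fee8] cxiDMSExpired at ffffffff8859dfdf [mmfslinux]
#3 [ffff81061fc1ff08] run_timer_softirq at ffffffff80097aa6
"""

# "#N [stack address] function at text address [optional module]"
FRAME = re.compile(
    r"^#(\d+) \[([0-9a-f]+)\] (\S+) at ([0-9a-f]+)(?: \[(\w+)\])?$"
)


def parse_frames(text):
    """Return a list of frames; lines that are not frames are skipped."""
    frames = []
    for line in text.splitlines():
        m = FRAME.match(line.strip())
        if m:
            frames.append(
                {"n": int(m.group(1)), "func": m.group(3), "module": m.group(5)}
            )
    return frames


def panic_caller(frames):
    """Return the frame directly below panic(), i.e. its caller, or None."""
    for i, frame in enumerate(frames):
        if frame["func"] == "panic" and i + 1 < len(frames):
            return frames[i + 1]
    return None


caller = panic_caller(parse_frames(BT))
print(caller["func"], caller["module"])  # cxiDMSExpired mmfslinux
```

A module value of None means the frame belongs to the core kernel rather than a loadable module.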
Environment
- RHEL 5
- RHEL 6
- GPFS (third party)
- QLogic HBAs with external storage
