GPFS deadman timeout triggered panic after FC transport disruption


Issue

The system panicked after a Fibre Channel (FC) transport disruption, which was followed by QLogic (qla2xxx) command aborts and finally by a GPFS deadman switch timeout while I/O was still in flight:

Jan 13 11:44:02 somehost kernel: qla2xxx 0000:0e:00.1: scsi(1:0:3): Abort command issued -- 1 de805db 2002.
Jan 13 11:44:03 somehost kernel: qla2xxx 0000:0e:00.1: scsi(1:0:3): Abort command issued -- 1 de805dc 2002.
Jan 13 11:44:47 somehost kernel: GPFS Deadman Switch timer [0] has expired; IOs in progress: 1

Analysis of the vmcore shows that the GPFS deadman switch timer triggered the panic:

      KERNEL: usr/lib/debug/lib/modules/2.6.18-194.17.1.el5/vmlinux
    DUMPFILE: some_vmcore  [PARTIAL DUMP]
        CPUS: 16
        DATE: Fri Jan 13 05:44:12 2012
      UPTIME: 92 days, 03:48:27
LOAD AVERAGE: 20.35, 12.70, 10.75
       TASKS: 1748
    NODENAME: somehost
     RELEASE: 2.6.18-194.17.1.el5
     VERSION: #1 SMP Mon Sep 20 07:12:06 EDT 2010
     MACHINE: x86_64  (2399 Mhz)
      MEMORY: 47.3 GB
       PANIC: "Kernel panic - not syncing: GPFS Deadman Switch timer has expired, and there are still 1 outstanding I/O requests"
         PID: 0
     COMMAND: "swapper"
        TASK: ffff810c1ff387a0  (1 of 16)  [THREAD_INFO: ffff81061fc16000]
         CPU: 3
       STATE: TASK_RUNNING (PANIC)
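The PANIC line is what identifies this as a GPFS deadman switch panic rather than another cause. A minimal Python sketch, assuming the `sys` header has been saved as plain text in the "KEY: value" layout shown above (the helper names parse_sys, uptime_seconds and triage are hypothetical), extracts the panic string, the count of outstanding I/O requests and the uptime:

```python
import re

# "KEY: value" lines from the crash sys header; keys may contain spaces.
KEY_RE = re.compile(r"^\s*([A-Z][A-Z ]*[A-Z]):\s+(.*?)\s*$", re.MULTILINE)
DEADMAN_PANIC_RE = re.compile(
    r"GPFS Deadman Switch timer has expired, "
    r"and there are still (\d+) outstanding I/O requests"
)
UPTIME_RE = re.compile(r"(?:(\d+) days?, )?(\d+):(\d{2}):(\d{2})")


def parse_sys(text):
    """Map each header key (e.g. 'LOAD AVERAGE') to its value string."""
    return dict(KEY_RE.findall(text))


def uptime_seconds(value):
    """Convert '92 days, 03:48:27' (days part optional) to seconds."""
    m = UPTIME_RE.fullmatch(value)
    if m is None:
        raise ValueError(f"unrecognised UPTIME value: {value!r}")
    days, hours, minutes, seconds = (int(g) if g else 0 for g in m.groups())
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def triage(text):
    """Flag a GPFS deadman panic and pull out a few key fields."""
    fields = parse_sys(text)
    panic = fields.get("PANIC", "").strip('"')
    m = DEADMAN_PANIC_RE.search(panic)
    return {
        "release": fields.get("RELEASE"),
        "uptime_s": uptime_seconds(fields["UPTIME"]) if "UPTIME" in fields else None,
        "gpfs_deadman": m is not None,
        "outstanding_io": int(m.group(1)) if m else None,
    }


SYS_SAMPLE = """\
      UPTIME: 92 days, 03:48:27
LOAD AVERAGE: 20.35, 12.70, 10.75
     RELEASE: 2.6.18-194.17.1.el5
       PANIC: "Kernel panic - not syncing: GPFS Deadman Switch timer has expired, and there are still 1 outstanding I/O requests"
"""

if __name__ == "__main__":
    print(triage(SYS_SAMPLE))
```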

crash> bt
PID: 0      TASK: ffff810c1ff387a0  CPU: 3   COMMAND: "swapper"
 #0 [ffff81061fc1fd38] crash_kexec at ffffffff800ad9ce
 #1 [ffff81061fc1fdf8] panic at ffffffff80091a58
 #2 [ffff81061fc1fee8] cxiDMSExpired at ffffffff8859dfdf [mmfslinux]
 #3 [ffff81061fc1ff08] run_timer_softirq at ffffffff80097aa6
 #4 [ffff81061fc1ff58] __do_softirq at ffffffff80012443
 #5 [ffff81061fc1ff88] call_softirq at ffffffff8005e2fc
 #6 [ffff81061fc1ffa0] do_softirq at ffffffff8006cb8a
 #7 [ffff81061fc1ffb0] apic_timer_interrupt at ffffffff8005dc8e
--- <IRQ stack> ---
 #8 [ffff81061fc17e08] apic_timer_interrupt at ffffffff8005dc8e
    [exception RIP: acpi_processor_idle_simple+0x17d]
    RIP: ffffffff8019d581  RSP: ffff81061fc17eb8  RFLAGS: 00000287
    RAX: ffff81061fc17fd8  RBX: 00000000005b6338  RCX: 0000000000000908
    RDX: 0000000000000908  RSI: 0000000000000003  RDI: 0000000000000000
    RBP: 0000000300000000   R8: ffff81061fc16000   R9: 0000000000000034
    R10: ffff8106350ead90  R11: ffff810700ee5c80  R12: ffffffff80062ff8
    R13: ffff81061fc17ee8  R14: ffff81052c0c5500  R15: 0000000010008040
    ORIG_RAX: ffffffffffffff10  CS: 0010  SS: 0018
 #9 [ffff81061fc17eb0] acpi_processor_idle_simple at ffffffff8019d470
#10 [ffff81061fc17ef0] cpu_idle at ffffffff8004923a
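The backtrace shows which code called panic(): frame #2 is cxiDMSExpired in the mmfslinux module (part of GPFS), entered from run_timer_softirq, that is, from a kernel timer callback on an otherwise idle CPU rather than from an I/O path. The following Python sketch automates that check on saved `bt` output; the regular expression assumes the frame layout shown above, and the names parse_bt and panic_caller are hypothetical:

```python
import re
from typing import NamedTuple, Optional

# Frame lines such as " #2 [ffff81061fc1fee8] cxiDMSExpired at ffffffff8859dfdf [mmfslinux]"
FRAME_RE = re.compile(
    r"^\s*#(?P<num>\d+)\s+\[(?P<sp>[0-9a-f]+)\]\s+(?P<func>\S+)"
    r"\s+at\s+(?P<addr>[0-9a-f]+)(?:\s+\[(?P<module>[^\]]+)\])?\s*$"
)


class Frame(NamedTuple):
    num: int
    func: str
    module: Optional[str]  # None: no module tag, i.e. core kernel text


def parse_bt(text):
    """Return the numbered frames; register dumps and separators are skipped."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.match(line)
        if m:
            frames.append(Frame(int(m["num"]), m["func"], m["module"]))
    return frames


def panic_caller(frames):
    """Return the frame directly below panic(), i.e. the code that called it."""
    for i, frame in enumerate(frames):
        if frame.func == "panic" and i + 1 < len(frames):
            return frames[i + 1]
    return None


BT_SAMPLE = """\
 #0 [ffff81061fc1fd38] crash_kexec at ffffffff800ad9ce
 #1 [ffff81061fc1fdf8] panic at ffffffff80091a58
 #2 [ffff81061fc1fee8] cxiDMSExpired at ffffffff8859dfdf [mmfslinux]
 #3 [ffff81061fc1ff08] run_timer_softirq at ffffffff80097aa6
 #4 [ffff81061fc1ff58] __do_softirq at ffffffff80012443
 #5 [ffff81061fc1ff88] call_softirq at ffffffff8005e2fc
 #6 [ffff81061fc1ffa0] do_softirq at ffffffff8006cb8a
 #7 [ffff81061fc1ffb0] apic_timer_interrupt at ffffffff8005dc8e
--- <IRQ stack> ---
 #8 [ffff81061fc17e08] apic_timer_interrupt at ffffffff8005dc8e
    [exception RIP: acpi_processor_idle_simple+0x17d]
 #9 [ffff81061fc17eb0] acpi_processor_idle_simple at ffffffff8019d470
#10 [ffff81061fc17ef0] cpu_idle at ffffffff8004923a
"""

if __name__ == "__main__":
    caller = panic_caller(parse_bt(BT_SAMPLE))
    print(caller.func, caller.module)
```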

Environment

  • Red Hat Enterprise Linux 5
  • Red Hat Enterprise Linux 6
  • Red Hat Enterprise Linux 7
  • GPFS (third party)
  • QLogic HBAs (qla2xxx driver) with external storage
  • Emulex HBAs (lpfc driver)
