Red Hat Enterprise Linux 7.9 kernel panics with hard lockup when running Lustre

Solution Verified

Issue

  • The kernel panics with a hard lockup on Red Hat Enterprise Linux 7.9 systems running Lustre
  • Backtraces look similar to the following (specific details, such as the CPU and command, may vary)

    • Possible backtrace

      Kernel panic - not syncing: Hard LOCKUP
      CPU: 14 PID: 0 Comm: swapper/14 Kdump: loaded Tainted: P        W  OE  ------------   3.10.0-1160.11.1.1chaos.ch6.x86_64 #1
      ...
      Call Trace:
       <NMI>  [<ffffffffa47ae072>] dump_stack+0x19/0x1b
       [<ffffffffa47a71e7>] panic+0xe8/0x21f
      ...
       [<ffffffffa40b1edc>] ? run_timer_softirq+0xbc/0x370
       <EOE>  <IRQ>  [<ffffffffa40a82fd>] __do_softirq+0xfd/0x2c0
       [<ffffffffa47c56ec>] call_softirq+0x1c/0x30
       [<ffffffffa4030995>] do_softirq+0x65/0xa0
       [<ffffffffa40a86d5>] irq_exit+0x105/0x110
       [<ffffffffa47c6c88>] smp_apic_timer_interrupt+0x48/0x60
       [<ffffffffa47c31ba>] apic_timer_interrupt+0x16a/0x170
       <EOI>  [<ffffffffa40b3113>] ? get_next_timer_interrupt+0x103/0x270
       [<ffffffffa45eace7>] ? cpuidle_enter_state+0x57/0xd0
       [<ffffffffa45eae3e>] cpuidle_idle_call+0xde/0x270
       [<ffffffffa403919e>] arch_cpu_idle+0xe/0xc0
       [<ffffffffa410856a>] cpu_startup_entry+0x14a/0x1e0
       [<ffffffffa405cbb7>] start_secondary+0x207/0x280
       [<ffffffffa40000d5>] start_cpu+0x5/0x14
      
    • Another possible backtrace

      Call Trace:
       <NMI>  [<ffffffff85fae072>] dump_stack+0x19/0x1b
       [<ffffffff85fa71e7>] panic+0xe8/0x21f
      ...
       [<ffffffff8591f4e8>] ? native_queued_spin_lock_slowpath+0x158/0x200
       <EOE>  [<ffffffff85fa7dd2>] queued_spin_lock_slowpath+0xb/0xf
       [<ffffffff85fb7197>] _raw_spin_lock_irqsave+0x47/0x50
       [<ffffffff858b1b8b>] lock_timer_base.isra.38+0x2b/0x50
       [<ffffffff858b244f>] try_to_del_timer_sync+0x2f/0x90
       [<ffffffff858b2502>] del_timer_sync+0x52/0x60
       [<ffffffff85fb1920>] schedule_timeout+0x180/0x320
       [<ffffffff858b1870>] ? requeue_timers+0x1f0/0x1f0
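
To compare a captured console log or vmcore-dmesg file against the two sample backtraces, a small script can pull out the panic line and the timer-related frames. The following is a minimal sketch, not a Red Hat tool: the function names `frames` and `summarize` are hypothetical, and the `TIMER_SYMBOLS` list is an assumption built only from the symbols visible in the samples above, so a match suggests similarity and does not confirm the root cause.

```python
import re

# Panic string copied from the first sample backtrace.
PANIC_LINE = "Kernel panic - not syncing: Hard LOCKUP"

# Assumption: timer-related symbols seen in the two sample traces.
# This is not an official signature for the issue.
TIMER_SYMBOLS = (
    "run_timer_softirq",
    "lock_timer_base",
    "try_to_del_timer_sync",
    "del_timer_sync",
)

# Matches frames such as "[<ffffffff858b1b8b>] lock_timer_base.isra.38+0x2b/0x50",
# including the optional "? " marker for unreliable frames.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(\?\s+)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def frames(text):
    """Return (symbol, is_unreliable) pairs in the order they appear."""
    return [(m.group(2), bool(m.group(1))) for m in FRAME_RE.finditer(text)]


def summarize(text):
    """Report whether the hard-lockup panic line and timer frames are present."""
    # Strip compiler suffixes such as ".isra.38" before comparing.
    base_names = [name.split(".")[0] for name, _ in frames(text)]
    return {
        "hard_lockup": PANIC_LINE in text,
        "timer_frames": [n for n in base_names if n in TIMER_SYMBOLS],
    }
```

Calling `summarize()` on the text of a log returns a small dictionary; an empty `timer_frames` list means none of the assumed symbols were found in that log.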
      

Environment

  • Red Hat Enterprise Linux 7.9

    • Specifically with kernel versions between kernel-3.10.0-1160.11.1.el7 and kernel-3.10.0-1160.21.1.el7
  • Lustre

    • The issue appears to occur more frequently on Lustre OSTs (Object Storage Targets)
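
To check whether a given kernel release string falls inside the range listed above, the release numbers can be compared as tuples. This is a minimal sketch under stated assumptions: the helper names `release_tuple` and `in_listed_range` are hypothetical, the bounds are treated as inclusive (the text only says "between"), and only the first three numeric fields after `3.10.0-` are compared, so vendor rebuilds such as the `1chaos.ch6` kernel in the sample backtrace are judged by their base numbers.

```python
import re

# Bounds copied from the Environment section.
# Assumption: both ends of the range are inclusive.
LOW = (1160, 11, 1)    # kernel-3.10.0-1160.11.1.el7
HIGH = (1160, 21, 1)   # kernel-3.10.0-1160.21.1.el7

# Up to three numeric fields following "3.10.0-"; anything after them
# (".el7", ".1chaos.ch6", ".x86_64") is ignored.
RELEASE_RE = re.compile(r"(?:kernel-)?3\.10\.0-(\d+)(?:\.(\d+))?(?:\.(\d+))?")


def release_tuple(release):
    """Return the numeric release fields as a 3-tuple, or None if not a 3.10.0 kernel."""
    m = RELEASE_RE.match(release)
    if m is None:
        return None
    # Missing fields (for example "3.10.0-1160.el7") are treated as 0.
    return tuple(int(g) if g is not None else 0 for g in m.groups())


def in_listed_range(release):
    """True when the release falls within the listed kernel range."""
    fields = release_tuple(release)
    return fields is not None and LOW <= fields <= HIGH
```

On a running system, the string returned by `platform.release()` (or `uname -r`) can be passed to `in_listed_range()`.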
