Hard lockups occurred on multiple CPUs. It appears that timer_jiffies was not being updated properly, possibly due to faulty hardware.

Solution Unverified - Updated -

Issue

  • Hard lockups occurred on multiple CPUs. It appears that the per-CPU timer bases (tvec_bases) were not being updated properly.
crash> log | grep LOCKUP\ on
[4335923.922088] NMI watchdog: Watchdog detected hard LOCKUP on cpu 45
[4335923.922090] NMI watchdog: Watchdog detected hard LOCKUP on cpu 91
[4335923.922092] NMI watchdog: Watchdog detected hard LOCKUP on cpu 96
[4335923.922095] NMI watchdog: Watchdog detected hard LOCKUP on cpu 87
[4335923.922096] NMI watchdog: Watchdog detected hard LOCKUP on cpu 60
[4335923.922098] NMI watchdog: Watchdog detected hard LOCKUP on cpu 70
[4335923.922100] NMI watchdog: Watchdog detected hard LOCKUP on cpu 71
[4335923.922101] NMI watchdog: Watchdog detected hard LOCKUP on cpu 59
  • Every CPU that logged a hard lockup message in the kernel ring buffer was running the default idle routine at the time the lockup was detected.
crash> bt -c 45,91,96,87,60,70,71,59 | grep "exception RIP:"
    [exception RIP: native_safe_halt+11]
    [exception RIP: native_safe_halt+11]
    [exception RIP: native_safe_halt+11]
    [exception RIP: native_safe_halt+11]
    [exception RIP: native_safe_halt+11]
    [exception RIP: native_safe_halt+11]
    [exception RIP: native_safe_halt+11]
    [exception RIP: native_safe_halt+11]
  • Each of these CPUs has a backtrace like the following:
PID: 0      TASK: ffff9032c703d280  CPU: 45  COMMAND: "swapper/45"
    ...
    [exception RIP: native_safe_halt+11]
    RIP: ffffffffa878b25b  RSP: ffff9032c7053ea8  RFLAGS: 00000246
    RAX: ffffffffa878b010  RBX: ffffffffa8d5e000  RCX: 0000000000000048
    RDX: 0000000000000000  RSI: 0000000000000000  RDI: 0000000000000046
    RBP: ffff9032c7053ea8   R8: 0000000000000000   R9: 0000000000000001
    R10: 0000000000000000  R11: 7fffffffffffffff  R12: 000000000000002d
    R13: ffff9032c7050000  R14: ffff9032c7050000  R15: ffff9032c7050000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
--- <NMI exception stack> ---
#12 [ffff9032c7053ea8] native_safe_halt at ffffffffa878b25b
#13 [ffff9032c7053eb0] default_idle at ffffffffa878b02e
#14 [ffff9032c7053ed0] arch_cpu_idle at ffffffffa8037ca0
#15 [ffff9032c7053ee0] cpu_startup_entry at ffffffffa810181a
#16 [ffff9032c7053f28] start_secondary at ffffffffa805a827
#17 [ffff9032c7053f50] start_cpu at ffffffffa80000d5
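
The watchdog messages above can also be summarized outside of crash. The following is a minimal sketch, not part of the original analysis, that assumes the kernel log has been saved as text. It extracts the CPU numbers, computes how far apart the detections were, and builds the matching bt -c argument. The names LOCKUP_RE, parse_lockups, summarize and SAMPLE are hypothetical helpers introduced only for this illustration.

```python
import re
from decimal import Decimal

# Matches the watchdog lines shown in the Issue section, e.g.
# "[4335923.922088] NMI watchdog: Watchdog detected hard LOCKUP on cpu 45"
LOCKUP_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+NMI watchdog: "
    r"Watchdog detected hard LOCKUP on cpu (?P<cpu>\d+)"
)


def parse_lockups(log_text):
    """Return (timestamp, cpu) pairs in the order they appear in the log."""
    # Decimal keeps the microsecond digits exact; float would round them.
    return [
        (Decimal(m.group("ts")), int(m.group("cpu")))
        for m in LOCKUP_RE.finditer(log_text)
    ]


def summarize(log_text):
    """Summarize hard lockup reports, or return None if there are none."""
    events = parse_lockups(log_text)
    if not events:
        return None
    timestamps = [ts for ts, _ in events]
    cpus = [cpu for _, cpu in events]
    return {
        "cpus": cpus,
        # Time between the first and last detection, in microseconds.
        "spread_us": int((max(timestamps) - min(timestamps)) * 1_000_000),
        # Argument list for the crash "bt -c" command used in this article.
        "bt_command": "bt -c " + ",".join(str(c) for c in cpus),
    }


# The eight log lines from the Issue section.
SAMPLE = """\
[4335923.922088] NMI watchdog: Watchdog detected hard LOCKUP on cpu 45
[4335923.922090] NMI watchdog: Watchdog detected hard LOCKUP on cpu 91
[4335923.922092] NMI watchdog: Watchdog detected hard LOCKUP on cpu 96
[4335923.922095] NMI watchdog: Watchdog detected hard LOCKUP on cpu 87
[4335923.922096] NMI watchdog: Watchdog detected hard LOCKUP on cpu 60
[4335923.922098] NMI watchdog: Watchdog detected hard LOCKUP on cpu 70
[4335923.922100] NMI watchdog: Watchdog detected hard LOCKUP on cpu 71
[4335923.922101] NMI watchdog: Watchdog detected hard LOCKUP on cpu 59
"""

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

For the eight messages above, all detections fall within 13 microseconds of each other. Eight idle CPUs reported at nearly the same instant is consistent with a shared timekeeping problem rather than eight independent stuck CPUs, although the script itself cannot confirm the cause.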

Environment

  • Red Hat Enterprise Linux 7.9.z: kernel-3.10.0-1160.42.2.el7
  • Cisco UCSC-C240-M5L
