The kernel crashes due to a hard lockup in which the tvec_base value is far behind the current jiffies value on all CPUs.

Solution Unverified - Updated -

Issue

  • The kernel crashes due to a hard lockup.
[184510.191400] Watchdog detected hard LOCKUP on cpu 22
[184510.191490] Modules linked in: iptable_filter ip_tables ipmi_watchdog ipmi_devintf nfnetlink_queue nfnetlink_log nfnetlink bluetooth rfkill ktap_90265(U) oracleacfs(P)(U) oracleadvm(P)(U) oracleoks(P)(U) mptctl mptbase nfs lockd fscache auth_rpcgss nfs_acl sunrpc bonding ipv6 microcode power_meter acpi_ipmi ipmi_si ipmi_msghandler iTCO_wdt iTCO_vendor_support ses enclosure bnx2 serio_raw netxen_nic lpc_ich mfd_core hpilo hpwdt sg i7core_edac edac_core shpchp ext4 jbd2 mbcache dm_round_robin scsi_dh_alua sr_mod cdrom sd_mod lpfc scsi_transport_fc scsi_tgt crc_t10dif pata_acpi ata_generic ata_piix hpsa radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core dm_multipath dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
[184510.214445] Pid: 14564, comm: mdnsd.bin Tainted: P           -- ------------    2.6.32-754.39.1.el6.x86_64 #1
[184510.218058] Call Trace:
[184510.221601]  <NMI>  [<ffffffff810f8911>] ? watchdog_overflow_callback+0xf1/0x110
[184510.225272]  [<ffffffff81131e4a>] ? __perf_event_overflow+0xaa/0x240
[184510.228931]  [<ffffffff8101ef88>] ? x86_perf_event_set_period+0xf8/0x180
[184510.232626]  [<ffffffff811324a4>] ? perf_event_overflow+0x14/0x20
[184510.236364]  [<ffffffff810264ec>] ? intel_pmu_handle_irq+0x21c/0x4a0
[184510.240093]  [<ffffffff8156086f>] ? perf_event_nmi_handler+0x3f/0xb0
[184510.243779]  [<ffffffff81562380>] ? notifier_call_chain+0x50/0x80
[184510.247424]  [<ffffffff815623ea>] ? atomic_notifier_call_chain+0x1a/0x20
[184510.251075]  [<ffffffff810b261e>] ? notify_die+0x2e/0x30
[184510.254723]  [<ffffffff8155fed9>] ? do_nmi+0xd9/0x360
[184510.258357]  [<ffffffff8155f7a4>] ? nmi+0x134/0x1a3
[184510.261960]  [<ffffffff8155def2>] ? _spin_lock_irqsave+0x32/0x40
[184510.265601]  <<EOE>>  [<ffffffff810b0861>] ? lock_hrtimer_base+0x31/0x60
[184510.269334]  [<ffffffff810b1532>] ? hrtimer_try_to_cancel+0x22/0xd0
[184510.273064]  [<ffffffff810b1602>] ? hrtimer_cancel+0x22/0x30
[184510.276735]  [<ffffffff8155d0a0>] ? schedule_hrtimeout_range+0xd0/0x160
[184510.280420]  [<ffffffff810b04c0>] ? hrtimer_wakeup+0x0/0x30
[184510.284078]  [<ffffffff810b1504>] ? hrtimer_start_range_ns+0x14/0x20
[184510.287732]  [<ffffffff811baad9>] ? poll_schedule_timeout+0x39/0x60
[184510.291370]  [<ffffffff811bb3f5>] ? do_select+0x5d5/0x7c0
[184510.295023]  [<ffffffff811babd0>] ? __pollwait+0x0/0xf0
[184510.298703]  [<ffffffff811bacc0>] ? pollwake+0x0/0x60
[184510.302348]  [<ffffffff811bacc0>] ? pollwake+0x0/0x60
[184510.305914]  [<ffffffff811bacc0>] ? pollwake+0x0/0x60
[184510.309424]  [<ffffffff811bacc0>] ? pollwake+0x0/0x60
[184510.312905]  [<ffffffff811bacc0>] ? pollwake+0x0/0x60
[184510.316341]  [<ffffffff811bacc0>] ? pollwake+0x0/0x60
[184510.319718]  [<ffffffff811bacc0>] ? pollwake+0x0/0x60
[184510.323077]  [<ffffffff811bacc0>] ? pollwake+0x0/0x60
[184510.326434]  [<ffffffff811bacc0>] ? pollwake+0x0/0x60
[184510.329730]  [<ffffffff811bc02a>] ? core_sys_select+0x18a/0x2c0
[184510.333006]  [<ffffffff81474dbc>] ? __sys_recvmsg+0x16c/0x2e0
[184510.336265]  [<ffffffff81013539>] ? read_tsc+0x9/0x20
[184510.339494]  [<ffffffff810b8394>] ? ktime_get_ts+0xc4/0x100
[184510.342700]  [<ffffffff81013539>] ? read_tsc+0x9/0x20
[184510.345880]  [<ffffffff810b8394>] ? ktime_get_ts+0xc4/0x100
[184510.349077]  [<ffffffff811bc3b7>] ? sys_select+0x47/0x110
[184510.352309]  [<ffffffff81566391>] ? system_call_fastpath+0x1f/0x3a
  • The task on CPU 22 spins on a spinlock in _spin_lock_irqsave() in kernel mode for more than 60 seconds, without giving other tasks a chance to run.
PID: 14564  TASK: ffff88386032d520  CPU: 22  COMMAND: "mdnsd.bin"
    ...
    [exception RIP: _spin_lock_irqsave+50]
    RIP: ffffffff8155def2  RSP: ffff8838634ab808  RFLAGS: 00000097
    RAX: 000000000000fc5a  RBX: ffff8820b8915ac8  RCX: 000000000000fc59
    RDX: 0000000000000282  RSI: ffff8838634ab860  RDI: ffff8820b8915a80
    RBP: ffff8838634ab808   R8: ffff8838634a8000   R9: 00000000ffffffff
    R10: 0000000000000000  R11: 0000000000000000  R12: ffff8838634ab8a8
    R13: ffff8838634ab860  R14: 00000000004c4b3f  R15: 0000000000000040
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
--- <NMI exception stack> ---
#12 [ffff8838634ab808] _spin_lock_irqsave at ffffffff8155def2
#13 [ffff8838634ab810] lock_hrtimer_base at ffffffff810b0861
#14 [ffff8838634ab840] hrtimer_try_to_cancel at ffffffff810b1532
    ...
  • At the same time, the swapper task on another CPU (CPU 24) is in the CPU idle routine while still holding that spinlock.
PID: 0      TASK: ffff8810693a0ab0  CPU: 24  COMMAND: "swapper"
    ...
    [exception RIP: mwait_idle_with_hints+196]
    RIP: ffffffff810156d4  RSP: ffff8810693abdf8  RFLAGS: 00000046
    RAX: 0000000000000010  RBX: 0000000000000010  RCX: 0000000000000001
    RDX: 0000000000000000  RSI: ffff8810693abfd8  RDI: 0000000000000010
    RBP: ffff8810693abe28   R8: 0000000000000003   R9: 0000000000000040
    R10: 0000a624f52a6af8  R11: 0000000000000000  R12: 0000000000000001
    R13: 0000000000000018  R14: 0000000000000002  R15: 16a331373037b949
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
--- <NMI exception stack> ---
 #3 [ffff8810693abdf8] mwait_idle_with_hints at ffffffff810156d4
 #4 [ffff8810693abe60] hrtimer_try_to_cancel at ffffffff810b15ab
 #5 [ffff8810693abea0] hrtimer_cancel at ffffffff810b1602
 #6 [ffff8810693abec0] tick_nohz_restart_sched_tick at ffffffff810beb7b
 #7 [ffff8810693abef0] cpu_idle at ffffffff8100a1ad
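
The RFLAGS values in the two register dumps above can be decoded to check whether maskable interrupts were enabled when the NMI arrived. The following is a minimal sketch, not part of the original analysis: the helper names decode_rflags and interrupts_enabled are hypothetical, and the only inputs are the RFLAGS values copied from the dumps (00000097 on CPU 22, 00000046 on CPU 24). It relies on the x86 convention that IF is bit 9 of RFLAGS.

```python
# Decode x86 RFLAGS values copied from the register dumps in this article.
# Bit positions follow the x86 architecture definition of RFLAGS.
# The helper names below are hypothetical, chosen for this illustration.
FLAG_BITS = {
    "CF": 0, "PF": 2, "AF": 4, "ZF": 6, "SF": 7,
    "TF": 8, "IF": 9, "DF": 10, "OF": 11,
}


def decode_rflags(value):
    """Return the names of the status/control flags set in an RFLAGS value."""
    return [name for name, bit in FLAG_BITS.items() if value & (1 << bit)]


def interrupts_enabled(value):
    """IF (bit 9) set means maskable interrupts are enabled."""
    return bool(value & (1 << FLAG_BITS["IF"]))


if __name__ == "__main__":
    # (CPU number, RFLAGS) pairs taken from the two dumps above.
    for cpu, rflags in ((22, 0x00000097), (24, 0x00000046)):
        state = "enabled" if interrupts_enabled(rflags) else "disabled"
        print(f"CPU {cpu}: RFLAGS={rflags:#x} "
              f"flags={decode_rflags(rflags)} interrupts {state}")
```

In both values the IF bit is clear, so maskable interrupts were disabled on both CPUs at the time of the NMI. That matches a hard lockup: CPU 22 is waiting in _spin_lock_irqsave(), which disables interrupts before spinning.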

Environment

  • Red Hat Enterprise Linux 6.10.z kernel-2.6.32-754.39.1.el6
  • HPE ProLiant DL580 G7
