Soft lockups occurred on one CPU, and the kernel then crashed due to a hard LOCKUP on another CPU.

Solution Unverified

Issue

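  • Soft lockups occurred on one CPU, and the kernel subsequently crashed due to a hard LOCKUP on another CPU. In the log below, CPU#19 reports a soft lockup while waiting in smp_call_function_many() during a TLB flush, and the kernel later panics with "Hard LOCKUP" on CPU 40.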
[340769.429090] NMI watchdog: BUG: soft lockup - CPU#19 stuck for 23s! [db2sysc:38697]
[340769.429092] Modules linked in: nfnetlink_queue nfnetlink_log bluetooth rfkill nf_conntrack_netlink xt_addrtype br_netfilter overlay(T) xt_CHECKSUM mmfs26(OE) ipt_MASQUERADE nf_nat_masquerade_ipv4 mmfslinux(OE) tracedev(OE) tun bridge stp llc devlink dm_service_time iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi team_mode_loadbalance ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack team ebtable_nat ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat iptable_mangle iptable_security iptable_raw nf_conntrack libcrc32c ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter sunrpc vfat fat sb_edac intel_powerclamp iTCO_wdt
[340769.429136]  iTCO_vendor_support coretemp intel_rapl iosf_mbi kvm ipmi_ssif irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel joydev lrw gf128mul glue_helper ablk_helper cryptd pcspkr hpilo i2c_i801 sg ioatdma lpc_ich hpwdt ipmi_si wmi dm_multipath ipmi_devintf ipmi_msghandler acpi_power_meter binfmt_misc ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm ixgbe crct10dif_pclmul crct10dif_common crc32c_intel tg3 hpsa mdio dca ptp drm_panel_orientation_quirks scsi_transport_sas pps_core dm_mirror dm_region_hash dm_log dm_mod
[340769.429163] CPU: 19 PID: 38697 Comm: db2sysc Kdump: loaded Tainted: G           OEL ------------ T 3.10.0-1062.1.1.el7.x86_64 #1
[340769.429164] Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 07/21/2019
[340769.429166] task: ffff9c71ffab3150 ti: ffff9c6c6ff54000 task.ti: ffff9c6c6ff54000
[340769.429167] RIP: 0010:[<ffffffff8c3150ca>]  [<ffffffff8c3150ca>] smp_call_function_many+0x20a/0x270
[340769.429171] RSP: 0018:ffff9c6c6ff57cf8  EFLAGS: 00000202
[340769.429172] RAX: 0000000000000009 RBX: 000000000001b7c0 RCX: ffff9c74bfa608d0
[340769.429174] RDX: 0000000000000009 RSI: 0000000000000030 RDI: 0000000000000000
[340769.429175] RBP: ffff9c6c6ff57d30 R08: ffff9c75e6d1a400 R09: ffffffff8c57f629
[340769.429177] R10: ffff9cb4bf1df160 R11: ffffe478f3a4ea00 R12: ffff9c6c6ff57ca8
[340769.429178] R13: 0000000000000286 R14: ffffffff8d3d9648 R15: ffffffff8c97cb00
[340769.429180] FS:  00002ac89f7fe700(0000) GS:ffff9cb4bf1c0000(0000) knlGS:0000000000000000
[340769.429182] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[340769.429183] CR2: 00002afe668d9000 CR3: 0000007f6a22e000 CR4: 00000000003607e0
[340769.429185] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[340769.429186] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[340769.429187] Call Trace:
[340769.429190]  [<ffffffff8c27d398>] native_flush_tlb_others+0xb8/0xc0
[340769.429193]  [<ffffffff8c27d40b>] flush_tlb_mm_range+0x6b/0x140
[340769.429196]  [<ffffffff8c3fa5ad>] change_protection+0x5cd/0x670
[340769.429199]  [<ffffffff8c41665b>] change_prot_numa+0x1b/0x40
[340769.429203]  [<ffffffff8c2df1b2>] task_numa_work+0x212/0x370
[340769.429207]  [<ffffffff8c2c1c0b>] task_work_run+0xbb/0xe0
[340769.429210]  [<ffffffff8c22cc65>] do_notify_resume+0xa5/0xc0
[340769.429212]  [<ffffffff8c98d23b>] int_signal+0x12/0x17
[340769.429213] Code: 48 63 35 3e 68 c4 00 89 c2 39 f0 0f 8d 7d fe ff ff 48 98 49 8b 0f 48 03 0c c5 e0 d7 f4 8c f6 41 20 01 74 cd 0f 1f 44 00 00 f3 90 <f6> 41 20 01 75 f8 48 63 35 0d 68 c4 00 eb b7 0f b6 4d cc 4c 89 
...
[340875.954071] Kernel panic - not syncing: Hard LOCKUP
[340875.954103] CPU: 40 PID: 1179 Comm: kworker/40:1 Kdump: loaded Tainted: G           OEL ------------ T 3.10.0-1062.1.1.el7.x86_64 #1
[340875.954147] Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 07/21/2019
[340875.954177] Workqueue: events __net_random_once_deferred
[340875.954199] Call Trace:
[340875.954211]  <NMI>  [<ffffffff8c9792c2>] dump_stack+0x19/0x1b
[340875.954240]  [<ffffffff8c972941>] panic+0xe8/0x21f
[340875.954261]  [<ffffffff8c98cd37>] ? ret_from_fork_nospec_begin+0x21/0x21
[340875.954290]  [<ffffffff8c29a5cf>] nmi_panic+0x3f/0x40
[340875.954311]  [<ffffffff8c34d1e1>] watchdog_overflow_callback+0x121/0x140
[340875.954337]  [<ffffffff8c3a68d7>] __perf_event_overflow+0x57/0x100
[340875.954361]  [<ffffffff8c3b0074>] perf_event_overflow+0x14/0x20
[340875.954384]  [<ffffffff8c20ac70>] handle_pmi_common+0x1a0/0x250
[340875.954408]  [<ffffffff8c5819e8>] ? ioremap_page_range+0x2e8/0x480
[340875.954433]  [<ffffffff8c4015a4>] ? vunmap_page_range+0x234/0x470
[340875.954457]  [<ffffffff8c648926>] ? ghes_copy_tofrom_phys+0x116/0x210
[340875.954481]  [<ffffffff8c20af4f>] intel_pmu_handle_irq+0xcf/0x1d0
[340875.955364]  [<ffffffff8c983031>] perf_event_nmi_handler+0x31/0x50
[340875.956156]  [<ffffffff8c98493c>] nmi_handle.isra.0+0x8c/0x150
[340875.956938]  [<ffffffff8c984c18>] do_nmi+0x218/0x460
[340875.957706]  [<ffffffff8c983d9c>] end_repeat_nmi+0x1e/0x81
[340875.958472]  [<ffffffff8c2341c0>] ? setup_data_read+0xd0/0xd0
[340875.959225]  [<ffffffff8c57f629>] ? free_cpumask_var+0x9/0x10
[340875.959972]  [<ffffffff8c3150ce>] ? smp_call_function_many+0x20e/0x270
[340875.960718]  [<ffffffff8c3150ce>] ? smp_call_function_many+0x20e/0x270
[340875.961447]  [<ffffffff8c3150ce>] ? smp_call_function_many+0x20e/0x270
[340875.962162]  <EOE>  [<ffffffff8c951e01>] ? inet6_ehashfn.isra.4+0x21/0x140
[340875.962888]  [<ffffffff8c2341c0>] ? setup_data_read+0xd0/0xd0
[340875.963613]  [<ffffffff8c951e02>] ? inet6_ehashfn.isra.4+0x22/0x140
[340875.964336]  [<ffffffff8c31518d>] on_each_cpu+0x2d/0x60
[340875.965052]  [<ffffffff8c951e01>] ? inet6_ehashfn.isra.4+0x21/0x140
[340875.965767]  [<ffffffff8c2349ea>] text_poke_bp+0x6a/0xf0
[340875.966485]  [<ffffffff8c231608>] arch_jump_label_transform+0x68/0xb0
[340875.967202]  [<ffffffff8c3b8f1f>] __jump_label_update+0x5f/0xa0
[340875.967908]  [<ffffffff8c3b8ffd>] jump_label_update+0x9d/0xb0
[340875.968613]  [<ffffffff8c3b92cd>] __static_key_slow_dec+0x6d/0xb0
[340875.969309]  [<ffffffff8c3b9332>] static_key_slow_dec+0x22/0x50
[340875.970002]  [<ffffffff8c867d13>] __net_random_once_deferred+0x23/0x30
[340875.970687]  [<ffffffff8c2bd0ff>] process_one_work+0x17f/0x440
[340875.971366]  [<ffffffff8c2be216>] worker_thread+0x126/0x3c0
[340875.972036]  [<ffffffff8c2be0f0>] ? manage_workers.isra.26+0x2a0/0x2a0
[340875.972698]  [<ffffffff8c2c50d1>] kthread+0xd1/0xe0
[340875.973347]  [<ffffffff8c2c5000>] ? insert_kthread_work+0x40/0x40
[340875.973996]  [<ffffffff8c98cd37>] ret_from_fork_nospec_begin+0x21/0x21
[340875.974645]  [<ffffffff8c2c5000>] ? insert_kthread_work+0x40/0x40
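
To line up the two events in a log like the one above, a short script can pull out the soft lockup report and the panic header and compute the time between them. The following is a minimal sketch, not part of the original solution: the names `LOG`, `summarize`, and the regular expressions are illustrative assumptions, and the patterns are written only against the message formats visible in this log, so other kernel versions may need adjustments.

```python
import re

# Three lines copied verbatim from the log above. A real run would read the
# full dmesg output or the vmcore-dmesg.txt file instead of this sample.
LOG = """\
[340769.429090] NMI watchdog: BUG: soft lockup - CPU#19 stuck for 23s! [db2sysc:38697]
[340875.954071] Kernel panic - not syncing: Hard LOCKUP
[340875.954103] CPU: 40 PID: 1179 Comm: kworker/40:1 Kdump: loaded Tainted: G           OEL ------------ T 3.10.0-1062.1.1.el7.x86_64 #1
"""

# Patterns are assumptions derived from the message formats shown in this log.
SOFT = re.compile(
    r"^\[\s*(\d+\.\d+)\] .*soft lockup - CPU#(\d+) stuck for (\d+)s! \[(.+):(\d+)\]"
)
PANIC = re.compile(r"^\[\s*(\d+\.\d+)\] Kernel panic - not syncing: (.+)$")
CPU = re.compile(r"^\[\s*(\d+\.\d+)\] CPU: (\d+) PID: (\d+) Comm: (\S+)")


def summarize(log):
    """Return (list of soft lockup reports, first panic or None)."""
    soft, panic = [], None
    lines = log.splitlines()
    for i, line in enumerate(lines):
        m = SOFT.match(line)
        if m:
            soft.append({
                "time": float(m.group(1)),
                "cpu": int(m.group(2)),
                "stuck_s": int(m.group(3)),
                "comm": m.group(4),
                "pid": int(m.group(5)),
            })
            continue
        m = PANIC.match(line)
        if m and panic is None:
            panic = {"time": float(m.group(1)), "reason": m.group(2).strip()}
            # The "CPU: N PID: ..." header follows the panic line in this log;
            # look a few lines ahead for it.
            for nxt in lines[i + 1:i + 4]:
                c = CPU.match(nxt)
                if c:
                    panic.update(cpu=int(c.group(2)), pid=int(c.group(3)),
                                 comm=c.group(4))
                    break
    return soft, panic


soft, panic = summarize(LOG)
for s in soft:
    print("soft lockup: CPU", s["cpu"], s["comm"], "stuck for", s["stuck_s"], "s")
if panic and soft:
    gap = panic["time"] - soft[0]["time"]
    print("panic:", panic["reason"], "on CPU", panic.get("cpu"),
          "about", round(gap, 1), "s after the first soft lockup")
```

Because the log excerpt elides the lines between the two events, the script only reports what is present in its input; any additional soft lockup reports in the full log would appear as further entries in the returned list.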

Environment

  • Red Hat Enterprise Linux 7.7 (kernel-3.10.0-1062.1.1.el7)
  • HPE ProLiant DL380 Gen9
  • Intel(R) Xeon(R) CPU E5-2650 v4
