Unexplained behaviour of RHEL systems with Skylake CPUs
Environment
- Red Hat Enterprise Linux
- Synergy 480 Gen10 05/22/2019 BIOS Rev: 2.10
- Intel(R) Xeon(R) Platinum 8168 CPU @ 2.70GHz (fam: 06, model: 55, stepping: 04)
- microcode: sig=0x50654, pf=0x80, revision=0x200005e
- ThinkSystem SR950 07/03/2019 BIOS Rev: 1.53(2.34)
- Intel(R) Xeon(R) Platinum 8176M CPU @ 2.10GHz (fam: 06, model: 55, stepping: 04)
- microcode: sig=0x50654, pf=0x80, revision=0x200005e
- Eslim SU7-2212 Purley 06/26/2018
- Intel(R) Xeon(R) Gold 6154 CPU @ 3.00GHz, revision=0x200005e
- Inspur Prod: NF5280M5 Vers: 4.1.8 Date: 05/21/2019 BIOS Rev: 5.14
- Intel(R) Xeon(R) Gold 6150 CPU @ 2.70GHz (fam: 06, model: 55, stepping: 04)
- microcode: CPU0 sig=0x50654, pf=0x80, revision=0x200005e
- HPE ProLiant DL580 Gen10/ProLiant DL580 Gen10, BIOS U34 03/09/2020 BIOS Rev: 2.32
- Intel(R) Xeon(R) Gold 5118 CPU @ 2.30GHz (fam: 06, model: 55, stepping: 04)
- microcode: sig=0x50654, pf=0x80, revision=0x2006906
- Intel Corporation S2600BPB/S2600BPB, BIOS SE5C620.86B.00.01.0013.030920180427 03/09/2018
- Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz (fam: 06, model: 55, stepping: 04)
- microcode: sig=0x50654, pf=0x80, revision=0x200004d
- Lenovo ThinkSystem SR630 -[7X02CTO1WW]-/-[7X02CTO1WW]-, BIOS -[IVE660P-2.72]- 09/28/2020
- Intel(R) Xeon(R) Gold 6248 CPU @ 2.50GHz (fam: 06, model: 55, stepping: 07)
- microcode: sig=0x50657, pf=0x80, revision=0x5002f01
- Dell Inc. PowerEdge R740xd/0YNX56, BIOS 2.15.1 06/15/2022
- Intel(R) Xeon(R) Gold 6152 CPU @ 2.10GHz (fam: 06, model: 55, stepping: 04)
- microcode: sig=0x50654, pf=0x80, revision=0x2006e05
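The sig= value in the microcode lines above is the raw CPUID signature, and the fam/model/stepping fields printed by the kernel are hexadecimal. As a minimal sketch (the function name decode_cpuid_signature is illustrative and not part of any tool), the standard CPUID encoding can be unpacked to confirm that 0x50654 and 0x50657 are the same family and model and differ only in stepping:

```python
def decode_cpuid_signature(sig):
    """Split a CPUID leaf 1 EAX signature into (family, model, stepping)."""
    stepping = sig & 0xF
    model = (sig >> 4) & 0xF
    family = (sig >> 8) & 0xF
    ext_model = (sig >> 16) & 0xF
    ext_family = (sig >> 20) & 0xFF
    # The extended model is only combined in for family 6 and family 15.
    if family in (0x6, 0xF):
        model |= ext_model << 4
    # The extended family is only added for family 15.
    if family == 0xF:
        family += ext_family
    return family, model, stepping


for sig in (0x50654, 0x50657):
    fam, model, step = decode_cpuid_signature(sig)
    # Printed in hex to match the "(fam: 06, model: 55, stepping: 04)" style.
    print(f"sig={sig:#x} -> fam: {fam:02x}, model: {model:02x}, stepping: {step:02x}")
```

Model 55 hexadecimal is 85 decimal, which is why /proc/cpuinfo may show "model : 85" for the same CPUs.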
Issue
- The kernel crashes with the following logs:
[49082.161931] mm/memory.c:413: bad pmd ffff8a930c69b230(0000000200000000)
[49082.168625] mm/memory.c:413: bad pmd ffff8a931db33770(0000000200000000)
[49082.175326] BUG: unable to handle kernel paging request at 00000afffffd3f60
[49082.182346] IP: [<ffffffff9897ce5c>] __radix_tree_lookup+0x2c/0xf0
[49082.188572] PGD 0
[49082.190604] Oops: 0000 [#1] SMP
[49082.193870] Modules linked in: udp_diag unix_diag af_packet_diag netlink_diag tcp_diag inet_diag ebtable_filter ebtables devlink overlay(T) scini(POE) 8021q garp mrp bonding vxlan ip6_udp_tunnel udp_tunnel openvswitch nf_nat_ipv6 nf_nat_ipv4 nf_nat nls_utf8 isofs nf_log_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6table_raw ip6_tables iptable_raw nf_log_ipv4 nf_log_common xt_LOG ipt_REJECT nf_reject_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 xt_comment xt_multiport xt_conntrack iptable_filter skx_edac nfit libnvdimm intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass pcspkr ses enclosure sg joydev mei_me mei lpc_ich hpilo hpwdt ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter nf_conntrack br_netfilter bridge stp llc ip_tables xfs dm_service_time sd_mod crc_t10dif crct10dif_generic
[49082.304541] CPU: 61 PID: 338389 Comm: crond Kdump: loaded Tainted: P OE ------------ T 3.10.0-957.21.3.el7.x86_64 #1
[49082.316079] Hardware name: HPE Synergy 480 Gen10/Synergy 480 Gen10 Compute Module, BIOS I42 05/22/2019
[49082.325434] task: ffff8a93a3772080 ti: ffff8a939d41c000 task.ti: ffff8a939d41c000
[49082.332957] RIP: 0010:[<ffffffff9897ce5c>] [<ffffffff9897ce5c>] __radix_tree_lookup+0x2c/0xf0
[49082.341627] RSP: 0000:ffff8a939d41fcd0 EFLAGS: 00010246
[49082.346964] RAX: 0000000000000000 RBX: 00000afffffd3f58 RCX: ffff8a939d41fd18
[49082.354136] RDX: 0000000000000000 RSI: 0003fffffeffffff RDI: 00000afffffd3f58
[49082.361307] RBP: ffff8a939d41fd08 R08: ffff8a930bf35e70 R09: 0000000000000000
[49082.368480] R10: 00007fabc99c4b10 R11: 00003ffffffff000 R12: 0003fffffeffffff
[49082.375653] R13: 00000afffffd3f58 R14: ffff8a939d41fd18 R15: 0000000000000000
[49082.382826] FS: 00007fabc99c4840(0000) GS:ffff8ab7afb40000(0000) knlGS:0000000000000000
[49082.390959] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[49082.396734] CR2: 00000afffffd3f60 CR3: 000000231fa90000 CR4: 00000000007607e0
[49082.403915] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[49082.411086] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[49082.418258] PKRU: 55555554
[49082.420977] Call Trace:
[49082.423436] [<ffffffff9897cf42>] radix_tree_lookup_slot+0x22/0x50
[49082.429650] [<ffffffff987b600e>] __find_get_page+0x1e/0xa0
[49082.435251] [<ffffffff987b609e>] find_get_page+0xe/0x20
[49082.440592] [<ffffffff987ff8b7>] lookup_swap_cache+0x37/0x50
[49082.446368] [<ffffffff987e969d>] handle_pte_fault+0x24d/0xd10
[49082.452233] [<ffffffff9889e5aa>] ? __posix_lock_file+0x21a/0x550
[49082.458358] [<ffffffff987ec27d>] handle_mm_fault+0x39d/0x9b0
[49082.464134] [<ffffffff98d70603>] __do_page_fault+0x203/0x4f0
[49082.469910] [<ffffffff98d70925>] do_page_fault+0x35/0x90
[49082.475337] [<ffffffff98d6c768>] page_fault+0x28/0x30
[49082.481174] Code: 48 89 e5 41 57 49 89 d7 41 56 49 89 ce 41 55 49 89 fd 41 54 49 89 f4 53 48 83 ec 10 65 48 8b 04 25 28 00 00 00 48 89 45 d0 31 c0 <49> 8b 55 08 48 89 d6 48 89 55 c8 83 e6 03 48 83 fe 01 0f 85 a4
[49082.502134] RIP [<ffffffff9897ce5c>] __radix_tree_lookup+0x2c/0xf0
[49082.509106] RSP <ffff8a939d41fcd0>
[49082.513266] CR2: 00000afffffd3f60
- Another pattern of kernel crash:
[138468.326624] BUG: unable to handle kernel paging request at 0000000200000018
[138468.334526] IP: [<ffffffff8f96b74c>] _raw_spin_lock+0xc/0x30
[138468.341092] PGD 0
[138468.343960] Oops: 0002 [#1] SMP
[138468.348049] Modules linked in: vhost_net vhost macvtap macvlan tun xt_CT xt_mac xt_sctp xt_physdev veth udp_diag unix_diag af_packet_diag netlink_diag tcp_diag inet_diag ebtable_filter ebtables devlink overlay(T) scini(POE) 8021q garp mrp bonding vxlan ip6_udp_tunnel udp_tunnel openvswitch nf_nat_ipv6 nf_nat_ipv4 nf_nat nls_utf8 isofs nf_log_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6table_raw ip6_tables iptable_raw nf_log_ipv4 nf_log_common xt_LOG ipt_REJECT nf_reject_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 xt_comment xt_multiport xt_conntrack iptable_filter skx_edac nfit libnvdimm intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass pcspkr ses enclosure sg joydev mei_me lpc_ich mei hpilo hpwdt ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter nf_conntrack br_netfilter
[138468.473671] CPU: 82 PID: 0 Comm: swapper/82 Kdump: loaded Tainted: P OE ------------ T 3.10.0-957.21.3.el7.x86_64 #1
[138468.486873] Hardware name: HPE Synergy 480 Gen10/Synergy 480 Gen10 Compute Module, BIOS I42 05/22/2019
[138468.497110] task: ffff8db2036a30c0 ti: ffff8db2036b0000 task.ti: ffff8db2036b0000
[138468.505517] RIP: 0010:[<ffffffff8f96b74c>] [<ffffffff8f96b74c>] _raw_spin_lock+0xc/0x30
[138468.514552] RSP: 0018:ffff8dd52fd83960 EFLAGS: 00010246
[138468.520780] RAX: 0000000000000000 RBX: 0000000200000000 RCX: 0000000000000000
[138468.528841] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000200000018
[138468.536896] RBP: ffff8dd52fd839a8 R08: ffff8db121ed5a00 R09: ffff8d85a1b95050
[138468.544949] R10: ffff8dd52fd83898 R11: 000000000000e513 R12: 000000000000010c
[138468.553003] R13: 0000000000000052 R14: ffff8d85a1b94ec0 R15: 0000000200000018
[138468.561055] FS: 0000000000000000(0000) GS:ffff8dd52fd80000(0000) knlGS:0000000000000000
[138468.570074] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[138468.576733] CR2: 0000000200000018 CR3: 0000001456610000 CR4: 00000000007627e0
[138468.584775] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[138468.592808] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[138468.600830] PKRU: 00000000
[138468.604422] Call Trace:
[138468.607709] <IRQ>
[138468.609737] [<ffffffffc05bdc27>] ? ovs_flow_stats_update+0x57/0x160 [openvswitch]
[138468.619078] [<ffffffffc05c4e04>] ? ovs_flow_tbl_lookup_stats+0x84/0xc0 [openvswitch]
[138468.627759] [<ffffffffc05bc7cf>] ovs_dp_process_packet+0x6f/0x120 [openvswitch]
[138468.635999] [<ffffffffc05c7334>] ? ovs_ct_update_key+0xc4/0x150 [openvswitch]
[138468.644053] [<ffffffffc05c6193>] ovs_vport_receive+0x73/0xd0 [openvswitch]
[138468.651864] [<ffffffffc0519e1b>] ? apic_timer_fn+0x1b/0x70 [kvm]
[138468.658771] [<ffffffffc05c6c3e>] netdev_frame_hook+0xde/0x180 [openvswitch]
[138468.666629] [<ffffffff8f83a04a>] __netif_receive_skb_core+0x1fa/0xa10
[138468.673956] [<ffffffff8f8b7eb3>] ? udp4_gro_receive+0x1c3/0x2a0
[138468.680753] [<ffffffff8f419a00>] ? kmalloc_order_trace+0x10/0xa0
[138468.687628] [<ffffffff8f83a878>] __netif_receive_skb+0x18/0x60
[138468.694317] [<ffffffff8f83a900>] netif_receive_skb_internal+0x40/0xc0
[138468.701608] [<ffffffff8f83b588>] napi_gro_receive+0xd8/0x100
[138468.708124] [<ffffffffc13095e0>] bnx2x_rx_int+0xa70/0x1950 [bnx2x]
[138468.715145] [<ffffffff8f2db978>] ? __enqueue_entity+0x78/0x80
[138468.721722] [<ffffffff8f83854d>] ? __dev_kfree_skb_any+0x3d/0x50
[138468.728572] [<ffffffffc1308898>] ? bnx2x_free_tx_pkt+0x218/0x300 [bnx2x]
[138468.736119] [<ffffffffc130c40d>] bnx2x_poll+0x1dd/0x260 [bnx2x]
[138468.742863] [<ffffffff8f83af1f>] net_rx_action+0x26f/0x390
[138468.749159] [<ffffffff8f2a1075>] __do_softirq+0xf5/0x280
[138468.755263] [<ffffffff8f97932c>] call_softirq+0x1c/0x30
[138468.761262] [<ffffffff8f22e675>] do_softirq+0x65/0xa0
[138468.767072] [<ffffffff8f2a13f5>] irq_exit+0x105/0x110
[138468.772864] [<ffffffff8f97a606>] do_IRQ+0x56/0xf0
[138468.778293] [<ffffffff8f96c362>] common_interrupt+0x162/0x162
[138468.784763] <EOI>
[138468.786788] [<ffffffff8f7aef07>] ? cpuidle_enter_state+0x57/0xd0
[138468.794172] [<ffffffff8f7af05e>] cpuidle_idle_call+0xde/0x230
[138468.800596] [<ffffffff8f2366de>] arch_cpu_idle+0xe/0xc0
[138468.806482] [<ffffffff8f2fc6da>] cpu_startup_entry+0x14a/0x1e0
[138468.812971] [<ffffffff8f258047>] start_secondary+0x1f7/0x270
[138468.819272] [<ffffffff8f2000d5>] start_cpu+0x5/0x14
[138468.824777] Code: 5d c3 0f 1f 44 00 00 85 d2 74 e4 0f 1f 40 00 eb ed 66 0f 1f 44 00 00 b8 01 00 00 00 5d c3 90 0f 1f 44 00 00 31 c0 ba 01 00 00 00 <f0> 0f b1 17 85 c0 75 01 c3 55 89 c6 48 89 e5 e8 20 1b ff ff 5d
[138468.845377] RIP [<ffffffff8f96b74c>] _raw_spin_lock+0xc/0x30
[138468.851699] RSP <ffff8dd52fd83960>
[138468.855741] CR2: 0000000200000018
- Another pattern, ending in a hard lockup:
[2593265.569184] INFO: NMI handler (perf_event_nmi_handler) took too long to run: 13.125 msecs
[2593265.569817] INFO: NMI handler (perf_event_nmi_handler) took too long to run: 23.748 msecs
[2593265.572319] INFO: NMI handler (ghes_notify_nmi) took too long to run: 1781.055 msecs
[2593265.572333] INFO: NMI handler (perf_event_nmi_handler) took too long to run: 1804.183 msecs
[2593265.572940] INFO: NMI handler (perf_event_nmi_handler) took too long to run: 1804.185 msecs
[2593265.576056] sched: RT throttling activated
[2593265.577932] hrtimer: interrupt took 1794008697 ns
[2593265.582328] perf: interrupt took too long (14058867 > 180000), lowering kernel.perf_event_max_sample_rate to 1000
[2593283.992584] NMI watchdog: Watchdog detected hard LOCKUP on cpu 51
[2593283.992589] NMI watchdog: Watchdog detected hard LOCKUP on cpu 41
[2593284.002545] INFO: NMI handler (ghes_notify_nmi) took too long to run: 2121.158 msecs
[2593285.027143] Hardware name: Lenovo ThinkSystem SR950 -[7X13CTO1WW]-/-[7X13CTO1WW]-, BIOS -[PSE122R-1.53]- 07/03/2019
- Another pattern, where the instruction pointer itself is invalid (NULL RIP):
[618088.801338] BUG: unable to handle kernel paging request at 000000000009a000
[618088.820849] IP: [< (null)>] (null)
[618088.840388] PGD 0
[618088.859907] Oops: 0010 [#1] SMP
[618089.155718] CPU: 70 PID: 0 Comm: swapper/70 Kdump: loaded Tainted: G ------------ T 3.10.0-1127.18.2.el7.x86_64 #1
[618089.195815] Hardware name: HPE ProLiant DL580 Gen10/ProLiant DL580 Gen10, BIOS U34 03/09/2020
[618089.216874] task: ffff8dfe2dfa1070 ti: ffff8dfe2dfac000 task.ti: ffff8dfe2dfac000
[618089.236460] RIP: 9a00:[<0000000000000000>] [< (null)>] (null)
[618089.256158] RSP: 0018:ffff8dfe2dfafe20 EFLAGS: 00010046
[618089.275241] RAX: 0000000000000020 RBX: 0000000000000008 RCX: 0000000000000001
[618089.294351] RDX: 0000000000000000 RSI: ffff8dfe2dfaffd8 RDI: 000000225c610000
[618089.313103] RBP: ffff8dfe2dfafe50 R08: 000000000029d92e R09: 0000000000000018
[618089.332016] R10: 00000000000bbc17 R11: 7fffffffffffffff R12: 0000000000000003
[618089.350261] R13: 0000000000000020 R14: ffff8dfe2dfaffd8 R15: ffffffffaf2de7a0
[618089.368246] FS: 0000000000000000(0000) GS:ffff8e098fd80000(0000) knlGS:0000000000000000
[618089.388737] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[618089.406451] CR2: 000000000009a000 CR3: 000000225c610000 CR4: 00000000007607e0
[618089.424016] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[618089.441143] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[618089.457922] PKRU: 00000000
[618089.474314] Call Trace:
[618089.490603] [<ffffffffaebc61d5>] cpuidle_enter_state+0x45/0xd0
[618089.506844] [<ffffffffaebc633e>] cpuidle_idle_call+0xde/0x230
[618089.522743] [<ffffffffae637c8e>] arch_cpu_idle+0xe/0xc0
[618089.538260] [<ffffffffae701c7a>] cpu_startup_entry+0x14a/0x1e0
[618089.553428] [<ffffffffae65a727>] start_secondary+0x1f7/0x270
[618089.568330] [<ffffffffae6000d5>] start_cpu+0x5/0x14
[618089.585610] Code: Bad RIP value.
[618089.600157] RIP [< (null)>] (null)
[618089.614387] RSP <ffff8dfe2dfafe20>
[618089.628178] CR2: 000000000009a000
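The faulting addresses in the first two logs above can be related to the register contents with simple arithmetic. The sketch below uses only values copied from those logs. Reading each fault as "a bad base pointer plus a small field offset" is an assumption drawn from the register dumps, not a vendor statement.

```python
# Values copied from the oops logs above.
radix_r13 = 0x00000AFFFFFD3F58  # R13 (and RDI) in the __radix_tree_lookup oops
radix_cr2 = 0x00000AFFFFFD3F60  # CR2 in the same oops
spin_rbx = 0x0000000200000000   # RBX in the _raw_spin_lock oops
spin_cr2 = 0x0000000200000018   # CR2 (and RDI) in the same oops
bad_pmd = 0x0000000200000000    # value printed by the "bad pmd" messages


def offset_from_base(fault_addr, base):
    """Distance between the faulting address and a candidate base pointer."""
    return fault_addr - base


radix_offset = offset_from_base(radix_cr2, radix_r13)
spin_offset = offset_from_base(spin_cr2, spin_rbx)

print(f"__radix_tree_lookup: CR2 = R13 + {radix_offset:#x}")
print(f"_raw_spin_lock:      CR2 = RBX + {spin_offset:#x}")
print("RBX equals the bad pmd value:", spin_rbx == bad_pmd)
print("bad pmd value has exactly one bit set (bit 33):", bad_pmd == 1 << 33)
```

The same value, 0x200000000, therefore appears both as a page-table entry in one crash and as a pointer in another, on the same hardware model. This observation alone does not identify the source of the corruption.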
Resolution
- Contact the hardware vendor for assistance with a BIOS update.
- One customer confirmed that the issue was resolved after replacing the CPU and updating the microcode to 0x2000065.
Before CPU replacement:
DMI: HPE Synergy 480 Gen10/Synergy 480 Gen10 Compute Module, BIOS I42 05/22/2019
smpboot: CPU0: Intel(R) Xeon(R) Platinum 8168 CPU @ 2.70GHz (fam: 06, model: 55, stepping: 04)
microcode: sig=0x50654, pf=0x80, revision=0x200005e
96 logical processors (48 CPU cores)
After CPU replacement:
DMI: HPE Synergy 480 Gen10/Synergy 480 Gen10 Compute Module, BIOS I42 11/13/2019
smpboot: CPU0: Intel(R) Xeon(R) Platinum 8168 CPU @ 2.70GHz (fam: 06, model: 55, stepping: 04)
microcode: sig=0x50654, pf=0x80, revision=0x2000065
96 logical processors (48 CPU cores)
Note: When updating microcode through a newer microcode_ctl package from RHEL, be aware of issues with a later version of microcode for CPU 06-55-04.
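To compare the microcode lines from two boots, the sig, pf and revision fields can be extracted from the kernel log. This is a minimal sketch: the helper name parse_microcode is hypothetical, and the regular expression only assumes the line layout shown in this article. It reports what changed and does not judge whether a given revision is good or bad.

```python
import re

# Matches both "microcode: sig=..." and "microcode: CPU0 sig=..." layouts.
MICROCODE_RE = re.compile(
    r"microcode:.*?sig=(0x[0-9a-fA-F]+), pf=(0x[0-9a-fA-F]+), "
    r"revision=(0x[0-9a-fA-F]+)"
)


def parse_microcode(line):
    """Return sig/pf/revision as integers, or None if the line does not match."""
    match = MICROCODE_RE.search(line)
    if match is None:
        return None
    sig, pf, revision = (int(value, 16) for value in match.groups())
    return {"sig": sig, "pf": pf, "revision": revision}


before = parse_microcode("microcode: sig=0x50654, pf=0x80, revision=0x200005e")
after = parse_microcode("microcode: sig=0x50654, pf=0x80, revision=0x2000065")

print("same CPU signature:", before["sig"] == after["sig"])
print(f"revision {before['revision']:#x} -> {after['revision']:#x}")
```

On a running system the same lines can usually be found with `dmesg | grep microcode`.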
Root Cause
- Intel® Xeon® Processor Scalable Family Specification Update 2022 September
- <https://cdrdv2-public.intel.com/613537/613537_Intel%C2%AE%20Xeon%C2%AE%20Processor%20Scalable%20Family%20Specification%20Update_Rev032US.pdf>
SKX8. Intel® CAT/CDP Might Not Restrict Cacheline Allocation Under Certain Conditions
(Intel® Xeon® Processor Scalable Family)
Problem: Under certain microarchitectural conditions involving heavy memory traffic, cache lines
might fill outside the allocated L3 capacity bitmask (CBM) associated with the current
Class of Service (CLOS).
Implication: Cache Allocation Technology/Code and Data Prioritization (CAT/CDP) might see performance
side effects and a reduction in the effectiveness of the CAT feature for certain classes
of applications, including cache-sensitive workloads than seen on previous platforms.
Workaround: None identified. Contact your Intel representative for details of possible mitigations.
Status: No Fix.
SKX18. Intel® CAT Might Not Restrict Cacheline Allocation Under Certain Conditions
(Intel® Xeon® Processor Scalable Family)
Problem: Under certain micro-architectural conditions involving heavy memory traffic, cachelines
might fill outside the allocated L3 capacity bit-mask (CBM) associated with the current
Class of Service (CLOS).
Implication: CAT might appear less effective at protecting certain classes of applications, including
cache-sensitive workloads than on previous platforms.
Workaround: None identified. Contact your Intel representative for details of possible mitigations.
Status: No Fix.
SKX40. Masked Bytes in a Vector Masked Store Instructions May Cause Write Back of a Cache Line
Problem: Vector masked store instructions to WB (write-back) memory-type that cross cache lines
may lead to CPU writing back cached data even for cache lines where all of the bytes are
masked.
Implication: The processor may generate writes of un-modified data. This can affect Memory Mapped I/O
(MMIO) or non-coherent agents in the following ways:
1. For MMIO range that is mapped as WB memory type, this erratum may lead to Machine
Check Exception (MCE) due to writing back data into the MMIO space. This applies
only to cross page vector masked stores where one of the pages is in MMIO range.
2. If the CPU cached data is stale, for example in the case of memory written directly
by a non-coherent agent (agent that uses non-coherent writes), this erratum may lead
to writing back stale cached data even if these bytes are masked.
Workaround: Platforms should not map MMIO memory space or non-coherent device memory space as WB
memory. If WB is used for MMIO range, software or VMM should not map such MMIO page
adjacent to a regular WB page (adjacent on the linear address space, before or after the
I/O page). Memory that may be written by non-coherent agents should be separated by at
least 64 bytes from regular memory used for other purposes (on the linear address space).
Status: No Fix.
SKX59. Vector Masked Store Instructions May Cause Write Back of Cache Line Where Bytes Are
Masked
Problem: Vector masked store instructions to write-back (WB) memory-type that cross cache lines
may lead to CPU writing back cached data even for cache lines where all of the bytes are
masked. This can affect Memory Mapped I/O (MMIO) or non-coherent agents in the following
ways:
1. For MMIO range that is mapped as WB memory type, this erratum may lead to Machine
Check Exception (MCE) due to writing back data into the MMIO space. This applies only
to cross page vector masked stores where one of the pages is in MMIO range.
2. If the CPU cached data is stale, for example in the case of memory written directly
by a non-coherent agent (agent that uses non-coherent writes), this erratum may lead
to writing back stale cached data even if these bytes are masked.
Implication: CPU may generate writes into MMIO space which lead to MCE, or may write stale data into
memory also written by non-coherent agents.
Workaround: It is recommended not to map MMIO range as WB. If WB is used for MMIO range, OS or VMM
should not map such MMIO page adjacent to a regular WB page (adjacent on the linear
address space, before or after the I/O page). Memory that may be written by non-coherent
agents should be separated by at least 64 bytes from regular memory used for other
purposes (on the linear address space).
Status: No Fix.
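SKX40 and SKX59 both depend on a store crossing a cache-line boundary, and the MMIO case additionally requires crossing a page boundary. As a sketch of that condition, the check below tests whether a byte range spans more than one block. The 64-byte line size comes from the erratum text; the 4096-byte page size and the function name crosses_boundary are assumptions for illustration.

```python
CACHE_LINE = 64    # bytes, per the "at least 64 bytes" separation in the erratum
PAGE_SIZE = 4096   # bytes, assuming standard 4 KiB pages


def crosses_boundary(addr, size, boundary):
    """True if [addr, addr + size) spans more than one boundary-sized block."""
    if size <= 0:
        return False
    first_block = addr // boundary
    last_block = (addr + size - 1) // boundary
    return first_block != last_block


# A 64-byte (512-bit) vector store that starts 16 bytes before the end of a page.
store_addr = 5 * PAGE_SIZE - 16
print("crosses cache line:", crosses_boundary(store_addr, 64, CACHE_LINE))
print("crosses page:      ", crosses_boundary(store_addr, 64, PAGE_SIZE))
```

An aligned 64-byte store never crosses a line, which is why the workaround is phrased in terms of memory layout and not instruction choice.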
SKX61. MOVNTDQA From WC Memory May Pass Earlier Locked Instructions
Problem: An execution of (V)MOVNTDQA (streaming load instruction) that loads from Write Combining
(WC) memory may appear to pass an earlier locked instruction to a different cache line.
Implication: Software that expects a lock to fence subsequent (V)MOVNTDQA instructions may not operate
properly.
Workaround: Software should not rely on a locked instruction to fence subsequent executions of
MOVNTDQA. Software should insert an MFENCE instruction if it needs to preserve order
between streaming loads and other memory operations.
Status: No Fix.
SKX68. A Spurious APIC Timer Interrupt May Occur After Timed MWAIT
Problem: Due to this erratum, a Timed MWAIT that completes for a reason other than the Timestamp
Counter reaching the target value may be followed by a spurious APIC timer interrupt.
This erratum can occur only if the APIC timer is in TSC-deadline mode and only if the
mask bit is clear in the LVT Timer Register.
Implication: Spurious APIC timer interrupts may occur when the APIC timer is in TSC-deadline mode.
Workaround: TSC-deadline timer interrupt service routines should detect and deal with spurious
interrupts.
Status: No Fix.
SKX72. Processor May Hang When Executing Code In an HLE Transaction Region
Problem: Under certain conditions, if the processor acquires an HLE (Hardware Lock Elision) lock
via the XACQUIRE instruction in the Host Physical Address range between 40000000H and
403FFFFFH, it may hang with an internal timeout error (MCACOD 0400H) logged into
IA32_MCi_STATUS.
Implication: Due to this erratum, the processor may hang after acquiring a lock via XACQUIRE.
Workaround: BIOS can reserve the host physical address ranges of 40000000H and 403FFFFFH (e.g. map
it as UC/MMIO). Alternatively, the VMM (Virtual Machine Monitor) can reserve that
address range so no guest can use it. In non-virtualized systems, the OS can reserve
that memory space.
Status: No fix.
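The address window named in SKX72 can be expressed as a simple range test. The sketch below only restates the bounds from the erratum; the function names are hypothetical.

```python
SKX72_START = 0x40000000
SKX72_END = 0x403FFFFF  # inclusive upper bound from the erratum text


def in_skx72_range(phys_addr):
    """True if a host physical address falls inside the SKX72 window."""
    return SKX72_START <= phys_addr <= SKX72_END


def overlaps_skx72_range(start, length):
    """True if the region [start, start + length) touches the SKX72 window."""
    if length <= 0:
        return False
    return start <= SKX72_END and start + length - 1 >= SKX72_START


window_size = SKX72_END - SKX72_START + 1
print("window size in MiB:", window_size // (1024 * 1024))
print("1 GiB is inside the window:", in_skx72_range(1 << 30))
```

The window is the 4 MiB that starts at the 1 GiB physical address mark, so a firmware or hypervisor reservation of that range is small relative to total memory.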
SKX73. IDI_MISC Performance Monitoring Events May be Inaccurate
Problem: The IDI_MISC.WB_UPGRADE and IDI_MISC.WB_DOWNGRADE performance monitoring events (Event
FEH; UMask 02H and 04H) counts cache lines evicted from the L2 cache. Due to this
erratum, the per logical processor count may be incorrect when both logical processors
on the same physical core are active. The aggregate count of both logical processors is
not affected by this erratum.
Implication: IDI_MISC performance monitoring events may be inaccurate.
Workaround: None identified.
Status: No fix.
SKX85. Intel® PT ToPA Tables Read From Non-Cacheable Memory During an Intel® TSX Transaction May
Lead to Processor Hang
Problem: If an Intel® PT (Processor Trace) ToPA (Table of Physical Addresses) table is placed in
UC (Uncacheable) or USWC (Uncacheable Speculative Write Combining) memory, and a ToPA
output region is filled during an Intel® TSX (Transaction Synchronization) transaction,
the resulting ToPA table read may cause a processor hang.
Implication: Placing Intel® PT ToPA tables in non-cacheable memory when Intel® TSX is in use may lead
to a processor hang.
Workaround: None identified. Intel® PT ToPA tables should be located in WB memory if Intel® TSX is in
use.
Status: No fix.
SKX88. Using Intel® TSX Instructions May Lead to Unpredictable System Behavior
Problem: Under complex microarchitectural conditions, software using Intel® Transactional
Synchronization Extensions (Intel® TSX) may result in unpredictable system behavior.
Intel has only seen this under synthetic testing conditions. Intel is not aware of any
commercially available software exhibiting this behavior.
Implication: Due to this erratum, unpredictable system behavior may occur.
Workaround: It is possible for BIOS to contain a workaround for this erratum.
Status: No fix.
SKX91. Performance in an 8sg System May Be Lower Than Expected
Problem: In 8sg (8-socket glueless) systems, certain workloads may generate a significant
stream of accesses to remote nodes, leading to unexpected congestion in the processor's
snoop responses.
Implication: Due to this erratum, 8sg system performance may be lower than expected.
Workaround: A BIOS code change has been identified and may be implemented as a work around for this
erratum
Status: No fix.
SKX92. Memory May Continue to Throttle after MEMHOT# De-assertion
Problem: When MEMHOT# is asserted by an external agent, the CPU may continue to throttle memory
after MEMHOT# de-assertion.
Implication: When this erratum occurs, memory throttling occurs even after de-assertion of MEMHOT#.
Workaround: It is possible for the BIOS to contain a workaround for this erratum.
Status: No fix.
SKX93. Unexpected Uncorrected Machine Check Errors May Be Reported
Problem: In rare micro-architectural conditions, the processor may report unexpected machine
check errors. When this erratum occurs, IA32_MC0_STATUS (MSR 401H) will have the valid
bit set (bit 63), the uncorrected error bit set (bit 61), a model specific error code
of 03H (bits [31:16]) and an MCA error code of 05H (bits [15:0]).
Implication: Due to this erratum, software may observe unexpected machine check exceptions.
Workaround: It is possible for the BIOS to contain a workaround for this erratum.
Status: No fix.
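The SKX93 signature is given as bit positions in IA32_MC0_STATUS, so a logged status value can be checked against it mechanically. The sketch below decodes only the four fields named in the erratum. The sample value is constructed from those fields for illustration and is not taken from a real machine check log.

```python
def decode_mci_status(status):
    """Extract the IA32_MCi_STATUS fields referenced by SKX93."""
    return {
        "valid": bool((status >> 63) & 1),        # bit 63
        "uncorrected": bool((status >> 61) & 1),  # bit 61
        "mscod": (status >> 16) & 0xFFFF,         # bits [31:16]
        "mcacod": status & 0xFFFF,                # bits [15:0]
    }


def matches_skx93(status):
    """True if the status value carries the signature described in SKX93."""
    fields = decode_mci_status(status)
    return (
        fields["valid"]
        and fields["uncorrected"]
        and fields["mscod"] == 0x03
        and fields["mcacod"] == 0x05
    )


# Constructed example value (assumption, not from a real log).
sample = (1 << 63) | (1 << 61) | (0x03 << 16) | 0x05
print(f"{sample:#018x} matches SKX93:", matches_skx93(sample))
```

A match only means the logged value is consistent with the erratum description; other bits of the status register are ignored here.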
SKX100. A Pending Fixed Interrupt May Be Dispatched Before an Interrupt of The Same Priority
Completes
Problem: Resuming from C6 Sleep-State, with Fixed Interrupts of the same priority queued (in
the corresponding bits of the IRR and ISR APIC registers), the processor may dispatch
the second interrupt (from the IRR bit) before the first interrupt has completed and
written to the EOI register, causing the first interrupt to never complete.
Implication: Due to this erratum, Software may behave unexpectedly when an earlier call to an
Interrupt Handler routine is overridden with another call (to the same Interrupt
Handler) instead of completing its execution.
Workaround: None identified.
Status: No fix.
SKX102. Processor May Behave Unpredictably on Complex Sequence of Conditions Which Involve
Branches That Cross 64 Byte Boundaries
Problem: Under complex micro-architectural conditions involving branch instructions bytes that
span multiple 64 byte boundaries (cross cache line), unpredictable system behavior may
occur.
Implication: When this erratum occurs, the system may behave unpredictably.
Workaround: It is possible for BIOS to contain a workaround for this erratum.
Status: No fix.
SKX103. Executing Some Instructions May Cause Unpredictable Behavior
Problem: Under complex micro-architectural conditions, executing an X87, AVX, or integer divide
instruction may result in unpredictable system behavior.
Implication: When this erratum occurs, the system may behave unpredictably. Intel has not observed
this erratum with any commercially available software.
Workaround: It is possible for the BIOS to contain a workaround for this erratum.
Status: No fix.
SKX114. A Fixed Interrupt May Be Lost When a Core Exits C6
Problem: Under complex micro-architectural conditions, when performance throttling happens during
a core C6 exit, a fixed interrupt may be lost.
Implication: Due to this erratum, a fixed interrupt may be lost when internal throttling happens
during a core C6 exit. Intel has only observed this erratum in synthetic test
conditions.
Workaround: None identified.
Status: No fix.
Diagnostic Steps
- There are three patterns here: two crash signatures and a hard lockup.
- RIP: _raw_spin_lock
- RIP: __radix_tree_lookup
- Hard Lockup
[13441.913186] perf: interrupt took too long (2508 > 2500), lowering kernel.perf_event_max_sample_rate to 79000
[25076.171245] perf: interrupt took too long (3143 > 3135), lowering kernel.perf_event_max_sample_rate to 63000
[46270.175313] perf: interrupt took too long (3931 > 3928), lowering kernel.perf_event_max_sample_rate to 50000
[138468.326624] BUG: unable to handle kernel paging request at 0000000200000018
[138468.334526] IP: [<ffffffff8f96b74c>] _raw_spin_lock+0xc/0x30
[33261.892297] BUG: unable to handle kernel paging request at 0000000200000018
[33261.900058] IP: [<ffffffffbeb6b74c>] _raw_spin_lock+0xc/0x30
[49082.161931] mm/memory.c:413: bad pmd ffff8a930c69b230(0000000200000000)
[49082.168625] mm/memory.c:413: bad pmd ffff8a931db33770(0000000200000000)
[49082.175326] BUG: unable to handle kernel paging request at 00000afffffd3f60
[49082.182346] IP: [<ffffffff9897ce5c>] __radix_tree_lookup+0x2c/0xf0
[ 598.259136] perf: interrupt took too long (2526 > 2500), lowering kernel.perf_event_max_sample_rate to 79000
[ 605.546725] INFO: NMI handler (ghes_notify_nmi) took too long to run: 41.849 msecs
[ 606.163857] INFO: NMI handler (ghes_notify_nmi) took too long to run: 59.590 msecs
[ 606.163884] perf: interrupt took too long (3446 > 3157), lowering kernel.perf_event_max_sample_rate to 58000
[ 619.945543] perf: interrupt took too long (4420 > 4307), lowering kernel.perf_event_max_sample_rate to 45000
[ 621.379484] perf: interrupt took too long (182586 > 5525), lowering kernel.perf_event_max_sample_rate to 1000
[ 633.744911] sched: RT throttling activated
[ 637.023061] perf: interrupt took too long (354826 > 228232), lowering kernel.perf_event_max_sample_rate to 1000
[ 647.439715] usb 1-1.1: USB disconnect, device number 3
[ 669.391826] NMI watchdog: Watchdog detected hard LOCKUP on cpu 28
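When triaging several systems, the log signatures listed above can be searched for mechanically. The following is a rough sketch, assuming the message formats shown in this article. The signature names and the classify helper are hypothetical, and a match only indicates a similar symptom, not a confirmed hardware fault.

```python
import re

# Regular expressions derived from the log lines quoted in this article.
SIGNATURES = {
    "raw_spin_lock oops": re.compile(r"IP: \[<[0-9a-f]+>\] _raw_spin_lock\+"),
    "radix_tree_lookup oops": re.compile(r"IP: \[<[0-9a-f]+>\] __radix_tree_lookup\+"),
    "bad pmd": re.compile(r"mm/memory\.c:\d+: bad pmd"),
    "hard lockup": re.compile(r"NMI watchdog: Watchdog detected hard LOCKUP on cpu \d+"),
}


def classify(log_text):
    """Return the names of all signatures found in a kernel log."""
    return [name for name, pattern in SIGNATURES.items() if pattern.search(log_text)]


sample_log = """\
[49082.161931] mm/memory.c:413: bad pmd ffff8a930c69b230(0000000200000000)
[49082.175326] BUG: unable to handle kernel paging request at 00000afffffd3f60
[49082.182346] IP: [<ffffffff9897ce5c>] __radix_tree_lookup+0x2c/0xf0
"""
print(classify(sample_log))
```

The input could be the output of dmesg or the vmcore-dmesg.txt file saved by kdump.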
- vmcore analysis:
KERNEL: /cores/retrace/repos/kernel/x86_64/usr/lib/debug/lib/modules/3.10.0-957.21.3.el7.x86_64/vmlinux
CPUS: 96
DATE: Tue Sep 22 11:20:01 GMT 2020
UPTIME: 13:39:57
LOAD AVERAGE: 0.15, 0.05, 0.06
TASKS: 1380
RELEASE: 3.10.0-957.21.3.el7.x86_64
MEMORY: 575.7 GB
PANIC: "BUG: unable to handle kernel paging request at 00000afffffd3f60"
PID: 338389
COMMAND: "crond"
TASK: ffff8a93a3772080 [THREAD_INFO: ffff8a939d41c000]
CPU: 61
STATE: TASK_RUNNING (PANIC)
crash> bt
PID: 338389 TASK: ffff8a93a3772080 CPU: 61 COMMAND: "crond"
#0 [ffff8a939d41f960] machine_kexec at ffffffff98663934
#1 [ffff8a939d41f9c0] __crash_kexec at ffffffff9871d162
#2 [ffff8a939d41fa90] crash_kexec at ffffffff9871d250
#3 [ffff8a939d41faa8] oops_end at ffffffff98d6d778
#4 [ffff8a939d41fad0] no_context at ffffffff98d5bdbe
#5 [ffff8a939d41fb20] __bad_area_nosemaphore at ffffffff98d5be55
#6 [ffff8a939d41fb70] bad_area_nosemaphore at ffffffff98d5bfc6
#7 [ffff8a939d41fb80] __do_page_fault at ffffffff98d706d0
#8 [ffff8a939d41fbf0] do_page_fault at ffffffff98d70925
#9 [ffff8a939d41fc20] page_fault at ffffffff98d6c768
[exception RIP: __radix_tree_lookup+44]
RIP: ffffffff9897ce5c RSP: ffff8a939d41fcd0 RFLAGS: 00010246
RAX: 0000000000000000 RBX: 00000afffffd3f58 RCX: ffff8a939d41fd18
RDX: 0000000000000000 RSI: 0003fffffeffffff RDI: 00000afffffd3f58
RBP: ffff8a939d41fd08 R8: ffff8a930bf35e70 R9: 0000000000000000
R10: 00007fabc99c4b10 R11: 00003ffffffff000 R12: 0003fffffeffffff
R13: 00000afffffd3f58 R14: ffff8a939d41fd18 R15: 0000000000000000
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0000
#10 [ffff8a939d41fd10] radix_tree_lookup_slot at ffffffff9897cf42
#11 [ffff8a939d41fd30] __find_get_page at ffffffff987b600e
#12 [ffff8a939d41fd50] find_get_page at ffffffff987b609e
#13 [ffff8a939d41fd60] lookup_swap_cache at ffffffff987ff8b7
#14 [ffff8a939d41fd70] handle_pte_fault at ffffffff987e969d
#15 [ffff8a939d41fe08] handle_mm_fault at ffffffff987ec27d
#16 [ffff8a939d41feb0] __do_page_fault at ffffffff98d70603
#17 [ffff8a939d41ff20] do_page_fault at ffffffff98d70925
#18 [ffff8a939d41ff50] page_fault at ffffffff98d6c768
RIP: 00007fabc97ce960 RSP: 00007ffe48eec478 RFLAGS: 00010206
RAX: 000000005f695141 RBX: 000055ee5dc56e22 RCX: 000055ee5de9a300
RDX: 00007fabc938cf20 RSI: 0000000000000170 RDI: 0000000000000001
RBP: 000055ee5f624f50 R8: 00007fabc871b260 R9: 00000000000060df
R10: 00007fabc99c4b10 R11: 0000000000000246 R12: 000055ee5de9a300
R13: 000055ee5de598c0 R14: 000055ee5f624f80 R15: 0000000000000000
ORIG_RAX: ffffffffffffffff CS: 0033 SS: 002b
crash> dis -rl ffffffff9897ce5c | tail
0xffffffff9897ce40 <__radix_tree_lookup+16>: mov %rdi,%r13
0xffffffff9897ce43 <__radix_tree_lookup+19>: push %r12
0xffffffff9897ce45 <__radix_tree_lookup+21>: mov %rsi,%r12
0xffffffff9897ce48 <__radix_tree_lookup+24>: push %rbx
0xffffffff9897ce49 <__radix_tree_lookup+25>: sub $0x10,%rsp
0xffffffff9897ce4d <__radix_tree_lookup+29>: mov %gs:0x28,%rax
0xffffffff9897ce56 <__radix_tree_lookup+38>: mov %rax,-0x30(%rbp)
0xffffffff9897ce5a <__radix_tree_lookup+42>: xor %eax,%eax
/usr/src/debug/kernel-3.10.0-957.21.3.el7/linux-3.10.0-957.21.3.el7.x86_64/lib/radix-tree.c: 453
0xffffffff9897ce5c <__radix_tree_lookup+44>: mov 0x8(%r13),%rdx
R13: 00000afffffd3f58: invalid address
450 static unsigned radix_tree_load_root(struct radix_tree_root *root,
451 struct radix_tree_node **nodep, unsigned long *maxindex)
452 {
453 struct radix_tree_node *node = rcu_dereference_raw(root->rnode);
/usr/src/debug/kernel-3.10.0-957.21.3.el7/linux-3.10.0-957.21.3.el7.x86_64/mm/memory.c: 3433
0xffffffff987e948a <handle_pte_fault+58>: mov 0x10(%rdi),%r12
3429 static int handle_pte_fault(struct vm_fault *vmf)
3430 {
3431 struct vm_area_struct *vma = vmf->vma;
3432 struct mm_struct *mm = vma->vm_mm;
3433 unsigned long address = (unsigned long)vmf->virtual_address;
crash> vm_fault ffff8a939d41fe20
struct vm_fault {
flags = 680,
pgoff = 22,
virtual_address = 0x7fabc97ce960, <<
page = 0x0,
cow_page = 0x0,
orig_pte = {
pte = 0
},
pmd = 0xffff8a930c69b258,
vma = 0xffff8a93a66cae58,
gfp_mask = 131290,
pte = 0xffff8a930bf35e70,
pud = 0xffff8a92af3a4578
}
The page fault occurred on the user-space address marked above.
crash> vtop -c 338389 -u 00007fabc97ce960
VIRTUAL PHYSICAL
7fabc97ce960 (not mapped)
PGD: 231fa907f8 => 80000022af3a4067
PUD: 22af3a4578 => 230c69b067
PMD: 230c69b258 => 230bf35067
PTE: 230bf35e70 => 200000000
PTE vtop: cannot determine swap location
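The vtop walk above can be reproduced by hand. The sketch below assumes 4 KiB pages, 9 index bits per level and 8-byte entries (standard x86_64 4-level paging); the function name table_offsets is hypothetical. The computed offsets should match the low 12 bits of the PGD, PUD, PMD and PTE entry addresses printed by vtop and stored in the vm_fault structure.

```python
# Sketch: split a virtual address into x86_64 4-level page-table indices.
# Assumes 4 KiB pages, 9 index bits per level and 8-byte table entries.

def table_offsets(va: int) -> dict:
    """Byte offset of the entry used at each level of the walk."""
    shifts = {"PGD": 39, "PUD": 30, "PMD": 21, "PTE": 12}
    return {level: ((va >> shift) & 0x1FF) * 8 for level, shift in shifts.items()}

va = 0x7FABC97CE960                  # virtual_address from vm_fault
offsets = table_offsets(va)
for level, off in offsets.items():
    print(f"{level}: offset {off:#x}")

pte = 0x200000000                    # PTE value reported by vtop
present = bool(pte & 0x1)            # bit 0 is the Present bit
set_bits = [b for b in range(64) if (pte >> b) & 1]
print("present:", present, "set bits:", set_bits)
```

The PTE is not present and has a single bit set, which is why vtop tries to interpret it as a swap entry and fails. The same value, 0x200000000, appears in the "bad pmd" messages.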
- Another vmcore:
crash> sys
KERNEL: /cores/retrace/repos/kernel/x86_64/usr/lib/debug/lib/modules/3.10.0-957.21.3.el7.x86_64/vmlinux
CPUS: 96
DATE: Sat Aug 29 08:07:23 2020
UPTIME: 1 days, 14:33:13
LOAD AVERAGE: 0.61, 0.91, 1.11
TASKS: 1510
RELEASE: 3.10.0-957.21.3.el7.x86_64
MACHINE: x86_64 (2700 Mhz)
MEMORY: 575.7 GB
PANIC: "BUG: unable to handle kernel paging request at 0000000200000018"
DMI_BIOS_VENDOR: HPE
DMI_BIOS_VERSION: I42
DMI_BIOS_DATE: 05/22/2019
DMI_SYS_VENDOR: HPE
DMI_PRODUCT_NAME: Synergy 480 Gen10
Kernel ring buffer shows as below.
crash> ps -S
RU: 96
IN: 1412
WA: 2
The backtrace shows that the exception RIP is in _raw_spin_lock.
crash> bt
PID: 0 TASK: ffff8db2036a30c0 CPU: 82 COMMAND: "swapper/82"
#0 [ffff8dd52fd835f0] machine_kexec at ffffffff8f263934
#1 [ffff8dd52fd83650] __crash_kexec at ffffffff8f31d162
#2 [ffff8dd52fd83720] crash_kexec at ffffffff8f31d250
#3 [ffff8dd52fd83738] oops_end at ffffffff8f96d778
#4 [ffff8dd52fd83760] no_context at ffffffff8f95bdbe
#5 [ffff8dd52fd837b0] __bad_area_nosemaphore at ffffffff8f95be55
#6 [ffff8dd52fd83800] bad_area_nosemaphore at ffffffff8f95bfc6
#7 [ffff8dd52fd83810] __do_page_fault at ffffffff8f9706d0
#8 [ffff8dd52fd83880] do_page_fault at ffffffff8f970925
#9 [ffff8dd52fd838b0] page_fault at ffffffff8f96c768
[exception RIP: _raw_spin_lock+12]
RIP: ffffffff8f96b74c RSP: ffff8dd52fd83960 RFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000200000000 RCX: 0000000000000000
RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000200000018
RBP: ffff8dd52fd839a8 R8: ffff8db121ed5a00 R9: ffff8d85a1b95050
R10: ffff8dd52fd83898 R11: 000000000000e513 R12: 000000000000010c
R13: 0000000000000052 R14: ffff8d85a1b94ec0 R15: 0000000200000018
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#10 [ffff8dd52fd83960] ovs_flow_stats_update at ffffffffc05bdc27 [openvswitch]
#11 [ffff8dd52fd839b0] ovs_dp_process_packet at ffffffffc05bc7cf [openvswitch]
#12 [ffff8dd52fd83a28] ovs_vport_receive at ffffffffc05c6193 [openvswitch]
#13 [ffff8dd52fd83c28] netdev_frame_hook at ffffffffc05c6c3e [openvswitch]
#14 [ffff8dd52fd83c50] __netif_receive_skb_core at ffffffff8f83a04a
#15 [ffff8dd52fd83cc8] __netif_receive_skb at ffffffff8f83a878
#16 [ffff8dd52fd83ce8] netif_receive_skb_internal at ffffffff8f83a900
#17 [ffff8dd52fd83d18] napi_gro_receive at ffffffff8f83b588
#18 [ffff8dd52fd83d40] bnx2x_rx_int at ffffffffc13095e0 [bnx2x]
#19 [ffff8dd52fd83e48] bnx2x_poll at ffffffffc130c40d [bnx2x]
#20 [ffff8dd52fd83e78] net_rx_action at ffffffff8f83af1f
#21 [ffff8dd52fd83ef8] __do_softirq at ffffffff8f2a1075
#22 [ffff8dd52fd83f68] call_softirq at ffffffff8f97932c
#23 [ffff8dd52fd83f80] do_softirq at ffffffff8f22e675
#24 [ffff8dd52fd83fa0] irq_exit at ffffffff8f2a13f5
#25 [ffff8dd52fd83fb8] do_IRQ at ffffffff8f97a606
--- <IRQ stack> ---
#26 [ffff8db2036b3db8] ret_from_intr at ffffffff8f96c362
[exception RIP: cpuidle_enter_state+87]
RIP: ffffffff8f7aef07 RSP: ffff8db2036b3e60 RFLAGS: 00000202
RAX: 00007e3b57c7b902 RBX: 0000000000000000 RCX: 0000000000000018
RDX: 0000000225c17d03 RSI: ffff8db2036b3fd8 RDI: 00007e3b57c7b902
RBP: ffff8db2036b3e88 R8: 0000000000000053 R9: 0000000000000018
R10: 0000000000000157 R11: 00007e855a52d4c0 R12: ffff8db2036b3e30
R13: 0000000000000086 R14: ffff8dd52fd95960 R15: ffff8dd52fd959a0
ORIG_RAX: ffffffffffffff28 CS: 0010 SS: 0018
#27 [ffff8db2036b3e90] cpuidle_idle_call at ffffffff8f7af05e
#28 [ffff8db2036b3ed0] arch_cpu_idle at ffffffff8f2366de
#29 [ffff8db2036b3ee0] cpu_startup_entry at ffffffff8f2fc6da
#30 [ffff8db2036b3f28] start_secondary at ffffffff8f258047
#31 [ffff8db2036b3f50] start_cpu at ffffffff8f2000d5
The disassembly shows that an invalid value in RDI was being dereferenced.
crash> dis -rl ffffffff8f96b74c | tail
/usr/src/debug/kernel-3.10.0-957.21.3.el7/linux-3.10.0-957.21.3.el7.x86_64/kernel/spinlock.c: 150
0xffffffff8f96b740 <_raw_spin_lock>: nopl 0x0(%rax,%rax,1) [FTRACE NOP]
/usr/src/debug/kernel-3.10.0-957.21.3.el7/linux-3.10.0-957.21.3.el7.x86_64/arch/x86/include/asm/atomic.h: 205
0xffffffff8f96b745 <_raw_spin_lock+5>: xor %eax,%eax
0xffffffff8f96b747 <_raw_spin_lock+7>: mov $0x1,%edx
0xffffffff8f96b74c <_raw_spin_lock+12>: lock cmpxchg %edx,(%rdi)
RDI: 0000000200000018: invalid
255 /* Must be called with rcu_read_lock. */
256 void ovs_dp_process_packet(struct sk_buff *skb, struct sw_flow_key *key)
...
287 ovs_flow_stats_update(flow, key->tp.flags, skb);
_______/
/
71 void ovs_flow_stats_update(struct sw_flow *flow, __be16 tcp_flags,
72 const struct sk_buff *skb)
73 {
74 struct flow_stats *stats;
75 unsigned int cpu = smp_processor_id();
76 int len = skb->len + (skb_vlan_tag_present(skb) ? VLAN_HLEN : 0);
77
78 stats = rcu_dereference(flow->stats[cpu]);
79
80 /* Check if already have CPU-specific stats. */
81 if (likely(stats)) {
82 spin_lock(&stats->lock);
crash> kmem ffff8d85a1b94ec0
CACHE OBJSIZE ALLOCATED TOTAL SLABS SSIZE NAME
ffff8db12ae36000 2016 419 5520 345 32k sw_flow
SLAB MEMORY NODE TOTAL ALLOCATED FREE
fffff9343086e400 ffff8d85a1b90000 0 16 5 11
FREE / [ALLOCATED]
[ffff8d85a1b94ec0]
PAGE PHYSICAL MAPPING INDEX CNT FLAGS
fffff9343086e500 1c21b94000 0 0 0 2fffff00008000 tail
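The kmem output can be sanity-checked with simple arithmetic. This is a sketch using only the values printed above (slab base, object size, slab size); it confirms that the address in R14 is the start of an object and not a pointer into the middle of one.

```python
# Sketch: locate an object inside its slab from the kmem output.
slab_base = 0xFFFF8D85A1B90000   # MEMORY column
obj_addr = 0xFFFF8D85A1B94EC0    # sw_flow address (R14)
obj_size = 2016                  # OBJSIZE column
slab_bytes = 32 * 1024           # SSIZE column (32k)

index, remainder = divmod(obj_addr - slab_base, obj_size)
objects_per_slab = slab_bytes // obj_size
print("object index:", index, "offset within object:", remainder)
print("objects per slab:", objects_per_slab)
```

A zero remainder means the pointer is aligned to an object boundary, and the objects-per-slab figure agrees with the TOTAL column.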
The contents of the sw_flow object are partially zeroed out.
crash> dis -rl ffffffffc05bdc27
/usr/src/debug/kernel-3.10.0-957.21.3.el7/linux-3.10.0-957.21.3.el7.x86_64/net/openvswitch/flow.c: 73
0xffffffffc05bdbd0 <ovs_flow_stats_update>: nopl 0x0(%rax,%rax,1) [FTRACE NOP]
0xffffffffc05bdbd5 <ovs_flow_stats_update+5>: push %rbp
0xffffffffc05bdbd6 <ovs_flow_stats_update+6>: mov %rsp,%rbp
0xffffffffc05bdbd9 <ovs_flow_stats_update+9>: push %r15
0xffffffffc05bdbdb <ovs_flow_stats_update+11>: push %r14
0xffffffffc05bdbdd <ovs_flow_stats_update+13>: mov %rdi,%r14
0xffffffffc05bdbe0 <ovs_flow_stats_update+16>: push %r13
0xffffffffc05bdbe2 <ovs_flow_stats_update+18>: push %r12
0xffffffffc05bdbe4 <ovs_flow_stats_update+20>: push %rbx
0xffffffffc05bdbe5 <ovs_flow_stats_update+21>: sub $0x18,%rsp
/usr/src/debug/kernel-3.10.0-957.21.3.el7/linux-3.10.0-957.21.3.el7.x86_64/net/openvswitch/flow.c: 76
0xffffffffc05bdbe9 <ovs_flow_stats_update+25>: movzwl 0xa2(%rdx),%r12d
/usr/src/debug/kernel-3.10.0-957.21.3.el7/linux-3.10.0-957.21.3.el7.x86_64/net/openvswitch/flow.c: 73
0xffffffffc05bdbf1 <ovs_flow_stats_update+33>: mov %esi,-0x2c(%rbp)
/usr/src/debug/kernel-3.10.0-957.21.3.el7/linux-3.10.0-957.21.3.el7.x86_64/net/openvswitch/flow.c: 75
0xffffffffc05bdbf4 <ovs_flow_stats_update+36>: mov %gs:0x3fa50420(%rip),%r13d # 0xe01c
0xffffffffc05bdbfc <ovs_flow_stats_update+44>: mov %r13d,%eax
0xffffffffc05bdbff <ovs_flow_stats_update+47>: lea (%rdi,%rax,8),%r15
/usr/src/debug/kernel-3.10.0-957.21.3.el7/linux-3.10.0-957.21.3.el7.x86_64/net/openvswitch/flow.c: 76
0xffffffffc05bdc03 <ovs_flow_stats_update+51>: shr $0xa,%r12d
/usr/src/debug/kernel-3.10.0-957.21.3.el7/linux-3.10.0-957.21.3.el7.x86_64/net/openvswitch/flow.c: 78
0xffffffffc05bdc07 <ovs_flow_stats_update+55>: mov 0x4e0(%r15),%rbx
/usr/src/debug/kernel-3.10.0-957.21.3.el7/linux-3.10.0-957.21.3.el7.x86_64/net/openvswitch/flow.c: 76
0xffffffffc05bdc0e <ovs_flow_stats_update+62>: and $0x4,%r12d
0xffffffffc05bdc12 <ovs_flow_stats_update+66>: add 0x68(%rdx),%r12d
/usr/src/debug/kernel-3.10.0-957.21.3.el7/linux-3.10.0-957.21.3.el7.x86_64/net/openvswitch/flow.c: 81
0xffffffffc05bdc16 <ovs_flow_stats_update+70>: test %rbx,%rbx
0xffffffffc05bdc19 <ovs_flow_stats_update+73>: je 0xffffffffc05bdc82 <ovs_flow_stats_update+178>
/usr/src/debug/kernel-3.10.0-957.21.3.el7/linux-3.10.0-957.21.3.el7.x86_64/include/linux/spinlock.h: 337
0xffffffffc05bdc1b <ovs_flow_stats_update+75>: lea 0x18(%rbx),%r15
0xffffffffc05bdc1f <ovs_flow_stats_update+79>: mov %r15,%rdi
0xffffffffc05bdc22 <ovs_flow_stats_update+82>: callq 0xffffffff8f96b740 <_raw_spin_lock>
RBX: 0000000200000000
R15: 0000000200000018
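The address arithmetic in the disassembly can be followed step by step. The sketch below uses the register values from the backtrace (R14 as the sw_flow pointer, R13 as the CPU number) and the displacements from the lea and mov instructions; the variable names are illustrative only.

```python
# Sketch: follow the address arithmetic in ovs_flow_stats_update.
flow = 0xFFFF8D85A1B94EC0      # R14, the sw_flow object
cpu = 0x52                     # R13, CPU 82
stats_offset = 0x4E0           # displacement in: mov 0x4e0(%r15),%rbx
lock_offset = 0x18             # displacement in: lea 0x18(%rbx),%r15

slot = flow + cpu * 8 + stats_offset   # address of flow->stats[cpu]
rbx = 0x0000000200000000               # value loaded from that slot
lock_addr = rbx + lock_offset          # &stats->lock passed in RDI

print(hex(slot), hex(lock_addr))
```

The resulting lock address equals the RDI value and the address in the PANIC string, so the bad pointer was read from the per-CPU stats slot of the sw_flow object itself.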
crash> sw_flow ffff8d85a1b94ec0
struct sw_flow {
rcu = {
next = 0x0,
func = 0x0
},
flow_table = {
node = {{
next = 0x0,
pprev = 0x0
}, {
next = 0x0,
pprev = 0xffff8dd527f21d88
}},
hash = 1001454895
},
ufid_table = {
node = {{
next = 0x0,
pprev = 0xffff8db0b4e3bfb0
}, {
next = 0x0,
pprev = 0x0
}},
hash = 756155237
},
...
mask = 0xffff8d8d2d347000,
sf_acts = 0xffff8d8d27aa2ac0,
stats = 0xffff8d85a1b953a0
}
Most of the data in the object is corrupted.
crash> rd ffff8d85a1b94ec0 200
ffff8d85a1b94ec0: 0000000000000000 0000000000000000 ................
ffff8d85a1b94ed0: 0000000000000000 0000000000000000 ................
ffff8d85a1b94ee0: 0000000000000000 ffff8dd527f21d88 ...........'....
ffff8d85a1b94ef0: 000000003bb0fd2f 0000000000000000 /..;............
ffff8d85a1b94f00: ffff8db0b4e3bfb0 0000000000000000 ................
ffff8d85a1b94f10: 0000000000000000 000000002d120365 ........e..-....
...
ffff8d85a1b951b0: 0000000200000000 0000000000000000 ................
ffff8d85a1b951c0: 0000000000000000 0000000000000000 ................
ffff8d85a1b951d0: 0000000000000000 0000000000000000 ................
ffff8d85a1b951e0: 0000000000000000 0000000000000000 ................
ffff8d85a1b951f0: 0000000200000000 0000000000000000 ................
ffff8d85a1b95200: 0000000000000000 0000000000000000 ................
ffff8d85a1b95210: 0000000000000000 0000000000000000 ................
ffff8d85a1b95220: 0000000000000000 0000000000000000 ................
ffff8d85a1b95230: 0000000200000000 0000000000000000 ................
ffff8d85a1b95240: 0000000000000000 0000000000000000 ................
ffff8d85a1b95250: 0000000000000000 0000000000000000 ................
ffff8d85a1b95260: 0000000000000000 0000000000000000 ................
ffff8d85a1b95270: 0000000200000000 0000000000000000 ................
ffff8d85a1b95280: 0000000000000000 0000000000000000 ................
ffff8d85a1b95290: 0000000000000000 0000000000000000 ................
ffff8d85a1b952a0: 0000000000000000 0000000000000000 ................
ffff8d85a1b952b0: 0000000200000000 0000000000000000 ................
ffff8d85a1b952c0: 0000000000000000 0000000000000000 ................
ffff8d85a1b952d0: 0000000000000000 0000000000000000 ................
ffff8d85a1b952e0: 0000000000000000 0000000000000000 ................
ffff8d85a1b952f0: 0000000200000000 0000000000000000 ................
ffff8d85a1b95300: 0000000000000000 0000000000000000 ................
ffff8d85a1b95310: 0000000000000000 0000000000000000 ................
ffff8d85a1b95320: 0000000000000000 0000000000000000 ................
ffff8d85a1b95330: 0000000200000000 0000000000000000 ................
ffff8d85a1b95340: 0000000000000000 0000000000000000 ................
ffff8d85a1b95350: 0000000000000000 0000000000000000 ................
ffff8d85a1b95360: 0000000000000000 0000000000000000 ................
ffff8d85a1b95370: 0000000200000000 0000000000000000 ................
ffff8d85a1b95380: 0000000000000000 0000000000000000 ................
ffff8d85a1b95390: ffff8d8d2d347000 ffff8d8d27aa2ac0 .p4-.....*.'....
ffff8d85a1b953a0: ffff8d8cb9f2dc00 0000000000000000 ................
ffff8d85a1b953b0: 0000000200000000 0000000000000000 ................
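The repeating value in the dump can be located programmatically. This is a minimal sketch that parses a few lines copied verbatim from the rd output above and reports where 0x200000000 occurs; the helper name find_value is hypothetical.

```python
# Sketch: find a repeating value in `rd` output and report its stride.
import re

RD_SAMPLE = """\
ffff8d85a1b951b0: 0000000200000000 0000000000000000 ................
ffff8d85a1b951c0: 0000000000000000 0000000000000000 ................
ffff8d85a1b951d0: 0000000000000000 0000000000000000 ................
ffff8d85a1b951e0: 0000000000000000 0000000000000000 ................
ffff8d85a1b951f0: 0000000200000000 0000000000000000 ................
ffff8d85a1b95200: 0000000000000000 0000000000000000 ................
ffff8d85a1b95210: 0000000000000000 0000000000000000 ................
ffff8d85a1b95220: 0000000000000000 0000000000000000 ................
ffff8d85a1b95230: 0000000200000000 0000000000000000 ................
"""

LINE = re.compile(r"^([0-9a-f]{16}):\s+([0-9a-f]{16})\s+([0-9a-f]{16})")

def find_value(dump: str, value: int) -> list:
    """Return the addresses of every 8-byte word equal to value."""
    hits = []
    for line in dump.splitlines():
        m = LINE.match(line)
        if not m:
            continue
        base = int(m.group(1), 16)
        for i, word in enumerate(m.groups()[1:]):
            if int(word, 16) == value:
                hits.append(base + 8 * i)
    return hits

hits = find_value(RD_SAMPLE, 0x200000000)
strides = {b - a for a, b in zip(hits, hits[1:])}
print([hex(h) for h in hits], [hex(s) for s in strides])
```

In this sample the value recurs every 0x40 bytes, which is the cache line size on these CPUs. Whether that is significant is not established by the dump alone.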
The third-party module 'scini' is in use.
crash> mod -t
NAME TAINTS
overlay T
scini POE
crash> mod|grep scini
ffffffffc08fa000 scini 803568 (not loaded) [CONFIG_KALLSYMS]
crash> module.version ffffffffc08fa000
version = 0xffff8df92e244c00 "DellEMC ScaleIO Version: R2_6.11000.113"
- Another pattern
Backtrace of panic task:
crash> bt
PID: 0 TASK: ffff8dfe2dfa1070 CPU: 70 COMMAND: "swapper/70"
#0 [ffff8dfe2dfafab0] machine_kexec at ffffffffae666254
#1 [ffff8dfe2dfafb10] __crash_kexec at ffffffffae722ef2
#2 [ffff8dfe2dfafbe0] crash_kexec at ffffffffae722fe0
#3 [ffff8dfe2dfafbf8] oops_end at ffffffffaed8a798
#4 [ffff8dfe2dfafc20] no_context at ffffffffae675d74
#5 [ffff8dfe2dfafc70] __bad_area_nosemaphore at ffffffffae676042
#6 [ffff8dfe2dfafcc0] bad_area_nosemaphore at ffffffffae676164
#7 [ffff8dfe2dfafcd0] __do_page_fault at ffffffffaed8d750
#8 [ffff8dfe2dfafd40] do_page_fault at ffffffffaed8d975
#9 [ffff8dfe2dfafd70] page_fault at ffffffffaed89778
#10 [ffff8dfe2dfafe58] cpuidle_enter_state at ffffffffaebc61d5
#11 [ffff8dfe2dfafe90] cpuidle_idle_call at ffffffffaebc633e
#12 [ffff8dfe2dfafed0] arch_cpu_idle at ffffffffae637c8e
#13 [ffff8dfe2dfafee0] cpu_startup_entry at ffffffffae701c7a
#14 [ffff8dfe2dfaff28] start_secondary at ffffffffae65a727
#15 [ffff8dfe2dfaff50] start_cpu at ffffffffae6000d5
crash> whatis cpuidle_enter_state
int cpuidle_enter_state(struct cpuidle_device *, struct cpuidle_driver *, int);
crash> dis -r ffffffffaebc633e | tail | grep mov
0xffffffffaebc6325 <cpuidle_idle_call+197>: mov 0x4(%rbx),%eax
0xffffffffaebc6328 <cpuidle_idle_call+200>: mov %eax,-0x2c(%rbp)
0xffffffffaebc6330 <cpuidle_idle_call+208>: mov %r12d,%edx
0xffffffffaebc6333 <cpuidle_idle_call+211>: mov %r14,%rsi
0xffffffffaebc6336 <cpuidle_idle_call+214>: mov %rbx,%rdi
0xffffffffaebc633e <cpuidle_idle_call+222>: mov 0x4(%rbx),%r12d
crash> dis -r ffffffffaebc61d5 | head -15 | grep push
0xffffffffaebc6195 <cpuidle_enter_state+5>: push %rbp
0xffffffffaebc619c <cpuidle_enter_state+12>: push %r15
0xffffffffaebc619e <cpuidle_enter_state+14>: push %r14
0xffffffffaebc61a7 <cpuidle_enter_state+23>: push %r13
0xffffffffaebc61ad <cpuidle_enter_state+29>: push %r12
0xffffffffaebc61b7 <cpuidle_enter_state+39>: push %rbx
crash> bt -f|grep cpuidle_idle -B4
ffff8dfe2dfafe60: ffffbeba0ff83890 0000000000000003 %rbx and %r12
ffff8dfe2dfafe70: 0000000000000003 ffffffffaf2de680 %r13 and %r14
ffff8dfe2dfafe80: ffff8dfe2dfac000 ffff8dfe2dfafec8 %r15 and %rbp
ffff8dfe2dfafe90: ffffffffaebc633e Return address
#11 [ffff8dfe2dfafe90] cpuidle_idle_call at ffffffffaebc633e
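The mapping between the pushed registers and the stack words can be written out explicitly. The sketch below assumes the push order shown in the cpuidle_enter_state prologue and that each push stores 8 bytes directly below the previous one, starting under the return address. The argument assignment follows the mov instructions in cpuidle_idle_call (rbx to rdi, r14 to rsi, r12d to edx).

```python
# Sketch: map callee-saved registers pushed by cpuidle_enter_state to stack slots.
ret_slot = 0xFFFF8DFE2DFAFE90                  # slot holding the return address
push_order = ["rbp", "r15", "r14", "r13", "r12", "rbx"]   # from the prologue

# Stack words from `bt -f`, lowest address first.
stack = {
    0xFFFF8DFE2DFAFE60: 0xFFFFBEBA0FF83890,
    0xFFFF8DFE2DFAFE68: 0x0000000000000003,
    0xFFFF8DFE2DFAFE70: 0x0000000000000003,
    0xFFFF8DFE2DFAFE78: 0xFFFFFFFFAF2DE680,
    0xFFFF8DFE2DFAFE80: 0xFFFF8DFE2DFAC000,
    0xFFFF8DFE2DFAFE88: 0xFFFF8DFE2DFAFEC8,
    0xFFFF8DFE2DFAFE90: 0xFFFFFFFFAEBC633E,
}

# The first push lands 8 bytes below the return address, the next 16 below, etc.
saved = {reg: stack[ret_slot - 8 * (i + 1)] for i, reg in enumerate(push_order)}

# The caller copied these registers into the argument registers before the call.
dev = saved["rbx"]                   # struct cpuidle_device *
drv = saved["r14"]                   # struct cpuidle_driver *
index = saved["r12"] & 0xFFFFFFFF    # int index (passed in edx)
print(hex(dev), hex(drv), index)
```

These three values are the arguments used in the cpuidle_device and cpuidle_driver lookups that follow.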
crash> cpuidle_device.cpu,state_count ffffbeba0ff83890
cpu = 70
state_count = 4
crash> cpuidle_driver.name,state_count ffffffffaf2de680
name = 0xffffffffaf0b46d5 "intel_idle"
state_count = 4
index: 0x03
The idle state requested for CPU 70 (index 3) is C6.
crash> cpuidle_driver.states[3] ffffffffaf2de680
states[3] = {
name = "C6-SKX\000\000\000\000\000\000\000\000\000",
desc = "MWAIT 0x20\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000",
flags = 536936448,
exit_latency = 133,
power_usage = 0,
target_residency = 600,
disabled = false,
enter = 0xffffffffaed88060,
enter_dead = 0x0
},
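The flags field can be related to the desc string. This sketch assumes the MWAIT hint is stored in the top byte of flags, as the flg2MWAIT() macro does in the upstream intel_idle driver; that layout is an assumption about this kernel build, and the helper name mwait_hint is hypothetical.

```python
# Sketch: recover the MWAIT hint from cpuidle_state.flags.
# Assumption: the hint occupies bits 24-31 of flags, as in the upstream
# intel_idle driver's flg2MWAIT() macro.

def mwait_hint(flags: int) -> int:
    """Extract the MWAIT hint byte from a cpuidle_state flags value."""
    return (flags >> 24) & 0xFF

flags = 536936448                 # flags value from states[3]
hint = mwait_hint(flags)
print(hex(flags), hex(hint))
```

Under that assumption the extracted hint agrees with the "MWAIT 0x20" description of the C6-SKX state.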
- Another pattern
[ 1591.421146] perf: interrupt took too long (2507 > 2500), lowering kernel.perf_event_max_sample_rate to 79000
[ 1797.992688] perf: interrupt took too long (3161 > 3133), lowering kernel.perf_event_max_sample_rate to 63000
[ 2076.376223] perf: interrupt took too long (3980 > 3951), lowering kernel.perf_event_max_sample_rate to 50000
[ 2561.353779] perf: interrupt took too long (4981 > 4975), lowering kernel.perf_event_max_sample_rate to 40000
[79636.755011] perf: interrupt took too long (6252 > 6226), lowering kernel.perf_event_max_sample_rate to 31000
[240767.457308] BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
[240767.465243] IP: [<ffffffffbaf2bd59>] audit_copy_inode+0x29/0xb0
[240767.471262] PGD 80000017d0263067 PUD 180ccbe067 PMD 0
[240767.476544] Oops: 0000 [#1] SMP
[240767.479899] Modules linked in: mmfs26(OE) mmfslinux(OE) tracedev(OE) nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache iptable_filter ib_isert iscsi_target_mod target_core_mod ib_ucm dm_mirror dm_region_hash vfat dm_log fat dm_mod intel_powerclamp coretemp intel_rapl iosf_mbi kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper rpcrdma cryptd iTCO_wdt sunrpc iTCO_vendor_support rdma_ucm ib_uverbs ib_iser rdma_cm iw_cm opa_vnic libiscsi ib_umad ib_ipoib pcspkr mei_me joydev i2c_i801 nfit scsi_transport_iscsi ib_cm ipmi_si sg shpchp wmi acpi_power_meter libnvdimm ipmi_devintf ipmi_msghandler mei acpi_pad lpc_ich acpi_cpufreq binfmt_misc ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic hfi1(OE) drm_kms_helper
[240767.552082] syscopyarea sysfillrect sysimgblt ixgbe fb_sys_fops rdmavt(OE) ttm ahci ib_core crct10dif_pclmul crct10dif_common crc32c_intel libahci mdio drm ptp libata i2c_algo_bit pps_core i2c_core dca
[240767.569242] CPU: 23 PID: 293049 Comm: SU2_CFD Kdump: loaded Tainted: G OE ------------ 3.10.0-862.14.4.el7.x86_64 #1
[240767.580937] Hardware name: Intel Corporation S2600BPB/S2600BPB, BIOS SE5C620.86B.00.01.0013.030920180427 03/09/2018
[240767.591423] task: ffff89fc9d300fd0 ti: ffff89fd3be7c000 task.ti: ffff89fd3be7c000
[240767.598973] RIP: 0010:[<ffffffffbaf2bd59>] [<ffffffffbaf2bd59>] audit_copy_inode+0x29/0xb0
[240767.607411] RSP: 0018:ffff89fd3be7fc48 EFLAGS: 00010246
[240767.612798] RAX: 0000000000000000 RBX: ffff89fc9a775060 RCX: 0000000000000000
[240767.619998] RDX: 0000000000000000 RSI: ffff89fc9a77509c RDI: ffff89fc9a775060
[240767.627199] RBP: ffff89fd3be7fc78 R08: 0000000000000000 R09: 0000000000000000
[240767.634402] R10: ffff8a0883cf1c80 R11: ffff8a0883cf1c80 R12: ffff8a0883cf1c80
[240767.641605] R13: ffff89fc9a775060 R14: 00000000000066a6 R15: 0000000000000000
[240767.648809] FS: 00002b99cdcea6c0(0000) GS:ffff8a0d8bf40000(0000) knlGS:0000000000000000
[240767.656962] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[240767.662781] CR2: 0000000000000040 CR3: 0000000cda3ec000 CR4: 00000000005607e0
[240767.669981] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[240767.677185] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[240767.684384] PKRU: 55555554
[240767.687178] Call Trace:
[240767.689714] [<ffffffffbaed1fcb>] ? wake_up_q+0x5b/0x80
[240767.695015] [<ffffffffbaf329ea>] __audit_inode+0x18a/0x3c0
[240767.700661] [<ffffffffbb02fba3>] do_last+0xd13/0x12c0
[240767.705875] [<ffffffffbb1569b2>] ? __rb_insert_augmented+0x92/0x1f0
[240767.712296] [<ffffffffbb030227>] path_openat+0xd7/0x640
[240767.717684] [<ffffffffbb031dbd>] do_filp_open+0x4d/0xb0
[240767.723070] [<ffffffffbb03f167>] ? __alloc_fd+0x47/0x170
[240767.728544] [<ffffffffbb01e0d7>] do_sys_open+0x137/0x240
[240767.734016] [<ffffffffbb01e1fe>] SyS_open+0x1e/0x20
[240767.739058] [<ffffffffbb52579b>] system_call_fastpath+0x22/0x27
[240767.745134] Code: 00 00 0f 1f 44 00 00 55 48 89 e5 41 54 49 89 f4 48 8d 77 3c 53 48 89 fb 48 83 ec 20 65 48 8b 04 25 28 00 00 00 48 89 45 e8 31 c0 <48> 8b 42 40 48 89 47 20 48 8b 42 28 8b 40 10 89 47 28 0f b7 02
[240767.765529] RIP [<ffffffffbaf2bd59>] audit_copy_inode+0x29/0xb0
[240767.771632] RSP <ffff89fd3be7fc48>
[240767.775203] CR2: 0000000000000040
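The faulting instruction can be read directly from the Code: line, where the byte in angle brackets marks the RIP. The sketch below extracts the four bytes starting at that marker and decodes only the one encoding that appears here (REX.W, opcode 0x8b, ModRM with an 8-bit displacement and no SIB byte). It is not a general x86 decoder.

```python
# Sketch: extract the faulting instruction bytes from an oops "Code:" line
# and decode the single addressing form that appears in this oops.

code_line = (
    "00 00 0f 1f 44 00 00 55 48 89 e5 41 54 49 89 f4 48 8d 77 3c 53 48 89 fb "
    "48 83 ec 20 65 48 8b 04 25 28 00 00 00 48 89 45 e8 31 c0 "
    "<48> 8b 42 40 48 89 47 20 48 8b 42 28 8b 40 10 89 47 28 0f b7 02"
)

tokens = code_line.split()
start = next(i for i, t in enumerate(tokens) if t.startswith("<"))
insn = [int(t.strip("<>"), 16) for t in tokens[start:start + 4]]

REGS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]
rex, opcode, modrm, disp8 = insn
assert rex == 0x48 and opcode == 0x8B      # REX.W + MOV r64, r/m64
assert modrm >> 6 == 0b01                  # mod=01: base register + 8-bit displacement
assert modrm & 7 != 0b100                  # rm=100 would require a SIB byte
dest = REGS[(modrm >> 3) & 7]
base = REGS[modrm & 7]
decoded = f"mov {disp8:#x}(%{base}),%{dest}"
print(decoded)

rdx = 0x0                                  # RDX from the register dump
fault_addr = rdx + disp8
print("fault address:", hex(fault_addr))
```

The computed address equals CR2, so the NULL pointer dereference is a read at offset 0x40 from a NULL pointer held in RDX, which under the x86_64 calling convention is the third argument register.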
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.