Unexplained behaviour of RHEL systems with Broadwell/Haswell CPUs
Issue
- System becomes sluggish for a while even with no load.
- System crashes with General Protection Fault.
- Kernel crashes with logs such as the following:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
IP: [<ffffffffc027a5ef>] xlog_cil_push+0x18f/0x430 [xfs]
PGD 0
Oops: 0000 [#1] SMP
CPU: 5 PID: 32756 Comm: kworker/5:2 Kdump: loaded Tainted: G W ------------ T 3.10.0-1062.1.1.el7.x86_64 #1
Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 01/22/2018
Workqueue: xfs-cil/dm-19 xlog_cil_push_work [xfs]
task: ffff9baa63f8a0e0 ti: ffff9bab33b4c000 task.ti: ffff9bab33b4c000
RIP: 0010:[<ffffffffc027a5ef>] [<ffffffffc027a5ef>] xlog_cil_push+0x18f/0x430 [xfs]
Call Trace:
[<ffffffff918ae168>] ? add_timer+0x18/0x20
[<ffffffff918bb3db>] ? __queue_delayed_work+0x8b/0x1a0
[<ffffffffc027a8a5>] xlog_cil_push_work+0x15/0x20 [xfs]
[<ffffffff918bd0ff>] process_one_work+0x17f/0x440
[<ffffffff918be368>] worker_thread+0x278/0x3c0
[<ffffffff918be0f0>] ? manage_workers.isra.26+0x2a0/0x2a0
[<ffffffff918c50d1>] kthread+0xd1/0xe0
[<ffffffff918c5000>] ? insert_kthread_work+0x40/0x40
[<ffffffff91f8cd37>] ret_from_fork_nospec_begin+0x21/0x21
[<ffffffff918c5000>] ? insert_kthread_work+0x40/0x40
Code: 46 08 48 39 85 58 ff ff ff 74 61 45 31 ff eb 28 0f 1f 40 00 48 8b 70 10 49 89 37 4c 8b 78 10 48 c7 40 10 00 00 00 00 49 8b 46 08 <45> 03 6f 08 48 39 85 58 ff ff ff 74 34 48 89 c7 48 89 85 68 ff
RIP [<ffffffffc027a5ef>] xlog_cil_push+0x18f/0x430 [xfs]
RSP <ffff9bab33b4fd48>
CR2: 0000000000000008
[32386.065177] ------------[ cut here ]------------
[32386.070257] kernel BUG at fs/jbd2/journal.c:2482!
[32386.075426] invalid opcode: 0000 [#1] SMP
[32386.226624] CPU: 31 PID: 7287 Comm: jbd2/dm-96-8 Not tainted 3.10.0-514.el7.x86_64 #1
[32386.235232] Hardware name: HP Superdome2 16s x86, BIOS Bundle: 008.004.084 SFW: 043.025.000 08/16/2016
[32386.245464] task: ffff887f7381edd0 ti: ffff887f7f284000 task.ti: ffff887f7f284000
[32386.253691] RIP: 0010:[<ffffffffa071ace2>] [<ffffffffa071ace2>] jbd2_journal_put_journal_head+0x142/0x146 [jbd2]
[32386.348816] Stack:
[32386.351025] ffff881911ea9400 ffff885f7ba34000 ffff887f7f287ca0 ffffffffa0713dbb
[32386.359192] ffffffff811899e0 ffff885f7ba343a0 ffff881ff6b5ca92 ffff88de9820ce38
[32386.367365] ffff88dea9015000 ffff881fe8d67600 ffff887f7f287e40 ffffffffa07120f6
[32386.375535] Call Trace:
[32386.378257] [<ffffffffa0713dbb>] __jbd2_journal_remove_checkpoint+0x5b/0x160 [jbd2]
[32386.386782] [<ffffffff811899e0>] ? free_pages.part.80+0x40/0x50
[32386.393389] [<ffffffffa07120f6>] jbd2_journal_commit_transaction+0x1106/0x19a0 [jbd2]
[32386.402100] [<ffffffff81029569>] ? __switch_to+0xd9/0x4c0
[32386.408131] [<ffffffffa0716e99>] kjournald2+0xc9/0x260 [jbd2]
[32386.414551] [<ffffffff810b1600>] ? wake_up_atomic_t+0x30/0x30
[32386.420967] [<ffffffffa0716dd0>] ? commit_timeout+0x10/0x10 [jbd2]
[32386.427858] [<ffffffff810b052f>] kthread+0xcf/0xe0
[32386.433218] [<ffffffff810b0460>] ? kthread_create_on_node+0x140/0x140
[32386.440402] [<ffffffff81696418>] ret_from_fork+0x58/0x90
[32386.446335] [<ffffffff810b0460>] ? kthread_create_on_node+0x140/0x140
[32386.453509] Code: c7 c6 80 b5 71 a0 48 c7 c7 e8 d8 71 a0 31 c0 e8 2b 47 f6 e0 48 8b 73 20 49 8b 7c 24 20 e8 f7 f8 ff ff e9 64 ff ff ff 0f 0b 0f 0b <0f> 0b 0f 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 0f 1f 44 00 00
[32386.474936] RIP [<ffffffffa071ace2>] jbd2_journal_put_journal_head+0x142/0x146 [jbd2]
[32386.483653] RSP <ffff887f7f287c50>
Environment
- Red Hat Enterprise Linux
- Seen on the following systems:
- HP Superdome2 16s x86, BIOS Bundle: 008.008.034 SFW: 045.018.000 10/01/2019
- HP ProLiant XL420 Gen9/ProLiant XL420 Gen9, BIOS U19 10/21/2019
- Huawei RH2288H V3/BC11HGSA0 BIOS 3.79 11/07/2017 Insyde Corp. RH2288H V3
- Huawei XH628 V3/BC21HGSA0, BIOS 3.50 11/23/2016
- Cloudian HSA-1512/S2PH-MB, BIOS S2P_3B13.01 07/12/2019
- Radisys DCE-CSLED-V2-2-001/S2600TPR, BIOS SE5C610.86B.01.01.0027.071020182329 07/10/2018
- Seen on the following Intel(R) Xeon(R) v4 CPUs:
- Intel(R) Xeon(R) CPU E7-8891 v4 @ 2.80GHz ff-mm-ss: 06-4f-01 microcode: 0xb000038
- Intel(R) Xeon(R) CPU E7-8890 v4 @ 2.20GHz ff-mm-ss: 06-4f-01 microcode: 0xb000036
- Intel(R) Xeon(R) CPU E7-8855 v4 microcode: sig=0x406f1, pf=0x80, revision=0xb000033
- Intel(R) Xeon(R) CPU E7-8893 v4 @ 3.20GHz microcode: sig=0x406f1, pf=0x80, revision=0xb00002e
- Intel(R) Xeon(R) CPU E5-2630L v4 microcode: sig=0x406f1, pf=0x1, revision=0xb00002e
- Intel(R) Xeon(R) CPU E7-8894 v4 @ 2.40GHz microcode: sig=0x406f1, pf=0x80, revision=0xb00002a
- Intel(R) Xeon(R) CPU E7-8880 v4 @ 2.20GHz microcode: sig=0x406f1, pf=0x80, revision=0xb000021
- Intel(R) Xeon(R) CPU E7-8891 v4 microcode: sig=0x406f1, pf=0x80, revision=0xb000020
- Intel(R) Xeon(R) CPU E5-2620 v4 microcode: sig=0x406f1, pf=0x1, revision=0xb000036
- Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz ff-mm-ss: 06-4f-01 microcode: 0xb00002a
- Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz microcode: sig=0x406f1, pf=0x1, revision=0xb000021
- Intel(R) Xeon(R) CPU E5-2618L v4 microcode: sig=0x406f1, pf=0x8, revision=0xb000021
- Seen on the following Intel(R) Xeon(R) v3 CPUs:
- Intel(R) Xeon(R) CPU E7-8891 v3 @ 2.80GHz ff-mm-ss: 06-3f-04 microcode: 0x12
- Intel(R) Xeon(R) CPU E7-8891 v3 @ 2.80GHz microcode: sig=0x306f4, pf=0x80, revision=0xd
- Intel(R) Xeon(R) CPU E7-8891 v3 @ 2.80GHz microcode: sig=0x306f4, pf=0x80, revision=0x16
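The CPU lists above mix two notations for the same identifier: the `ff-mm-ss` (family-model-stepping) form, such as `06-4f-01`, and the raw CPUID signature, such as `sig=0x406f1`. The following is a minimal sketch of converting between them, assuming the standard x86 CPUID leaf 1 EAX bit layout. The function names `decode_cpuid_signature` and `format_ffmmss` are illustrative and do not come from the article or from any Red Hat tool.

```python
from typing import Tuple


def decode_cpuid_signature(sig: int) -> Tuple[int, int, int]:
    """Split a CPUID signature into (family, model, stepping).

    Assumes the standard x86 layout of CPUID leaf 1 EAX:
    stepping in bits 3:0, model in 7:4, family in 11:8,
    extended model in 19:16, extended family in 27:20.
    """
    stepping = sig & 0xF
    model = (sig >> 4) & 0xF
    family = (sig >> 8) & 0xF
    ext_model = (sig >> 16) & 0xF
    ext_family = (sig >> 20) & 0xFF

    # The extended model supplies the high nibble only for families 6 and 15.
    if family in (0x6, 0xF):
        model |= ext_model << 4
    # The extended family is added only when the base family is 15.
    if family == 0xF:
        family += ext_family
    return family, model, stepping


def format_ffmmss(sig: int) -> str:
    """Render a signature in the ff-mm-ss notation used in the lists above."""
    family, model, stepping = decode_cpuid_signature(sig)
    return f"{family:02x}-{model:02x}-{stepping:02x}"


if __name__ == "__main__":
    for sig in (0x406F1, 0x306F4):
        print(hex(sig), "->", format_ffmmss(sig))
```

With this decoding, `0x406f1` corresponds to `06-4f-01` (the v4 entries) and `0x306f4` to `06-3f-04` (the v3 entries). On a running x86 system the same values can be compared against the `cpu family`, `model`, `stepping`, and `microcode` fields in `/proc/cpuinfo`, where family, model, and stepping are printed in decimal.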