OSP 16.2: kernel panic when trying to use hugepages
Environment
- Red Hat Enterprise Linux (RHEL)
- 3rd party module vrouter 21.4
  vrouter version: {"build-info": [{"build-time": "2022-06-20 03:02:45.919367", "build-hostname": "contrail-nightly-tbf5h", "build-user": "contrail-builder", "build-version": "21.4"}]}
Issue
- Kernel panic with the following logs:
[...] [...] [...] [ 39.723351] vrouter: loading out-of-tree module taints kernel. [ 39.723638] vrouter: module verification failed: signature and/or required key missing - tainting kernel [ 39.725789] vrouter version: {"build-info": [{"build-time": "2022-06-20 03:02:45.919367", "build-hostname": "contrail-nightly-tbf5h", "build-user": "contrail-builder", "build-version": "21.4"}]} [ 39.729090] linux_if_notifier: dev 0000000048dbe25d event REGISTER [ 39.729095] linux_if_notifier: dev 0000000048dbe25d event UNREGISTER [ 39.729096] linux_if_notifier: dev 00000000b2294c78 event REGISTER [ 39.729099] linux_if_notifier: dev 00000000b2294c78 event UNREGISTER [ 39.729100] linux_if_notifier: dev 00000000f83e4fc1 event REGISTER [ 39.729103] linux_if_notifier: dev 00000000f83e4fc1 event UNREGISTER [ 39.729104] linux_if_notifier: dev 00000000a3222b11 event REGISTER [ 39.729107] linux_if_notifier: dev 00000000a3222b11 event UNREGISTER [ 39.729108] linux_if_notifier: dev 00000000742ffc82 event REGISTER [ 39.729111] linux_if_notifier: dev 00000000742ffc82 event UNREGISTER [ 39.729112] linux_if_notifier: dev 00000000d53b8b31 event REGISTER [ 39.729115] linux_if_notifier: dev 00000000d53b8b31 event UNREGISTER [ 39.729115] linux_if_notifier: dev 000000005f264fae event REGISTER [ 39.729118] linux_if_notifier: dev 000000005f264fae event UNREGISTER [ 39.729119] linux_if_notifier: dev 0000000038caa1a6 event REGISTER [ 39.729122] linux_if_notifier: dev 0000000038caa1a6 event UNREGISTER [ 39.729123] linux_if_notifier: dev 00000000f7a21872 event REGISTER [ 39.729126] linux_if_notifier: dev 00000000f7a21872 event UNREGISTER [ 39.729127] linux_if_notifier: dev 00000000f62ec522 event REGISTER [ 39.729130] linux_if_notifier: dev 00000000f62ec522 event UNREGISTER [ 39.729130] linux_if_notifier: dev 0000000097410a0d event REGISTER [ 39.729133] linux_if_notifier: dev 0000000097410a0d event UNREGISTER [ 39.729134] linux_if_notifier: dev 00000000fccfb6a8 event REGISTER [ 39.729137] 
linux_if_notifier: dev 00000000fccfb6a8 event UNREGISTER [ 39.729138] linux_if_notifier: dev 00000000669bc5e7 event REGISTER [ 39.729141] linux_if_notifier: dev 00000000669bc5e7 event UNREGISTER [ 39.729141] linux_if_notifier: dev 00000000d9c171a4 event REGISTER [ 39.729144] linux_if_notifier: dev 00000000d9c171a4 event UNREGISTER [ 39.729145] linux_if_notifier: dev 00000000708c867f event REGISTER [ 39.729148] linux_if_notifier: dev 00000000708c867f event UNREGISTER [ 39.729148] linux_if_notifier: dev 00000000e62ff6ea event REGISTER [ 39.729151] linux_if_notifier: dev 00000000e62ff6ea event UNREGISTER [ 39.729152] linux_if_notifier: dev 0000000082e5b053 event REGISTER [ 39.729155] linux_if_notifier: dev 0000000034e484db event REGISTER [ 39.729158] linux_if_notifier: dev 00000000c1fc6e3d event REGISTER [ 39.977529] linux_if_notifier: dev 0000000082e5b053 event UNREGISTER [ 39.977588] linux_if_notifier: dev 0000000082e5b053 event UNREGISTER [ 40.176831] linux_if_notifier: dev 00000000c1fc6e3d event UNREGISTER [ 40.176876] linux_if_notifier: dev 00000000c1fc6e3d event UNREGISTER [ 40.375398] linux_if_notifier: dev 0000000034e484db event UNREGISTER [ 40.375439] linux_if_notifier: dev 0000000034e484db event UNREGISTER [ 40.378814] linux_if_notifier: dev 00000000de1e73eb event UNREGISTER [ 40.378986] linux_if_notifier: dev 00000000de1e73eb event REGISTER [ 40.383079] Vrouter: vr_interface_get:3548 vif is NULL for vifr_idx: -1 [ 40.383125] Num phy interfaces 1 [ 40.383716] Vrouter: vr_interface_get:3548 vif is NULL for vifr_idx: -1 [ 40.384584] linux_if_notifier: dev 00000000de1e73eb event UNREGISTER [ 40.384599] linux_if_notifier: dev 00000000de1e73eb event UNREGISTER [ 40.385463] linux_if_notifier: dev 00000000de1e73eb event UNREGISTER [ 40.385547] linux_if_notifier: dev 00000000de1e73eb event UNREGISTER [ 40.398649] linux_if_notifier: dev 00000000de1e73eb event UNREGISTER [ 40.398671] linux_if_notifier: dev 00000000de1e73eb event UNREGISTER [ 44.515793] 
linux_if_notifier: dev 00000000ab576300 event UNREGISTER [ 44.516085] linux_if_notifier: dev 00000000ab576300 event REGISTER [ 44.516125] linux_if_notifier: dev 00000000d9c171a4 event UNREGISTER [ 44.516188] linux_if_notifier: dev 00000000d9c171a4 event UNREGISTER [ 44.543390] linux_if_notifier: dev 00000000ab576300 event UNREGISTER [ 44.543456] linux_if_notifier: dev 00000000ab576300 event UNREGISTER [ 48.657259] linux_if_notifier: dev 00000000b3df36a1 event UNREGISTER [ 48.657403] linux_if_notifier: dev 00000000b3df36a1 event REGISTER [ 48.657411] linux_if_notifier: dev 00000000d9c171a4 event UNREGISTER [ 48.657434] linux_if_notifier: dev 00000000d9c171a4 event UNREGISTER [ 48.682969] linux_if_notifier: dev 00000000b3df36a1 event UNREGISTER [ 48.683028] linux_if_notifier: dev 00000000b3df36a1 event UNREGISTER [ 52.801682] linux_if_notifier: dev 000000006b794c44 event UNREGISTER [ 52.801956] linux_if_notifier: dev 000000006b794c44 event REGISTER [ 52.801972] linux_if_notifier: dev 00000000d9c171a4 event UNREGISTER [ 52.802030] linux_if_notifier: dev 00000000d9c171a4 event UNREGISTER [ 52.827962] linux_if_notifier: dev 000000006b794c44 event UNREGISTER [ 52.828018] linux_if_notifier: dev 000000006b794c44 event UNREGISTER [ 57.974594] cgroup: cgroup: disabling cgroup2 socket matching due to net_prio or net_cls activation [ 57.999789] SELinux: mount invalid. Same superblock, different security settings for (dev mqueue, type mqueue) [ 58.003545] SELinux: mount invalid. Same superblock, different security settings for (dev mqueue, type mqueue) [ 58.161693] SELinux: mount invalid. Same superblock, different security settings for (dev mqueue, type mqueue) [ 58.618324] SELinux: mount invalid. 
Same superblock, different security settings for (dev mqueue, type mqueue) [ 59.179193] tun: Universal TUN/TAP device driver, 1.6 [ 59.179821] linux_if_notifier: dev 000000005a0a064f event UNREGISTER [ 59.179989] linux_if_notifier: dev 000000005a0a064f event REGISTER [ 59.194303] linux_if_notifier: dev 000000005a0a064f event UNREGISTER [ 59.194435] linux_if_notifier: dev 000000005a0a064f event UNREGISTER [ 59.259446] Config Hugepage vmem 00000000f01948b1 psize 1073741824 mem_sz 20185088 path /dev/hugepages/bridge [ 59.263128] Pinned huge page uspace_vmem 00000000f01948b1 start_page_addr 0000000005a5d439 num 4k pages 4928 mem_size 20185088 file_path /dev/hugepages/bridge [ 59.263130] Config Hugepage vmem 000000006ffa70c9 psize 1073741824 mem_sz 161218560 path /dev/hugepages/flow [ 59.291301] Pinned huge page uspace_vmem 000000006ffa70c9 start_page_addr 000000001b63c890 num 4k pages 39360 mem_size 161218560 file_path /dev/hugepages/flow [ 59.327011] contrail-nodemg[7742]: segfault at 7fb72c0d9802 ip 00007fb72b2638c0 sp 00007ffc7a6de688 error 7 in libc-2.28.so[7fb72b21c000+1bc000] [ 59.327018] Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <00> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 59.327236] nova-compute[7781]: segfault at 80 ip 00007fd23230471d sp 00007ffedde16a70 error 4 in libpython3.6m.so.1.0[7fd232168000+2ab000] [ 59.327241] Code: 8b 35 87 28 31 00 8b 7e 20 8d 57 01 89 56 20 41 39 16 0f 8c 93 08 00 00 49 39 cb 0f 84 ec 05 00 00 49 39 db 0f 84 13 05 00 00 <49> 8b 8b 80 00 00 00 48 85 c9 0f 84 4d 08 00 00 48 85 ed 0f 88 a3 [ 59.330053] tuned[8456]: segfault at 0 ip 00007f8c02e24b7b sp 00007f8be2eee2e0 error 4 in libpython3.6m.so.1.0[7f8c02d37000+2ab000] [ 59.330064] Code: 41 8d 5c 24 ff 4c 8d 05 03 84 40 00 c1 eb 03 8d 2c 1b 49 8b 34 e8 48 8b 7e 10 48 39 fe 0f 84 bc 01 00 00 4c 8b 56 08 83 06 01 <49> 8b 02 48 89 46 08 48 85 c0 74 39 45 85 ed 75 64 48 8b 
74 24 68 [ 59.330207] traps: yum[9080] general protection fault ip:7efe46339780 sp:7ffda54025b8 error:0 in libc-2.28.so[7efe462eb000+1bc000] [ 59.336830] contrail-versio[9076]: segfault at 0 ip 00007f44fc5db3f0 sp 00007ffe44d17f18 error 6 in libc-2.28.so[7f44fc58c000+1bc000] [ 59.336836] Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <00> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 59.337423] tail[9077]: segfault at 0 ip 00007ff0f47bc350 sp 00007ffca43fe8b8 error 6 in libc-2.28.so[7ff0f476b000+1bc000] [ 59.337428] Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <00> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 59.337453] sh[9075]: segfault at 0 ip 0000000000000000 sp 00007ffd4de8c698 error 14 in bash[55df1267c000+108000] [ 59.337456] Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6. 
[ 59.338664] contrail-vroute[9025]: segfault at 0 ip 000000000093e050 sp 00007fff7b23ae58 error 6 in contrail-vrouter-agent[400000+1a4f000] [ 59.338668] Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <00> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 59.338729] entrypoint.sh[7836]: segfault at 0 ip 00007fdb6cbcfd50 sp 00007fffbc949ff8 error 6 in libc-2.28.so[7fdb6cb81000+1bc000] [ 59.338735] Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <00> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 59.342797] BUG: unable to handle kernel NULL pointer dereference at 0000000000000050 [ 59.348916] ------------[ cut here ]------------ [ 59.348960] tee[8486]: segfault at 0 ip 00007f276de04350 sp 00007ffd9df3c638 error 6 in libc-2.28.so[7f276ddb3000+1bc000] [ 59.348965] Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <00> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 59.350694] PGD 0 P4D 0 [ 59.355356] kernel BUG at include/linux/fs.h:2943! 
[ 59.366398] Oops: 0000 [#1] SMP NOPTI [ 59.396401] CPU: 18 PID: 8456 Comm: tuned Kdump: loaded Tainted: G IOE --------- - - 4.18.0-305.30.1.el8_4.x86_64 #1 [ 59.408138] Hardware name: HPE ProLiant DL360 Gen10/ProLiant DL360 Gen10, BIOS U32 07/16/2020 [ 59.416728] RIP: 0010:filemap_fault+0x47/0xa20 [ 59.421204] Code: 04 25 28 00 00 00 48 89 84 24 a0 00 00 00 31 c0 48 8b 07 48 8b 80 a0 00 00 00 4c 8b a0 f0 00 00 00 48 89 44 24 08 4d 8b 34 24 <49> 8b 46 50 48 05 ff 0f 00 00 48 c1 e8 0c 49 39 c5 0f 83 dd 05 00 [ 59.440118] RSP: 0000:ffffbece6331b838 EFLAGS: 00010246 [ 59.445379] RAX: ffff9d703dfb5100 RBX: ffffbece6331b998 RCX: 0000000000000000 [ 59.452564] RDX: 0000000000000000 RSI: 0000000000000100 RDI: ffffbece6331b998 [ 59.459750] RBP: ffff9d8fbdb77b40 R08: 0000000000000000 R09: 0000000001db8093 [ 59.466936] R10: 0000000000000001 R11: 0000000000000100 R12: ffff9d8fbdb77e00 [ 59.474122] R13: 00000000000002b5 R14: 0000000000000000 R15: 0000000000000000 [ 59.481307] FS: 00007f8be2eef700(0000) GS:ffff9d707fa80000(0000) knlGS:0000000000000000 [ 59.489456] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 59.495243] CR2: 0000000000000050 CR3: 0000003d7eafe005 CR4: 00000000007706e0 [ 59.502429] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 59.509614] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 59.516800] PKRU: 55555554 [ 59.519524] Call Trace: [ 59.521991] ? _cond_resched+0x15/0x30 [ 59.526376] __xfs_filemap_fault+0x6d/0x200 [xfs] [ 59.531652] __do_fault+0x36/0xd0 [ 59.535515] __handle_mm_fault+0x3d7/0xca0 [ 59.540200] ? xfs_iunlock+0xf3/0x100 [xfs] [ 59.544925] handle_mm_fault+0xc2/0x1d0 [ 59.549296] __get_user_pages+0x2c4/0x840 [ 59.553836] get_dump_page+0x49/0x80 [ 59.557934] elf_core_dump+0x8cf/0xa50 [ 59.562201] do_coredump+0x73b/0xf4e [ 59.566285] ? sched_clock_cpu+0xc/0xb0 [ 59.570627] ? up+0x12/0x50 [ 59.573912] get_signal+0x14f/0x870 [ 59.577888] ? __send_signal+0x343/0x4a0 [ 59.582295] ? 
page_fault+0x8/0x30 [ 59.586173] do_signal+0x36/0x660 [ 59.589953] ? force_sig_info+0xc7/0xe0 [ 59.594253] ? force_sig_fault+0x59/0x80 [ 59.598633] ? page_fault+0x8/0x30 [ 59.602485] exit_to_usermode_loop+0x89/0xf0 [ 59.607206] prepare_exit_to_usermode+0x9b/0xa0 [ 59.612178] retint_user+0x8/0x8 [ 59.615825] RIP: 0033:0x7f8c02e24b7b [ 59.619813] Code: 41 8d 5c 24 ff 4c 8d 05 03 84 40 00 c1 eb 03 8d 2c 1b 49 8b 34 e8 48 8b 7e 10 48 39 fe 0f 84 bc 01 00 00 4c 8b 56 08 83 06 01 <49> 8b 02 48 89 46 08 48 85 c0 74 39 45 85 ed 75 64 48 8b 74 24 68 [ 59.639498] RSP: 002b:00007f8be2eee2e0 EFLAGS: 00010202 [ 59.645151] RAX: 0000000000000000 RBX: 0000000000000003 RCX: 00007f8c02f53aac [ 59.652730] RDX: 000000000000001c RSI: 00007f8bec5f4000 RDI: 0000000000000000 [ 59.660308] RBP: 0000000000000006 R08: 00007f8c0322cf60 R09: 00007f8beb3ad340 [ 59.667877] R10: 0000000000000000 R11: 00007f8c0326bee8 R12: 000000000000001c [ 59.675434] R13: 0000000000000000 R14: 000000000000001c R15: 0000000000000001 [ 59.682981] Modules linked in: tun overlay vrouter(OE) 8021q garp mrp bonding nf_log_ipv4 nf_log_ipv6 nf_log_common nft_chain_nat nft_limit ipt_MASQUERADE nft_counter xt_LOG xt_limit xt_multiport xt_comment nf_nat xt_state xt_conntrack xt_addrtype nft_compat nf_tables nfnetlink vfat fat dm_service_time dm_multipath dm_mod intel_rapl_msr intel_rapl_common isst_if_common rpcrdma sunrpc rdma_ucm ib_srpt ib_isert iscsi_target_mod target_core_mod ib_iser libiscsi scsi_transport_iscsi ib_umad rdma_cm ib_ipoib iw_cm ib_cm nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass ipmi_ssif crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl intel_cstate i40iw mlx5_ib ib_uverbs ib_core intel_uncore pcspkr joydev acpi_ipmi hpwdt hpilo lpc_ich mei_me ipmi_si mei ioatdma ipmi_devintf dca wmi ipmi_msghandler acpi_tad acpi_power_meter ip_tables xfs sd_mod t10_pi sg mlx5_core mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm i40e ahci libahci 
mlxfw [ 59.683015] pci_hyperv_intf libata tls tg3 i2c_algo_bit nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c crc32c_intel fuse br_netfilter bridge stp llc [ 59.789014] CR2: 0000000000000050
- Another crash log pattern:
[...] [...] [...] [ 5910.135160] tun: Universal TUN/TAP device driver, 1.6 [ 5910.135636] linux_if_notifier: dev 000000003f187aea event UNREGISTER [ 5910.135793] linux_if_notifier: dev 000000003f187aea event REGISTER [ 5910.148864] linux_if_notifier: dev 000000003f187aea event UNREGISTER [ 5910.148963] linux_if_notifier: dev 000000003f187aea event UNREGISTER [ 5910.220107] Config Hugepage vmem 00000000eaa94e86 psize 1073741824 mem_sz 20185088 path /dev/hugepages/bridge [ 5910.224002] Pinned huge page uspace_vmem 00000000eaa94e86 start_page_addr 000000009a715c1b num 4k pages 4928 mem_size 20185088 file_path /dev/hugepages/bridge [ 5910.275369] Huge page req_recv_cntr:1 resp_cntr:1 resp: 0 ret: 0 [ 5910.292705] vrouter soft reset start [ 5910.346405] general protection fault: 0000 [#1] SMP PTI [ 5910.351657] CPU: 8 PID: 101631 Comm: contrail-vroute Kdump: loaded Tainted: G OE --------- - - 4.18.0-305.72.1.el8_4.x86_64 #1 [ 5910.364277] Hardware name: Supermicro Super Server/X10DRW-i, BIOS 2.0 12/17/2015 [ 5910.371698] RIP: 0010:vrouter_put_nexthop+0xe/0x130 [vrouter] [ 5910.377454] Code: 06 d3 e8 39 c2 77 d3 31 f6 eb d9 31 c0 5b c3 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 0f 1f 44 00 00 48 85 ff 0f 84 fd 00 00 00 <8b> 4f 14 85 c9 0f 85 e7 00 00 00 41 54 55 53 48 89 fb f6 47 06 40 [ 5910.396235] RSP: 0018:ffffaf87691ef908 EFLAGS: 00010206 [ 5910.401469] RAX: ffff9df67f20b000 RBX: ffff9df67f20b000 RCX: 0000000000000000 [ 5910.408617] RDX: 00000000000082c0 RSI: ffff9df67f20b000 RDI: 00801f0fc9eb0000 [ 5910.415767] RBP: ffff9dd6c6d96f00 R08: 0000000000000000 R09: ffff9df37f7b0000 [ 5910.422915] R10: ffff9dd61ecc1aa0 R11: 0000000000000001 R12: ffff9df67f20b000 [ 5910.430058] R13: ffffffffc0bada60 R14: 0000000000000000 R15: ffffffffc0bd6300 [ 5910.437210] FS: 00007ffac38b9700(0000) GS:ffff9df77fa00000(0000) knlGS:0000000000000000 [ 5910.445312] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 5910.451071] CR2: 00007f462e78e000 CR3: 0000003bf6dd0004 CR4: 00000000001706e0 [ 
5910.458216] Call Trace: [ 5910.460685] bridge_table_entry_free+0x34/0x50 [vrouter] [ 5910.466013] vr_htable_reset+0x73/0xf0 [vrouter] [ 5910.470642] bridge_table_deinit+0x29/0x60 [vrouter] [ 5910.475619] bridge_rtb_family_deinit+0x27/0x50 [vrouter] [ 5910.475619] bridge_rtb_family_deinit+0x27/0x50 [vrouter] [ 5910.481031] vr_fib_exit+0x3e/0x60 [vrouter] [ 5910.485315] vrouter_exit+0x8f/0xc0 [vrouter] [ 5910.489684] vrouter_ops_process+0x3b/0x90 [vrouter] [ 5910.494658] sandesh_decode+0x113/0x170 [vrouter] [ 5910.499372] ? sandesh_hdr_free+0x10/0x10 [vrouter] [ 5910.504260] sandesh_proto_decode+0x32/0x50 [vrouter] [ 5910.509324] vr_message_request+0x3a/0x60 [vrouter] [ 5910.514213] netlink_trans_request+0x65/0x2d0 [vrouter] [ 5910.519448] ? __nla_validate_parse+0x47/0x1a0 [ 5910.523900] genl_family_rcv_msg+0x1d7/0x420 [ 5910.528180] ? mls_range_isvalid+0x41/0x50 [ 5910.532289] ? xas_load+0x8/0x80 [ 5910.535529] ? find_get_entry+0xdd/0x1c0 [ 5910.539463] genl_rcv_msg+0x47/0x8c [ 5910.542963] ? genl_family_rcv_msg+0x420/0x420 [ 5910.547421] netlink_rcv_skb+0x4c/0x120 [ 5910.551266] genl_rcv+0x24/0x40 [ 5910.554421] netlink_unicast+0x19e/0x260 [ 5910.558357] netlink_sendmsg+0x204/0x3d0 [ 5910.562294] sock_sendmsg+0x4c/0x50 [ 5910.565793] ____sys_sendmsg+0x1eb/0x250 [ 5910.569729] ? copy_msghdr_from_user+0x5c/0x90 [ 5910.574240] ? xfs_iunlock+0xcc/0x100 [xfs] [ 5910.578945] ___sys_sendmsg+0x7c/0xc0 [ 5910.583176] ? atime_needs_update+0x77/0xe0 [ 5910.587828] ? touch_atime+0x33/0xe0 [ 5910.591923] ? 
ovl_read_iter+0x182/0x1b0 [overlay] [ 5910.597267] __sys_sendmsg+0x57/0xa0 [ 5910.601373] do_syscall_64+0x5b/0x1a0 [ 5910.605496] entry_SYSCALL_64_after_hwframe+0x65/0xca [ 5910.611021] RIP: 0033:0x7ffaca955a27 [ 5910.615071] Code: c3 66 90 41 54 41 89 d4 55 48 89 f5 53 89 fb 48 83 ec 10 e8 0b ed ff ff 44 89 e2 48 89 ee 89 df 41 89 c0 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 35 44 89 c7 48 89 44 24 08 e8 44 ed ff ff 48 [ 5910.634751] RSP: 002b:00007ffac38b7280 EFLAGS: 00000293 ORIG_RAX: 000000000000002e [ 5910.642740] RAX: ffffffffffffffda RBX: 000000000000000f RCX: 00007ffaca955a27 [ 5910.650368] RDX: 0000000000004000 RSI: 00007ffac38b72c0 RDI: 000000000000000f [ 5910.657962] RBP: 00007ffac38b72c0 R08: 0000000000000000 R09: 00007ffac38b7a10 [ 5910.665547] R10: 0000000007bc3920 R11: 0000000000000293 R12: 0000000000004000 [ 5910.673180] R13: 000000000000000f R14: 00007ffac38b7750 R15: 0000000000000002 [ 5910.680739] Modules linked in: tun vrouter(OE) nls_utf8 isofs cdrom vfat msdos fat ext4 mbcache jbd2 dm_mod ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 scsi_transport_iscsi overlay 8021q garp mrp bonding nft_chain_nat nf_nat nf_log_ipv4 nf_log_ipv6 nf_log_common nft_limit nft_counter xt_LOG xt_limit xt_multiport xt_comment xt_state xt_conntrack nft_compat nf_tables nfnetlink intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel iTCO_wdt iTCO_vendor_support kvm irqbypass ipmi_ssif crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl intel_cstate intel_uncore pcspkr mei_me joydev mei i2c_i801 ioatdma lpc_ich wmi acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler acpi_pad acpi_power_meter auth_rpcgss sunrpc xfs sd_mod t10_pi sg ast drm_vram_helper drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm_ttm_helper ttm ixgbe drm ahci igb libahci libata mdio dca i2c_algo_bit nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c [ 5910.680771] crc32c_intel fuse br_netfilter bridge stp llc
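Both logs above share three markers: the vrouter module is loaded, vrouter reports pinning a huge page, and a kernel fault follows shortly after. A minimal sketch of a triage helper that looks for this combination in saved dmesg text is shown below. Treating these three markers as a signature is an assumption made for illustration; the function name matches_signature is hypothetical, and a match only suggests a candidate for the vmcore analysis described under Diagnostic Steps.

```python
import re

# Patterns copied from the two crash logs quoted in this article.
# Using them together as a "signature" is an illustrative assumption.
VROUTER_LOADED = re.compile(r"vrouter: loading out-of-tree module|vrouter\(OE\)")
HUGEPAGE_PINNED = re.compile(r"Pinned huge page uspace_vmem \S+")
KERNEL_FAULT = re.compile(
    r"BUG: unable to handle kernel NULL pointer dereference at [0-9a-f]+"
    r"|general protection fault: \d+ \[#\d+\]"
)


def matches_signature(log_text: str) -> dict:
    """Report which markers appear in a kernel log and whether all line up."""
    loaded = VROUTER_LOADED.search(log_text)
    pinned = HUGEPAGE_PINNED.search(log_text)
    fault = KERNEL_FAULT.search(log_text)
    return {
        "vrouter_loaded": loaded is not None,
        "hugepage_pinned": pinned is not None,
        "kernel_fault": fault is not None,
        # Candidate only when the huge page message precedes the fault.
        "candidate": bool(
            loaded and pinned and fault and pinned.start() < fault.start()
        ),
    }
```

The fault pattern requires the "general protection fault: 0000 [#1]" form, so user-space "traps:" lines such as the yum entry in the first log do not count as a kernel fault.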
Resolution
- The issue has been resolved with kernel-4.18.0-513.5.1.el8_9 via Errata RHSA-2023:7077.
- The issue was tracked in private bug 2152051.
- A possible workaround is to unload the 3rd party module vrouter.
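To check whether a host already runs a kernel at or above the fixed version named above, the release strings can be compared numerically. The following is a minimal sketch under stated assumptions: it compares only the numeric fields before ".el", which is a simplification and not the full RPM version comparison, and it knows only the single fixed version listed in this article. The function names release_key and has_fix are hypothetical.

```python
import re

# Fixed kernel version named in the Resolution section.
FIXED_KERNEL = "4.18.0-513.5.1.el8_9"


def release_key(release: str) -> tuple:
    """Turn '4.18.0-305.30.1.el8_4.x86_64' into (4, 18, 0, 305, 30, 1).

    Only the numeric fields before '.el' are used. This is a simplified
    comparison, not the RPM version comparison algorithm.
    """
    numeric_part = release.split(".el", 1)[0]
    return tuple(int(n) for n in re.findall(r"\d+", numeric_part))


def has_fix(release: str, fixed: str = FIXED_KERNEL) -> bool:
    """Return True when the release is at or above the fixed kernel."""
    return release_key(release) >= release_key(fixed)
```

On a live system the running release could be passed in as has_fix(platform.release()) after importing platform. Both kernels that appear in the crash logs above (4.18.0-305.30.1.el8_4 and 4.18.0-305.72.1.el8_4) compare as older than the fixed version.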
Root Cause
- The 3rd party module vrouter triggered a race condition when using hugepages.
Diagnostic Steps
Pre-requisites
- Deploy kdump in order to collect a vmcore:
  - Vmcore analysis is required to determine whether you are impacted by this issue. This first requires that a vmcore is dumped successfully.
  - If the kexec-tools package is absent or the kdump service is inactive, please reference the following article to install, enable, start, and configure kdump:
    How to troubleshoot kernel crashes, hangs, or reboots with kdump on Red Hat Enterprise Linux
- Prepare crash environment for vmcore analysis:
  - Please reference the following article to set up a vmcore analysis environment:
    How to set up a vmcore analysis environment?
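The two kdump conditions named above (the kexec-tools package is installed and the kdump service is active) can be checked from a script. The sketch below is a minimal example, assuming an RPM-based host with systemd; it relies on "rpm -q" and "systemctl is-active --quiet" exiting with status 0 on success. The helper names and the injectable run parameter are hypothetical and exist only to keep the example testable. It does not verify that kdump is configured correctly or that a vmcore will actually be written.

```python
import subprocess


def command_ok(argv, run=subprocess.run) -> bool:
    """Return True when the command exits with status 0."""
    try:
        result = run(argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    except FileNotFoundError:
        # The tool itself is missing (for example, a non-RPM host).
        return False
    return result.returncode == 0


def kdump_prerequisites(run=subprocess.run) -> dict:
    """Check the two conditions listed in the pre-requisites."""
    return {
        # rpm -q exits non-zero when the package is not installed.
        "kexec_tools_installed": command_ok(["rpm", "-q", "kexec-tools"], run),
        # systemctl is-active exits 0 only when the unit is active.
        "kdump_active": command_ok(
            ["systemctl", "is-active", "--quiet", "kdump"], run
        ),
    }
```

If either value is False, follow the kdump article referenced above before attempting to reproduce the crash.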
Vmcore analysis
- Loading the vmcore gives a high-level overview of the state of the system:

    crash> sys
          KERNEL: /cores/retrace/repos/kernel/x86_64/usr/lib/debug/lib/modules/4.18.0-305.30.1.el8_4.x86_64/vmlinux  [TAINTED]
        DUMPFILE: /cores/retrace/tasks/230414208/crash/vmcore  [PARTIAL DUMP]
            CPUS: 96
            DATE: Fri Dec 9 03:11:09 GMT 2022
          UPTIME: 00:00:59
    LOAD AVERAGE: 3.68, 1.38, 0.50
           TASKS: 104
         RELEASE: 4.18.0-305.30.1.el8_4.x86_64
         VERSION: #1 SMP Tue Nov 30 13:13:11 EST 2021
         MACHINE: x86_64 (2100 Mhz)
          MEMORY: 255.7 GB
           PANIC: "BUG: unable to handle kernel NULL pointer dereference at 0000000000000050"

    crash> sys -i | grep DATE -A2
    DMI_BIOS_DATE: 07/16/2020
    DMI_SYS_VENDOR: HPE
    DMI_PRODUCT_NAME: ProLiant DL360 Gen10
  - The above shows that the system panicked because the kernel attempted to use a nonsensical address (PANIC: "BUG: unable to handle kernel NULL pointer dereference at 0000000000000050").
-
The kernel ring buffer (logs displayed when running
dmesg
) are inspected to provide insight into kernel-specific activity leading up to the crash and provide additional context as to what could have caused the crash;crash> log [...] [ 39.699169] ifup-vhost (3598): drop_caches: 2 [ 39.723351] vrouter: loading out-of-tree module taints kernel. [ 39.723638] vrouter: module verification failed: signature and/or required key missing - tainting kernel [ 39.725789] vrouter version: {"build-info": [{"build-time": "2022-06-20 03:02:45.919367", "build-hostname": "contrail-nightly-tbf5h", "build-user": "contrail-builder", "build-version": "21.4"}]} [ 39.729090] linux_if_notifier: dev 0000000048dbe25d event REGISTER [ 39.729095] linux_if_notifier: dev 0000000048dbe25d event UNREGISTER [ 39.729096] linux_if_notifier: dev 00000000b2294c78 event REGISTER [ 39.729099] linux_if_notifier: dev 00000000b2294c78 event UNREGISTER [ 39.729100] linux_if_notifier: dev 00000000f83e4fc1 event REGISTER [ 39.729103] linux_if_notifier: dev 00000000f83e4fc1 event UNREGISTER [ 39.729104] linux_if_notifier: dev 00000000a3222b11 event REGISTER [ 39.729107] linux_if_notifier: dev 00000000a3222b11 event UNREGISTER [ 39.729108] linux_if_notifier: dev 00000000742ffc82 event REGISTER [ 39.729111] linux_if_notifier: dev 00000000742ffc82 event UNREGISTER [ 39.729112] linux_if_notifier: dev 00000000d53b8b31 event REGISTER [ 39.729115] linux_if_notifier: dev 00000000d53b8b31 event UNREGISTER [ 39.729115] linux_if_notifier: dev 000000005f264fae event REGISTER [ 39.729118] linux_if_notifier: dev 000000005f264fae event UNREGISTER [ 39.729119] linux_if_notifier: dev 0000000038caa1a6 event REGISTER [ 39.729122] linux_if_notifier: dev 0000000038caa1a6 event UNREGISTER [ 39.729123] linux_if_notifier: dev 00000000f7a21872 event REGISTER [ 39.729126] linux_if_notifier: dev 00000000f7a21872 event UNREGISTER [ 39.729127] linux_if_notifier: dev 00000000f62ec522 event REGISTER [ 39.729130] linux_if_notifier: dev 00000000f62ec522 event UNREGISTER [ 39.729130] 
linux_if_notifier: dev 0000000097410a0d event REGISTER [ 39.729133] linux_if_notifier: dev 0000000097410a0d event UNREGISTER [ 39.729134] linux_if_notifier: dev 00000000fccfb6a8 event REGISTER [ 39.729137] linux_if_notifier: dev 00000000fccfb6a8 event UNREGISTER [ 39.729138] linux_if_notifier: dev 00000000669bc5e7 event REGISTER [ 39.729141] linux_if_notifier: dev 00000000669bc5e7 event UNREGISTER [ 39.729141] linux_if_notifier: dev 00000000d9c171a4 event REGISTER [ 39.729144] linux_if_notifier: dev 00000000d9c171a4 event UNREGISTER [ 39.729145] linux_if_notifier: dev 00000000708c867f event REGISTER [ 39.729148] linux_if_notifier: dev 00000000708c867f event UNREGISTER [ 39.729148] linux_if_notifier: dev 00000000e62ff6ea event REGISTER [ 39.729151] linux_if_notifier: dev 00000000e62ff6ea event UNREGISTER [ 39.729152] linux_if_notifier: dev 0000000082e5b053 event REGISTER [ 39.729155] linux_if_notifier: dev 0000000034e484db event REGISTER [ 39.729158] linux_if_notifier: dev 00000000c1fc6e3d event REGISTER [ 39.977529] linux_if_notifier: dev 0000000082e5b053 event UNREGISTER [ 39.977588] linux_if_notifier: dev 0000000082e5b053 event UNREGISTER [ 40.176831] linux_if_notifier: dev 00000000c1fc6e3d event UNREGISTER [ 40.176876] linux_if_notifier: dev 00000000c1fc6e3d event UNREGISTER [ 40.375398] linux_if_notifier: dev 0000000034e484db event UNREGISTER [ 40.375439] linux_if_notifier: dev 0000000034e484db event UNREGISTER [ 40.378814] linux_if_notifier: dev 00000000de1e73eb event UNREGISTER [ 40.378986] linux_if_notifier: dev 00000000de1e73eb event REGISTER [ 40.383079] Vrouter: vr_interface_get:3548 vif is NULL for vifr_idx: -1 [ 40.383125] Num phy interfaces 1 [ 40.383716] Vrouter: vr_interface_get:3548 vif is NULL for vifr_idx: -1 [ 40.384584] linux_if_notifier: dev 00000000de1e73eb event UNREGISTER [ 40.384599] linux_if_notifier: dev 00000000de1e73eb event UNREGISTER [ 40.385463] linux_if_notifier: dev 00000000de1e73eb event UNREGISTER [ 40.385547] linux_if_notifier: 
dev 00000000de1e73eb event UNREGISTER [ 40.398649] linux_if_notifier: dev 00000000de1e73eb event UNREGISTER [ 40.398671] linux_if_notifier: dev 00000000de1e73eb event UNREGISTER [ 44.515793] linux_if_notifier: dev 00000000ab576300 event UNREGISTER [ 44.516085] linux_if_notifier: dev 00000000ab576300 event REGISTER [ 44.516125] linux_if_notifier: dev 00000000d9c171a4 event UNREGISTER [ 44.516188] linux_if_notifier: dev 00000000d9c171a4 event UNREGISTER [ 44.543390] linux_if_notifier: dev 00000000ab576300 event UNREGISTER [ 44.543456] linux_if_notifier: dev 00000000ab576300 event UNREGISTER [ 48.657259] linux_if_notifier: dev 00000000b3df36a1 event UNREGISTER [ 48.657403] linux_if_notifier: dev 00000000b3df36a1 event REGISTER [ 48.657411] linux_if_notifier: dev 00000000d9c171a4 event UNREGISTER [ 48.657434] linux_if_notifier: dev 00000000d9c171a4 event UNREGISTER [ 48.682969] linux_if_notifier: dev 00000000b3df36a1 event UNREGISTER [ 48.683028] linux_if_notifier: dev 00000000b3df36a1 event UNREGISTER [ 52.801682] linux_if_notifier: dev 000000006b794c44 event UNREGISTER [ 52.801956] linux_if_notifier: dev 000000006b794c44 event REGISTER [ 52.801972] linux_if_notifier: dev 00000000d9c171a4 event UNREGISTER [ 52.802030] linux_if_notifier: dev 00000000d9c171a4 event UNREGISTER [ 52.827962] linux_if_notifier: dev 000000006b794c44 event UNREGISTER [ 52.828018] linux_if_notifier: dev 000000006b794c44 event UNREGISTER [ 57.974594] cgroup: cgroup: disabling cgroup2 socket matching due to net_prio or net_cls activation [ 57.999789] SELinux: mount invalid. Same superblock, different security settings for (dev mqueue, type mqueue) [ 58.003545] SELinux: mount invalid. Same superblock, different security settings for (dev mqueue, type mqueue) [ 58.161693] SELinux: mount invalid. Same superblock, different security settings for (dev mqueue, type mqueue) [ 58.618324] SELinux: mount invalid. 
Same superblock, different security settings for (dev mqueue, type mqueue) [ 59.179193] tun: Universal TUN/TAP device driver, 1.6 [ 59.179821] linux_if_notifier: dev 000000005a0a064f event UNREGISTER [ 59.179989] linux_if_notifier: dev 000000005a0a064f event REGISTER [ 59.194303] linux_if_notifier: dev 000000005a0a064f event UNREGISTER [ 59.194435] linux_if_notifier: dev 000000005a0a064f event UNREGISTER [ 59.259446] Config Hugepage vmem 00000000f01948b1 psize 1073741824 mem_sz 20185088 path /dev/hugepages/bridge [ 59.263128] Pinned huge page uspace_vmem 00000000f01948b1 start_page_addr 0000000005a5d439 num 4k pages 4928 mem_size 20185088 file_path /dev/hugepages/bridge [ 59.263130] Config Hugepage vmem 000000006ffa70c9 psize 1073741824 mem_sz 161218560 path /dev/hugepages/flow [ 59.291301] Pinned huge page uspace_vmem 000000006ffa70c9 start_page_addr 000000001b63c890 num 4k pages 39360 mem_size 161218560 file_path /dev/hugepages/flow [ 59.327011] contrail-nodemg[7742]: segfault at 7fb72c0d9802 ip 00007fb72b2638c0 sp 00007ffc7a6de688 error 7 in libc-2.28.so[7fb72b21c000+1bc000] [ 59.327018] Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <00> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 59.327236] nova-compute[7781]: segfault at 80 ip 00007fd23230471d sp 00007ffedde16a70 error 4 in libpython3.6m.so.1.0[7fd232168000+2ab000] [ 59.327241] Code: 8b 35 87 28 31 00 8b 7e 20 8d 57 01 89 56 20 41 39 16 0f 8c 93 08 00 00 49 39 cb 0f 84 ec 05 00 00 49 39 db 0f 84 13 05 00 00 <49> 8b 8b 80 00 00 00 48 85 c9 0f 84 4d 08 00 00 48 85 ed 0f 88 a3 [ 59.330053] tuned[8456]: segfault at 0 ip 00007f8c02e24b7b sp 00007f8be2eee2e0 error 4 in libpython3.6m.so.1.0[7f8c02d37000+2ab000] [ 59.330064] Code: 41 8d 5c 24 ff 4c 8d 05 03 84 40 00 c1 eb 03 8d 2c 1b 49 8b 34 e8 48 8b 7e 10 48 39 fe 0f 84 bc 01 00 00 4c 8b 56 08 83 06 01 <49> 8b 02 48 89 46 08 48 85 c0 74 39 45 85 ed 75 64 48 8b 
74 24 68 [ 59.330207] traps: yum[9080] general protection fault ip:7efe46339780 sp:7ffda54025b8 error:0 in libc-2.28.so[7efe462eb000+1bc000] [ 59.336830] contrail-versio[9076]: segfault at 0 ip 00007f44fc5db3f0 sp 00007ffe44d17f18 error 6 in libc-2.28.so[7f44fc58c000+1bc000] [ 59.336836] Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <00> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 59.337423] tail[9077]: segfault at 0 ip 00007ff0f47bc350 sp 00007ffca43fe8b8 error 6 in libc-2.28.so[7ff0f476b000+1bc000] [ 59.337428] Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <00> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 59.337453] sh[9075]: segfault at 0 ip 0000000000000000 sp 00007ffd4de8c698 error 14 in bash[55df1267c000+108000] [ 59.337456] Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6. 
[ 59.338664] contrail-vroute[9025]: segfault at 0 ip 000000000093e050 sp 00007fff7b23ae58 error 6 in contrail-vrouter-agent[400000+1a4f000] [ 59.338668] Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <00> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 59.338729] entrypoint.sh[7836]: segfault at 0 ip 00007fdb6cbcfd50 sp 00007fffbc949ff8 error 6 in libc-2.28.so[7fdb6cb81000+1bc000] [ 59.338735] Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <00> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 59.342797] BUG: unable to handle kernel NULL pointer dereference at 0000000000000050 [ 59.348916] ------------[ cut here ]------------ [ 59.348960] tee[8486]: segfault at 0 ip 00007f276de04350 sp 00007ffd9df3c638 error 6 in libc-2.28.so[7f276ddb3000+1bc000] [ 59.348965] Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <00> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 59.350694] PGD 0 P4D 0 [ 59.355356] kernel BUG at include/linux/fs.h:2943! 
[ 59.366398] Oops: 0000 [#1] SMP NOPTI
[ 59.396401] CPU: 18 PID: 8456 Comm: tuned Kdump: loaded Tainted: G IOE --------- - - 4.18.0-305.30.1.el8_4.x86_64 #1
[ 59.408138] Hardware name: HPE ProLiant DL360 Gen10/ProLiant DL360 Gen10, BIOS U32 07/16/2020
[ 59.416728] RIP: 0010:filemap_fault+0x47/0xa20
[ 59.421204] Code: 04 25 28 00 00 00 48 89 84 24 a0 00 00 00 31 c0 48 8b 07 48 8b 80 a0 00 00 00 4c 8b a0 f0 00 00 00 48 89 44 24 08 4d 8b 34 24 <49> 8b 46 50 48 05 ff 0f 00 00 48 c1 e8 0c 49 39 c5 0f 83 dd 05 00
[ 59.440118] RSP: 0000:ffffbece6331b838 EFLAGS: 00010246
[ 59.445379] RAX: ffff9d703dfb5100 RBX: ffffbece6331b998 RCX: 0000000000000000
[ 59.452564] RDX: 0000000000000000 RSI: 0000000000000100 RDI: ffffbece6331b998
[ 59.459750] RBP: ffff9d8fbdb77b40 R08: 0000000000000000 R09: 0000000001db8093
[ 59.466936] R10: 0000000000000001 R11: 0000000000000100 R12: ffff9d8fbdb77e00
[ 59.474122] R13: 00000000000002b5 R14: 0000000000000000 R15: 0000000000000000
[ 59.481307] FS: 00007f8be2eef700(0000) GS:ffff9d707fa80000(0000) knlGS:0000000000000000
[ 59.489456] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 59.495243] CR2: 0000000000000050 CR3: 0000003d7eafe005 CR4: 00000000007706e0
[ 59.502429] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 59.509614] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 59.516800] PKRU: 55555554
[ 59.519524] Call Trace:
[ 59.521991] ? _cond_resched+0x15/0x30
[ 59.526376] __xfs_filemap_fault+0x6d/0x200 [xfs]
[ 59.531652] __do_fault+0x36/0xd0
[ 59.535515] __handle_mm_fault+0x3d7/0xca0
[ 59.540200] ? xfs_iunlock+0xf3/0x100 [xfs]
[ 59.544925] handle_mm_fault+0xc2/0x1d0
[ 59.549296] __get_user_pages+0x2c4/0x840
[ 59.553836] get_dump_page+0x49/0x80
[ 59.557934] elf_core_dump+0x8cf/0xa50
[ 59.562201] do_coredump+0x73b/0xf4e
[ 59.566285] ? sched_clock_cpu+0xc/0xb0
[ 59.570627] ? up+0x12/0x50
[ 59.573912] get_signal+0x14f/0x870
[ 59.577888] ? __send_signal+0x343/0x4a0
[ 59.582295] ? page_fault+0x8/0x30
[ 59.586173] do_signal+0x36/0x660
[ 59.589953] ? force_sig_info+0xc7/0xe0
[ 59.594253] ? force_sig_fault+0x59/0x80
[ 59.598633] ? page_fault+0x8/0x30
[ 59.602485] exit_to_usermode_loop+0x89/0xf0
[ 59.607206] prepare_exit_to_usermode+0x9b/0xa0
[ 59.612178] retint_user+0x8/0x8
[ 59.615825] RIP: 0033:0x7f8c02e24b7b
[ 59.619813] Code: 41 8d 5c 24 ff 4c 8d 05 03 84 40 00 c1 eb 03 8d 2c 1b 49 8b 34 e8 48 8b 7e 10 48 39 fe 0f 84 bc 01 00 00 4c 8b 56 08 83 06 01 <49> 8b 02 48 89 46 08 48 85 c0 74 39 45 85 ed 75 64 48 8b 74 24 68
[ 59.639498] RSP: 002b:00007f8be2eee2e0 EFLAGS: 00010202
[ 59.645151] RAX: 0000000000000000 RBX: 0000000000000003 RCX: 00007f8c02f53aac
[ 59.652730] RDX: 000000000000001c RSI: 00007f8bec5f4000 RDI: 0000000000000000
[ 59.660308] RBP: 0000000000000006 R08: 00007f8c0322cf60 R09: 00007f8beb3ad340
[ 59.667877] R10: 0000000000000000 R11: 00007f8c0326bee8 R12: 000000000000001c
[ 59.675434] R13: 0000000000000000 R14: 000000000000001c R15: 0000000000000001
[ 59.682981] Modules linked in: tun overlay vrouter(OE) 8021q garp mrp bonding nf_log_ipv4 nf_log_ipv6 nf_log_common nft_chain_nat nft_limit ipt_MASQUERADE nft_counter xt_LOG xt_limit xt_multiport xt_comment nf_nat xt_state xt_conntrack xt_addrtype nft_compat nf_tables nfnetlink vfat fat dm_service_time dm_multipath dm_mod intel_rapl_msr intel_rapl_common isst_if_common rpcrdma sunrpc rdma_ucm ib_srpt ib_isert iscsi_target_mod target_core_mod ib_iser libiscsi scsi_transport_iscsi ib_umad rdma_cm ib_ipoib iw_cm ib_cm nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass ipmi_ssif crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl intel_cstate i40iw mlx5_ib ib_uverbs ib_core intel_uncore pcspkr joydev acpi_ipmi hpwdt hpilo lpc_ich mei_me ipmi_si mei ioatdma ipmi_devintf dca wmi ipmi_msghandler acpi_tad acpi_power_meter ip_tables xfs sd_mod t10_pi sg mlx5_core mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm i40e ahci libahci mlxfw
[ 59.683015] pci_hyperv_intf libata tls tg3 i2c_algo_bit nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c crc32c_intel fuse br_netfilter bridge stp llc
[ 59.789014] CR2: 0000000000000050
- In the above, the kernel ring buffer shows a third-party module, vrouter, loading and emitting a number of errors. Shortly after it loads, several processes segfault, and this continues until the system crashes.
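The "error N" field in each segfault line, and the value after "Oops:", is the x86 page-fault error code printed in hexadecimal. As a minimal sketch for reading those values, the following decodes the architectural bits; the function and table names are illustrative and not part of any kernel or crash tooling.

```python
# Architectural x86 page-fault error code bits.
# Each entry: (mask, meaning when set, meaning when clear or None).
PF_BITS = [
    (0x01, "protection violation", "page not present"),
    (0x02, "write", "read"),
    (0x04, "user mode", "kernel mode"),
    (0x08, "reserved bit set", None),
    (0x10, "instruction fetch", None),
]


def decode_pf_error(code_hex):
    """Decode the hexadecimal error code from a segfault or Oops log line."""
    code = int(code_hex, 16)
    parts = []
    for mask, if_set, if_clear in PF_BITS:
        if code & mask:
            parts.append(if_set)
        elif if_clear is not None:
            parts.append(if_clear)
    return parts


# Values that appear in the log above: "error 6", "error 14", "Oops: 0000".
for value in ("6", "14", "0000"):
    print(value, "->", ", ".join(decode_pf_error(value)))
```

Read this way, "error 6" is a user-mode write to a page that is not present, "error 14" is a user-mode instruction fetch from a page that is not present (consistent with ip 0 in the sh entry), and "Oops: 0000" is a kernel-mode read of a page that is not present.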
-
The kernel ring buffer logs indicate that the system crashed because a file mapping was in an inconsistent state (kernel BUG at include/linux/fs.h:2943). The backtrace of the task that crashed therefore provides a starting point for tracking down what the inconsistent state is and possibly how it occurred:
crash> bt
PID: 8456 TASK: ffff9d6fd9b90000 CPU: 18 COMMAND: "tuned"
 #0 [ffffbece6331b558] machine_kexec at ffffffffadc6151e
 #1 [ffffbece6331b5b0] __crash_kexec at ffffffffadd9012d
 #2 [ffffbece6331b678] crash_kexec at ffffffffadd9101d
 #3 [ffffbece6331b690] oops_end at ffffffffadc2435d
 #4 [ffffbece6331b6b0] no_context at ffffffffadc725ff
 #5 [ffffbece6331b708] __bad_area_nosemaphore at ffffffffadc7295c
 #6 [ffffbece6331b750] do_page_fault at ffffffffadc73237
 #7 [ffffbece6331b780] page_fault at ffffffffae6010fe
    [exception RIP: filemap_fault+71]
    RIP: ffffffffade63b37 RSP: ffffbece6331b838 RFLAGS: 00010246
    RAX: ffff9d703dfb5100 RBX: ffffbece6331b998 RCX: 0000000000000000
    RDX: 0000000000000000 RSI: 0000000000000100 RDI: ffffbece6331b998
    RBP: ffff9d8fbdb77b40 R8: 0000000000000000 R9: 0000000001db8093
    R10: 0000000000000001 R11: 0000000000000100 R12: ffff9d8fbdb77e00
    R13: 00000000000002b5 R14: 0000000000000000 R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0000
 #8 [ffffbece6331b910] __xfs_filemap_fault at ffffffffc086324d [xfs]
 #9 [ffffbece6331b960] __do_fault at ffffffffade9ab86
#10 [ffffbece6331b980] __handle_mm_fault at ffffffffadea06d7
#11 [ffffbece6331ba38] handle_mm_fault at ffffffffadea1062
#12 [ffffbece6331ba60] __get_user_pages at ffffffffade97dd4
#13 [ffffbece6331bae8] get_dump_page at ffffffffade992c9
#14 [ffffbece6331bb18] elf_core_dump at ffffffffadf834cf
#15 [ffffbece6331bcd8] do_coredump at ffffffffadf88e6b
#16 [ffffbece6331be08] get_signal at ffffffffadcf2a1f
#17 [ffffbece6331be60] do_signal at ffffffffadc1ff46
#18 [ffffbece6331bf28] exit_to_usermode_loop at ffffffffadc03b59
#19 [ffffbece6331bf40] prepare_exit_to_usermode at ffffffffadc040ab
    RIP: 00007f8c02e24b7b RSP: 00007f8be2eee2e0 RFLAGS: 00010202
    RAX: 0000000000000000 RBX: 0000000000000003 RCX: 00007f8c02f53aac
    RDX: 000000000000001c RSI: 00007f8bec5f4000 RDI: 0000000000000000
    RBP: 0000000000000006 R8: 00007f8c0322cf60 R9: 00007f8beb3ad340
    R10: 0000000000000000 R11: 00007f8c0326bee8 R12: 000000000000001c
    R13: 0000000000000000 R14: 000000000000001c R15: 0000000000000001
    ORIG_RAX: ffffffffffffffff CS: 0033 SS: 002b
-
The backtrace above shows that tuned received a signal that triggers a core dump (likely because tuned itself segfaulted). While writing the core dump, it needed to access files on an XFS-hosted filesystem (likely the file-backed parts associated with execution, so that they can be dumped), and the system crashed while attempting to map in those files (exception RIP: filemap_fault):
crash> dis -r ffffffffade63b37
0xffffffffade63af0 <filemap_fault>: nopl 0x0(%rax,%rax,1) [FTRACE NOP]
0xffffffffade63af5 <filemap_fault+5>: push %r15
0xffffffffade63af7 <filemap_fault+7>: push %r14
0xffffffffade63af9 <filemap_fault+9>: push %r13
0xffffffffade63afb <filemap_fault+11>: push %r12
0xffffffffade63afd <filemap_fault+13>: push %rbp
0xffffffffade63afe <filemap_fault+14>: push %rbx
0xffffffffade63aff <filemap_fault+15>: sub $0xa8,%rsp
0xffffffffade63b06 <filemap_fault+22>: mov 0x10(%rdi),%r13
0xffffffffade63b0a <filemap_fault+26>: mov %gs:0x28,%rax
0xffffffffade63b13 <filemap_fault+35>: mov %rax,0xa0(%rsp)
0xffffffffade63b1b <filemap_fault+43>: xor %eax,%eax
0xffffffffade63b1d <filemap_fault+45>: mov (%rdi),%rax
0xffffffffade63b20 <filemap_fault+48>: mov 0xa0(%rax),%rax
0xffffffffade63b27 <filemap_fault+55>: mov 0xf0(%rax),%r12
0xffffffffade63b2e <filemap_fault+62>: mov %rax,0x8(%rsp)
0xffffffffade63b33 <filemap_fault+67>: mov (%r12),%r14
0xffffffffade63b37 <filemap_fault+71>: mov 0x50(%r14),%rax
R12: 000000000000001c R14: 000000000000001c
crash> whatis handle_mm_fault
vm_fault_t handle_mm_fault(struct vm_area_struct *, unsigned long, unsigned int);
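The faulting instruction at filemap_fault+71 is `mov 0x50(%r14),%rax`, so the address it touched is R14 plus 0x50. As a small sketch, the arithmetic below uses the register values from the kernel exception frame in the backtrace (R14 = 0, R12 = ffff9d8fbdb77e00) and the CR2 value from the Oops output; the helper name is illustrative only.

```python
# Register values copied from the kernel exception frame in the backtrace
# ("exception RIP: filemap_fault+71") and CR2 from the Oops output.
regs = {
    "RAX": 0xFFFF9D703DFB5100,
    "R12": 0xFFFF9D8FBDB77E00,
    "R14": 0x0000000000000000,
}
CR2 = 0x0000000000000050

# Displacement of the faulting instruction: mov 0x50(%r14),%rax
DISPLACEMENT = 0x50


def effective_address(base, displacement):
    """Address accessed by a base+displacement operand, with 64-bit wraparound."""
    return (base + displacement) & 0xFFFFFFFFFFFFFFFF


fault_addr = effective_address(regs["R14"], DISPLACEMENT)
print("computed fault address:", hex(fault_addr))
print("matches CR2:", fault_addr == CR2)
```

The computed address equals CR2, which is consistent with the "NULL pointer dereference at 0000000000000050" message: the pointer loaded into R14 by the preceding `mov (%r12),%r14` was NULL.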
-
The faulting VMA is:
crash> kmem 0xffff9d8e3eac0828
CACHE             OBJSIZE  ALLOCATED  TOTAL  SLABS  SSIZE  NAME
ffff9d51c7c23100  232      6601       53690  1534   8k     vm_area_struct
SLAB              MEMORY            NODE  TOTAL  ALLOCATED  FREE
fffff0b475fab000  ffff9d8e3eac0000  1     35     16         19
FREE / [ALLOCATED]
[ffff9d8e3eac0828]
PAGE              PHYSICAL    MAPPING           INDEX             CNT  FLAGS
fffff0b475fab000  3d7eac0000  ffff9d51c7c23100  ffff9d8e3eac0cb0  1    57ffffc0008100 slab,head
crash> vm_area_struct 0xffff9d8e3eac0828
struct vm_area_struct {
  vm_start = 140239324471296,
  vm_end = 140239324876800,
  vm_next = 0xffff9d8050ad5d00,
  vm_prev = 0xffff9d8050ad5220,
  vm_rb = {
    __rb_parent_color = 18446635773167751744,
    rb_right = 0x0,
    rb_left = 0x0
  },
  rb_subtree_gap = 0,
  vm_mm = 0xffff9d90628b1f80,
  vm_page_prot = {
    pgprot = 9223372036854775845
  },
  vm_flags = 135266419,
  shared = {
    rb = {
      __rb_parent_color = 18446635773167752033,
      rb_right = 0x0,
      rb_left = 0xffff9d8050ad5278
    },
    rb_subtree_last = 785
  },
  anon_vma_chain = {
    next = 0xffff9d8e0e4f10d0,
    prev = 0xffff9d8e0e4f10d0
  },
  anon_vma = 0xffff9d700a5a7790,
  vm_ops = 0xffffffffc08a47e0 <xfs_file_vm_ops>,
  vm_pgoff = 687,
  vm_file = 0xffff9d703dfb5100,
  vm_private_data = 0x0,
  swap_readahead_info = {
    counter = 0
  },
  vm_policy = 0x0,
  vm_userfaultfd_ctx = {
    ctx = 0x0
  },
  rh_reserved1 = 0,
  rh_reserved2 = 0,
  rh_reserved3 = 0,
  rh_reserved4 = 0
}
crash> fsinfo -f 0xffff9d703dfb5100
== File Info ==
DENTRY            INODE             SUPERBLK  TYPE  PATH
ffff9d8e6fbb2240  ffff9d8fbdb77c88  0         UNKN
file operations = ffffffffc08a46a0 (B) xfs_file_operations [xfs]
file open mode = Read-Only (0x8000)
file size = 0 bytes, ino = 0, link count = 0
uid = 0, gid = 0
== Mount Info ==
ffff9d8f5b1aed00 ffff9d71016d9000 xfs /dev/sda3 /
ffff9d6d3ea50d80 ffff9d71016d9000 xfs /dev/sda3 storage/overlay
The caller, __xfs_filemap_fault, reaches filemap_fault through the non-DAX read path (marked with <<):
	if (IS_DAX(inode)) {
		pfn_t pfn;

		ret = dax_iomap_fault(vmf, pe_size, &pfn, NULL,
				(write_fault && !vmf->cow_page) ?
				 &xfs_direct_write_iomap_ops :
				 &xfs_read_iomap_ops);
		if (ret & VM_FAULT_NEEDDSYNC)
			ret = dax_finish_sync_fault(vmf, pe_size, pfn);
	} else {
		if (write_fault)
			ret = iomap_page_mkwrite(vmf,
					&xfs_buffered_write_iomap_ops);
		else
			ret = filemap_fault(vmf); <<
	}
crash> vm_fault 0xffffbece6331b998 -x
struct vm_fault {
  vma = 0xffff9d8e3eac0828,
  flags = 0x1,
  gfp_mask = 0xc0,
  pgoff = 0x2b5,
  address = 0x7f8c031ec000,
  pmd = 0xffff9d8e3ea6d0c0,
  pud = 0xffff9d90624fa180,
  orig_pte = {
    pte = 0x0
  },
  cow_page = 0xfffff0b3f7e91d40,
  rh_reserved_memcg = 0x0,
  page = 0x0,
  pte = 0x0,
  ptl = 0x0,
  prealloc_pte = 0x0
}
crash> vm_area_struct.vm_file 0xffff9d8e3eac0828
  vm_file = 0xffff9d703dfb5100,
crash> struct file.f_mapping 0xffff9d703dfb5100
  f_mapping = 0xffff9d8fbdb77e00,
crash> kmem 0xffff9d8fbdb77e00
CACHE             OBJSIZE  ALLOCATED  TOTAL  SLABS  SSIZE  NAME
ffff9d8f5b013b80  1064     8600       10170  339    32k    xfs_inode
SLAB              MEMORY            NODE  TOTAL  ALLOCATED  FREE
fffff0b47bf6dc00  ffff9d8fbdb70000  1     30     30         0
FREE / [ALLOCATED]
[ffff9d8fbdb77b40]
PAGE              PHYSICAL    MAPPING           INDEX  CNT  FLAGS
fffff0b47bf6ddc0  3efdb77000  dead000000000400  0      0    57ffffc0000000
The address_space at 0xffff9d8fbdb77e00 is entirely zeroed:
crash> address_space 0xffff9d8fbdb77e00
struct address_space {
  host = 0x0,
  i_pages = {
    xa_lock = { { rlock = { raw_lock = { { val = { counter = 0 }, { locked = 0 '\000', pending = 0 '\000' }, { locked_pending = 0, tail = 0 } } } } } },
    xa_flags = 0,
    xa_head = 0x0,
    xarray_size_rh = 0,
    _rh = {<No data fields>}
  },
  i_mmap_writable = { counter = 0 },
  i_mmap = { rb_root = { rb_node = 0x0 }, rb_leftmost = 0x0 },
  i_mmap_rwsem = {
    count = { counter = 0 },
    wait_list = { next = 0x0, prev = 0x0 },
    wait_lock = { raw_lock = { { val = { counter = 0 }, { locked = 0 '\000', pending = 0 '\000' }, { locked_pending = 0, tail = 0 } } } },
    osq = { tail = { counter = 0 } },
    { owner = { counter = 0 }, rh_kabi_hidden_39 = { owner = 0x0 }, {<No data fields>} }
  },
  nrpages = 0,
  nrexceptional = 0,
  writeback_index = 0,
  a_ops = 0x0,
  flags = 0,
  private_lock = { { rlock = { raw_lock = { { val = { counter = 0 }, { locked = 0 '\000', pending = 0 '\000' }, { locked_pending = 0, tail = 0 } } } } } },
  gfp_mask = 0,
  private_list = { next = 0x0, prev = 0x0 },
  private_data = 0x0,
  wb_err = 0,
  rh_reserved1 = 0,
  rh_reserved2 = 0,
  rh_reserved3 = 0,
  rh_reserved4 = 0
}
crash> vtop 0xffff9d8fbdb77e00
VIRTUAL           PHYSICAL
ffff9d8fbdb77e00  3efdb77e00
PGD DIRECTORY: ffffffffaf210000
PAGE DIRECTORY: 2f91a05067
PUD: 2f91a051f0 => 3ecfff1063
PMD: 3ecfff1f68 => 8000003efda001e3
PAGE: 3efda00000 (2MB)
PTE               PHYSICAL    FLAGS
8000003efda001e3  3efda00000  (PRESENT|RW|ACCESSED|DIRTY|PSE|GLOBAL|NX)
PAGE              PHYSICAL    MAPPING           INDEX  CNT  FLAGS
fffff0b47bf6ddc0  3efdb77000  dead000000000400  0      0    57ffffc0000000
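Several of the values in the crash output above can be cross-checked against each other with simple arithmetic. The sketch below is only a sanity check on the numbers shown, not a diagnosis: it recomputes the faulting page offset from the vm_area_struct and vm_fault fields, rebuilds the physical address reported by vtop from the 2MB page base, and decodes the page-table entry flags. The bit positions are the architectural x86-64 ones; all function and variable names are illustrative.

```python
PAGE_SHIFT = 12                # 4 KiB base pages on x86_64
HUGE_2M_MASK = (1 << 21) - 1   # offset bits inside a 2 MiB page

# Values from the vm_area_struct and vm_fault dumps.
vm_start = 140239324471296
vm_pgoff = 687
fault_address = 0x7F8C031EC000

# Values from the vtop output.
virt = 0xFFFF9D8FBDB77E00
page_base_2m = 0x3EFDA00000
pte = 0x8000003EFDA001E3

# Architectural x86-64 page-table entry flag bits.
PTE_FLAGS = {
    0: "PRESENT", 1: "RW", 2: "USER", 3: "PWT", 4: "PCD",
    5: "ACCESSED", 6: "DIRTY", 7: "PSE", 8: "GLOBAL", 63: "NX",
}


def linear_page_index(address, start, pgoff):
    """File page offset of a faulting address inside a file-backed VMA."""
    return pgoff + ((address - start) >> PAGE_SHIFT)


def phys_in_2m_page(vaddr, base):
    """Physical address for a virtual address mapped by one 2 MiB page."""
    return base | (vaddr & HUGE_2M_MASK)


def decode_pte_flags(entry):
    """Names of the flag bits set in a page-table entry, lowest bit first."""
    return [name for bit, name in sorted(PTE_FLAGS.items()) if entry >> bit & 1]


pgoff = linear_page_index(fault_address, vm_start, vm_pgoff)
phys = phys_in_2m_page(virt, page_base_2m)

print("pgoff:", hex(pgoff))        # compare with vm_fault.pgoff and R13
print("physical:", hex(phys))      # compare with the vtop PHYSICAL column
print("flags:", "|".join(decode_pte_flags(pte)))
```

The recomputed page offset agrees with vm_fault.pgoff (0x2b5, also held in R13 at the time of the fault), and the physical address and flag list agree with what vtop printed, so the VMA, the fault descriptor, and the page-table walk are mutually consistent; only the contents of the address_space itself are unexpected.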
-
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.