Kernel panic when 3rd party i40e driver is in use
Issue
- The system restarts after memory corruption is detected in seemingly random kernel functions.
- The kernel panics with the following logs:
[26605.873047] BUG: Bad page state in process vertica pfn:21393c3
[26605.874108] page:ffffbfcd84e4f0c0 count:-1 mapcount:0 mapping: (null) index:0x0
[26605.875252] page flags: 0x6fffff00000000()
[26605.876377] page dumped because: nonzero _count
[26605.877504] Modules linked in: binfmt_misc devlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter 8021q garp mrp stp llc bonding sunrpc vfat fat xfs ipmi_ssif libcrc32c skx_edac nfit libnvdimm intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd pcspkr mei_me ipmi_si sg lpc_ich mei hpilo hpwdt ipmi_devintf wmi ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic uas usb_storage mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crct10dif_pclmul crct10dif_common crc32c_intel megaraid_sas i40e(OE) tg3(OE) ptp drm_panel_orientation_quirks pps_core dm_mirror dm_region_hash dm_log dm_mod
[26605.877564] CPU: 49 PID: 335514 Comm: vertica Kdump: loaded Tainted: G B OE ------------ 3.10.0-1160.24.1.el7.x86_64 #1
[26605.877566] Hardware name: HPE ProLiant DL380 Gen10/ProLiant DL380 Gen10, BIOS U30 01/23/2021
[26605.877568] Call Trace:
[26605.877572] <IRQ> [<ffffffffa3f8308a>] dump_stack+0x19/0x1b
[26605.877593] [<ffffffffa3f7dc47>] bad_page.part.75+0xdc/0xf9
[26605.877599] [<ffffffffa39c8845>] get_page_from_freelist+0x7a5/0xac0
[26605.877606] [<ffffffffa3ea1550>] ? ip_rcv+0x2c0/0x420
[26605.877610] [<ffffffffa39c8cc6>] __alloc_pages_nodemask+0x166/0x450
[26605.877618] [<ffffffffa3ecde80>] ? tcp4_gro_receive+0x120/0x1a0
[26605.877636] [<ffffffffc0166818>] i40e_alloc_rx_buffers+0x178/0x320 [i40e]
[26605.877646] [<ffffffffc0166f84>] i40e_clean_rx_irq+0x5c4/0xba0 [i40e]
[26605.877656] [<ffffffffc0167914>] i40e_napi_poll+0x3b4/0x800 [i40e]
[26605.877662] [<ffffffffa3e571cf>] net_rx_action+0x26f/0x390
[26605.877668] [<ffffffffa38a4b35>] __do_softirq+0xf5/0x280
[26605.877672] [<ffffffffa3f994ec>] call_softirq+0x1c/0x30
[26605.877678] [<ffffffffa382f715>] do_softirq+0x65/0xa0
[26605.877681] [<ffffffffa38a4eb5>] irq_exit+0x105/0x110
[26605.877684] [<ffffffffa3f9a936>] do_IRQ+0x56/0xf0
[26605.877689] [<ffffffffa3f8c36a>] common_interrupt+0x16a/0x16a
[27651.349582] vertica: Corrupted page table at address 7ee2fcf35c60
- Another symptom pattern, a soft lockup:
Jun 11 12:43:59 localhost kernel: NMI watchdog: BUG: soft lockup - CPU#52 stuck for 22s! [kworker/52:1H:81114]
Jun 11 12:43:59 localhost kernel: Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver cfg80211 rfkill xt_set iptable_raw iptable_mangle ip_set_hash_ip ip_set_hash_net ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 xt_statistic xt_nat xt_recent ipt_REJECT nf_reject_ipv4 xt_comment veth nvidia_uvm(OE) binfmt_misc tcp_lp xt_conntrack ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netlink iptable_filter iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack br_netfilter bridge stp llc nfsv3 nfs_acl nfs lockd grace fscache overlay(T) bonding sunrpc nvidia_drm(POE) nvidia_modeset(POE) skx_edac nfit libnvdimm intel_powerclamp coretemp intel_rapl iosf_mbi ipmi_ssif nvidia(POE) kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd
Jun 11 12:43:59 localhost kernel: pcspkr ses enclosure sg joydev mei_me lpc_ich hpilo hpwdt mei ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter sch_fq_codel xfs libcrc32c sd_mod crc_t10dif crct10dif_generic mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm crct10dif_pclmul crct10dif_common crc32c_intel drm serio_raw smartpqi(OE) scsi_transport_sas i40e(OE) tg3(OE) ptp drm_panel_orientation_quirks pps_core wmi dm_mirror dm_region_hash dm_log dm_mod fuse ipip tunnel4 ip_tunnel xt_multiport xt_addrtype xt_mark ip_set nfnetlink ip6_tables ip_tables
Jun 11 12:43:59 localhost kernel: CPU: 52 PID: 81114 Comm: kworker/52:1H Kdump: loaded Tainted: P B OEL ------------ T 3.10.0-1160.el7.x86_64 #1
Jun 11 12:43:59 localhost kernel: Hardware name: HPE ProLiant XL270d Gen10/ProLiant XL270d Gen10, BIOS U45 01/23/2021
Jun 11 12:43:59 localhost kernel: Workqueue: xprtiod xs_tcp_data_receive_workfn [sunrpc]
Jun 11 12:43:59 localhost kernel: task: ffff9b96d77e6300 ti: ffff9bb438370000 task.ti: ffff9bb438370000
Jun 11 12:43:59 localhost kernel: RIP: 0010:[<ffffffffc06c9cd5>] [<ffffffffc06c9cd5>] xdr_skb_read_bits+0x5/0x50 [sunrpc]
Jun 11 12:43:59 localhost kernel: RSP: 0018:ffff9bb438373cd8 EFLAGS: 00000202
Jun 11 12:43:59 localhost kernel: RAX: 0000000000000000 RBX: 000000000000584c RCX: 0000000000000004
Jun 11 12:43:59 localhost kernel: RDX: 0000000000000004 RSI: ffff9bb7577b55f8 RDI: ffff9bb438373ce8
Jun 11 12:43:59 localhost kernel: RBP: ffff9bb438373d58 R08: 000000000000584c R09: 0000000100200010
Jun 11 12:43:59 localhost kernel: R10: ffff9b96dc7df400 R11: 00000000000005a8 R12: ffff9b96dc7df400
Jun 11 12:43:59 localhost kernel: R13: 00000000000005a8 R14: 0000000000000010 R15: ffff9bb438373cd0
Jun 11 12:43:59 localhost kernel: FS: 0000000000000000(0000) GS:ffff9bb5ffb00000(0000) knlGS:0000000000000000
Jun 11 12:43:59 localhost kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 11 12:43:59 localhost kernel: CR2: 00007ffb516a45c0 CR3: 0000007072810000 CR4: 00000000007607e0
Jun 11 12:43:59 localhost kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jun 11 12:43:59 localhost kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Jun 11 12:43:59 localhost kernel: PKRU: 00000000
Jun 11 12:43:59 localhost kernel: Call Trace:
Jun 11 12:43:59 localhost kernel: [<ffffffffc06cde34>] ? xs_tcp_data_recv+0x1f4/0xad0 [sunrpc]
Jun 11 12:43:59 localhost kernel: [<ffffffffa5e26642>] ? kmem_cache_free+0x1e2/0x200
Jun 11 12:43:59 localhost kernel: [<ffffffffc06cdc40>] ? xs_tcp_setup_socket+0x490/0x490 [sunrpc]
Jun 11 12:43:59 localhost kernel: [<ffffffffa62ad6fb>] tcp_read_sock+0xab/0x1f0
Jun 11 12:43:59 localhost kernel: [<ffffffffc06cac03>] xs_tcp_data_receive_workfn+0xb3/0x140 [sunrpc]
Jun 11 12:43:59 localhost kernel: [<ffffffffa5cbdc4f>] process_one_work+0x17f/0x440
Jun 11 12:43:59 localhost kernel: [<ffffffffa5cbed66>] worker_thread+0x126/0x3c0
Jun 11 12:43:59 localhost kernel: [<ffffffffa5cbec40>] ? manage_workers.isra.26+0x2a0/0x2a0
Jun 11 12:43:59 localhost kernel: [<ffffffffa5cc5c21>] kthread+0xd1/0xe0
Jun 11 12:43:59 localhost kernel: [<ffffffffa5cc5b50>] ? insert_kthread_work+0x40/0x40
Jun 11 12:43:59 localhost kernel: [<ffffffffa6393df7>] ret_from_fork_nospec_begin+0x21/0x21
Jun 11 12:43:59 localhost kernel: [<ffffffffa5cc5b50>] ? insert_kthread_work+0x40/0x40
Jun 11 12:43:59 localhost kernel: [<ffffffffa6393df7>] ret_from_fork_nospec_begin+0x21/0x21
Jun 11 12:43:59 localhost kernel: [<ffffffffa5cc5b50>] ? insert_kthread_work+0x40/0x40
Jun 11 12:43:59 localhost kernel: Code: 48 89 de 48 c7 c7 58 34 6f c0 31 c0 e8 14 0d cb e5 48 89 d8 e9 20 fe ff ff 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 0f 1f 44 00 00 <55> 48 89 e5 41 54 53 48 8b 47 10 48 89 fb 48 39 c2 48 0f 46 c2
- Another symptom pattern, a usercopy BUG:
[84866.564809] usercopy: kernel memory exposure attempt detected from ffff959c4e6e0902 (mnt_cache) (1448 bytes)
[84866.564850] ------------[ cut here ]------------
[84866.564870] kernel BUG at mm/usercopy.c:72!
[84866.564884] invalid opcode: 0000 [#1] SMP
[84866.564899] Modules linked in: xt_addrtype br_netfilter xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun devlink ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack bonding ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat iptable_mangle iptable_security iptable_raw nf_conntrack ip_set ebtable_filter ebtables ip6table_filter overlay(T) ip6_tables iptable_filter vfat fat skx_edac nfit libnvdimm intel_powerclamp coretemp intel_rapl iosf_mbi kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd pcspkr ipmi_ssif ses enclosure cdc_eem usbnet mii sg ipmi_si hpilo
[84866.565175] hpwdt lpc_ich ipmi_devintf ipmi_msghandler mei_me mei wmi acpi_power_meter auth_rpcgss sunrpc ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crct10dif_pclmul crct10dif_common crc32c_intel smartpqi(OE) tg3(OE) i40e(OE) scsi_transport_sas ptp pps_core drm_panel_orientation_quirks uas usb_storage dm_mirror dm_region_hash dm_log dm_mod fuse
[84866.565332] CPU: 36 PID: 14233 Comm: msgr-worker-4 Kdump: loaded Tainted: G B OE ------------ T 3.10.0-1160.31.1.el7.x86_64 #1
[84866.565364] Hardware name: HPE ProLiant XL450 Gen10/ProLiant XL450 Gen10, BIOS U40 01/23/2021
[84866.565388] task: ffff95bc4b6f2100 ti: ffff95bc4aa40000 task.ti: ffff95bc4aa40000
[84866.566283] RIP: 0010:[<ffffffffa624aae7>] [<ffffffffa624aae7>] __check_object_size+0x87/0x250
[84866.567167] RSP: 0018:ffff95bc4aa43b98 EFLAGS: 00010246
[84866.568012] RAX: 0000000000000060 RBX: ffff959c4e6e0902 RCX: 0000000000000000
[84866.568868] RDX: 0000000000000000 RSI: ffff959c7ff138d8 RDI: ffff959c7ff138d8
[84866.569713] RBP: ffff95bc4aa43bb8 R08: 000000000000000a R09: 0000000000000000
[84866.570548] R10: 0000000000004d39 R11: ffff95bc4aa43896 R12: 00000000000005a8
[84866.571436] R13: 0000000000000001 R14: ffff959c4e6e0eaa R15: 00000000000005a8
[84866.572432] FS: 00007f0ff908e700(0000) GS:ffff959c7ff00000(0000) knlGS:0000000000000000
[84866.573307] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[84866.574167] CR2: 00005651fb3e1900 CR3: 0000003f8bdaa000 CR4: 00000000007607e0
[84866.575008] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[84866.575844] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[84866.576717] PKRU: 55555554
[84866.577562] Call Trace:
[84866.578383] [<ffffffffa639d21d>] memcpy_toiovec+0x4d/0xb0
[84866.579223] [<ffffffffa6647ac8>] skb_copy_datagram_iovec+0x128/0x280
[84866.580046] [<ffffffffa66b0cca>] tcp_recvmsg+0x22a/0xb30
[84866.580888] [<ffffffffa66e0020>] inet_recvmsg+0x80/0xb0
[84866.581696] [<ffffffffa66356ec>] sock_aio_read.part.9+0x14c/0x170
[84866.582516] [<ffffffffa6635731>] sock_aio_read+0x21/0x30
[84866.583319] [<ffffffffa624d4b3>] do_sync_read+0x93/0xe0
[84866.584110] [<ffffffffa624df95>] vfs_read+0x145/0x170
[84866.584890] [<ffffffffa624ed6f>] SyS_read+0x7f/0xf0
[84866.585664] [<ffffffffa6795f92>] system_call_fastpath+0x25/0x2a
[84866.586428] Code: 45 d1 48 c7 c6 64 82 a8 a6 48 c7 c1 de 1b a9 a6 48 0f 45 f1 49 89 c0 4d 89 e1 48 89 d9 48 c7 c7 78 e9 a8 a6 31 c0 e8 01 29 53 00 <0f> 0b 0f 1f 80 00 00 00 00 48 c7 c0 00 00 00 a6 4c 39 f0 73 0d
[84866.588087] RIP [<ffffffffa624aae7>] __check_object_size+0x87/0x250
[84866.588847] RSP <ffff95bc4aa43b98>
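All three traces list `i40e(OE)` under "Modules linked in". As a minimal sketch (not part of the original solution), the following Python helper pulls the flagged modules out of a saved log so that out-of-tree drivers stand out. The function names are hypothetical, and only the flag letters O (out-of-tree), E (unsigned), and P (proprietary) are interpreted; any other letter is kept as-is.

```python
import re

# Per-module taint letters printed in parentheses after a module name.
# Only letters with a well-known meaning are listed; others are left raw.
FLAG_MEANINGS = {
    "O": "out-of-tree module",
    "E": "unsigned module",
    "P": "proprietary module",
}

# Matches entries such as "i40e(OE)" or "nvidia(POE)".
MODULE_RE = re.compile(r"([A-Za-z0-9_]+)\(([A-Z]+)\)")


def flagged_modules(log_text):
    """Return {module_name: flags} for every module printed with taint flags."""
    return {name: flags for name, flags in MODULE_RE.findall(log_text)}


def out_of_tree(log_text):
    """Return the sorted names of modules carrying the O (out-of-tree) flag."""
    return sorted(
        name for name, flags in flagged_modules(log_text).items() if "O" in flags
    )


# Excerpt of the first module list in this article.
sample = "crc32c_intel megaraid_sas i40e(OE) tg3(OE) ptp drm_panel_orientation_quirks"
print(out_of_tree(sample))  # ['i40e', 'tg3']
```

Running it against a full `vmcore-dmesg.txt` or `/var/log/messages` excerpt works the same way, since register dumps and call-trace lines do not match the `name(FLAGS)` pattern.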
Environment
- Red Hat Enterprise Linux 8.2/8.3
- Red Hat Enterprise Linux 7.8/7.9
- i40e 3rd party driver, version 2.13.10
- kmod-hp-i40e-2.13.10-1.rhel8u2.x86_64 (srcversion: 1D93A9E4BD7902E9D252F69)
- kmod-hp-i40e-2.13.10-1.rhel7u9.x86_64 (srcversion: 597EBD96218776AAA546464)
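To check whether a running system has one of the builds listed above, the loaded module's `srcversion` can be compared with the two values from the package list. The sketch below assumes the driver exposes `/sys/module/i40e/srcversion` through sysfs; the function names are hypothetical, and a non-matching value says nothing about whether other builds are affected.

```python
from pathlib import Path

# srcversion values copied from the package list in this article.
LISTED_SRCVERSIONS = {
    "1D93A9E4BD7902E9D252F69",  # kmod-hp-i40e-2.13.10-1.rhel8u2.x86_64
    "597EBD96218776AAA546464",  # kmod-hp-i40e-2.13.10-1.rhel7u9.x86_64
}


def is_listed_build(srcversion):
    """True if the given srcversion matches one of the listed builds."""
    return srcversion.strip().upper() in LISTED_SRCVERSIONS


def loaded_i40e_srcversion(sysfs_root="/sys/module"):
    """Read the srcversion of the loaded i40e module, or None if unavailable.

    Assumption: the module is loaded and exports a srcversion attribute.
    """
    path = Path(sysfs_root) / "i40e" / "srcversion"
    try:
        return path.read_text().strip()
    except OSError:
        return None


current = loaded_i40e_srcversion()
if current is None:
    print("i40e srcversion not available (module not loaded or no sysfs entry)")
elif is_listed_build(current):
    print(f"i40e srcversion {current} matches a build listed in this article")
else:
    print(f"i40e srcversion {current} is not one of the listed builds")
```

The same value is also reported by `modinfo i40e` on the `srcversion:` line, which is useful when only a sosreport or the package file is at hand.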