Kernel panic when 3rd party i40e driver is in use

Solution Verified - Updated -

Issue

  • The system restarts unexpectedly, with memory corruption reported in seemingly random kernel functions.
  • Kernel panic with the following logs:
[26605.873047] BUG: Bad page state in process vertica  pfn:21393c3
[26605.874108] page:ffffbfcd84e4f0c0 count:-1 mapcount:0 mapping:          (null) index:0x0
[26605.875252] page flags: 0x6fffff00000000()
[26605.876377] page dumped because: nonzero _count
[26605.877504] Modules linked in: binfmt_misc devlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter 8021q garp mrp stp llc bonding sunrpc vfat fat xfs ipmi_ssif libcrc32c skx_edac nfit libnvdimm intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd pcspkr mei_me ipmi_si sg lpc_ich mei hpilo hpwdt ipmi_devintf wmi ipmi_msghandler acpi_power_meter ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic uas usb_storage mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crct10dif_pclmul crct10dif_common crc32c_intel megaraid_sas i40e(OE) tg3(OE) ptp drm_panel_orientation_quirks pps_core dm_mirror dm_region_hash dm_log dm_mod
[26605.877564] CPU: 49 PID: 335514 Comm: vertica Kdump: loaded Tainted: G    B      OE  ------------   3.10.0-1160.24.1.el7.x86_64 #1
[26605.877566] Hardware name: HPE ProLiant DL380 Gen10/ProLiant DL380 Gen10, BIOS U30 01/23/2021
[26605.877568] Call Trace:
[26605.877572]  <IRQ>  [<ffffffffa3f8308a>] dump_stack+0x19/0x1b
[26605.877593]  [<ffffffffa3f7dc47>] bad_page.part.75+0xdc/0xf9
[26605.877599]  [<ffffffffa39c8845>] get_page_from_freelist+0x7a5/0xac0
[26605.877606]  [<ffffffffa3ea1550>] ? ip_rcv+0x2c0/0x420
[26605.877610]  [<ffffffffa39c8cc6>] __alloc_pages_nodemask+0x166/0x450
[26605.877618]  [<ffffffffa3ecde80>] ? tcp4_gro_receive+0x120/0x1a0
[26605.877636]  [<ffffffffc0166818>] i40e_alloc_rx_buffers+0x178/0x320 [i40e]
[26605.877646]  [<ffffffffc0166f84>] i40e_clean_rx_irq+0x5c4/0xba0 [i40e]
[26605.877656]  [<ffffffffc0167914>] i40e_napi_poll+0x3b4/0x800 [i40e]
[26605.877662]  [<ffffffffa3e571cf>] net_rx_action+0x26f/0x390
[26605.877668]  [<ffffffffa38a4b35>] __do_softirq+0xf5/0x280
[26605.877672]  [<ffffffffa3f994ec>] call_softirq+0x1c/0x30
[26605.877678]  [<ffffffffa382f715>] do_softirq+0x65/0xa0
[26605.877681]  [<ffffffffa38a4eb5>] irq_exit+0x105/0x110
[26605.877684]  [<ffffffffa3f9a936>] do_IRQ+0x56/0xf0
[26605.877689]  [<ffffffffa3f8c36a>] common_interrupt+0x16a/0x16a
[27651.349582] vertica: Corrupted page table at address 7ee2fcf35c60
  • Another symptom pattern, a soft lockup:
Jun 11 12:43:59 localhost kernel: NMI watchdog: BUG: soft lockup - CPU#52 stuck for 22s! [kworker/52:1H:81114]
Jun 11 12:43:59 localhost kernel: Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver cfg80211 rfkill xt_set iptable_raw iptable_mangle ip_set_hash_ip ip_set_hash_net ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 xt_statistic xt_nat xt_recent ipt_REJECT nf_reject_ipv4 xt_comment veth nvidia_uvm(OE) binfmt_misc tcp_lp xt_conntrack ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netlink iptable_filter iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack br_netfilter bridge stp llc nfsv3 nfs_acl nfs lockd grace fscache overlay(T) bonding sunrpc nvidia_drm(POE) nvidia_modeset(POE) skx_edac nfit libnvdimm intel_powerclamp coretemp intel_rapl iosf_mbi ipmi_ssif nvidia(POE) kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd
Jun 11 12:43:59 localhost kernel: pcspkr ses enclosure sg joydev mei_me lpc_ich hpilo hpwdt mei ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter sch_fq_codel xfs libcrc32c sd_mod crc_t10dif crct10dif_generic mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm crct10dif_pclmul crct10dif_common crc32c_intel drm serio_raw smartpqi(OE) scsi_transport_sas i40e(OE) tg3(OE) ptp drm_panel_orientation_quirks pps_core wmi dm_mirror dm_region_hash dm_log dm_mod fuse ipip tunnel4 ip_tunnel xt_multiport xt_addrtype xt_mark ip_set nfnetlink ip6_tables ip_tables
Jun 11 12:43:59 localhost kernel: CPU: 52 PID: 81114 Comm: kworker/52:1H Kdump: loaded Tainted: P    B      OEL ------------ T 3.10.0-1160.el7.x86_64 #1
Jun 11 12:43:59 localhost kernel: Hardware name: HPE ProLiant XL270d Gen10/ProLiant XL270d Gen10, BIOS U45 01/23/2021
Jun 11 12:43:59 localhost kernel: Workqueue: xprtiod xs_tcp_data_receive_workfn [sunrpc]
Jun 11 12:43:59 localhost kernel: task: ffff9b96d77e6300 ti: ffff9bb438370000 task.ti: ffff9bb438370000
Jun 11 12:43:59 localhost kernel: RIP: 0010:[<ffffffffc06c9cd5>]  [<ffffffffc06c9cd5>] xdr_skb_read_bits+0x5/0x50 [sunrpc]
Jun 11 12:43:59 localhost kernel: RSP: 0018:ffff9bb438373cd8  EFLAGS: 00000202
Jun 11 12:43:59 localhost kernel: RAX: 0000000000000000 RBX: 000000000000584c RCX: 0000000000000004
Jun 11 12:43:59 localhost kernel: RDX: 0000000000000004 RSI: ffff9bb7577b55f8 RDI: ffff9bb438373ce8
Jun 11 12:43:59 localhost kernel: RBP: ffff9bb438373d58 R08: 000000000000584c R09: 0000000100200010
Jun 11 12:43:59 localhost kernel: R10: ffff9b96dc7df400 R11: 00000000000005a8 R12: ffff9b96dc7df400
Jun 11 12:43:59 localhost kernel: R13: 00000000000005a8 R14: 0000000000000010 R15: ffff9bb438373cd0
Jun 11 12:43:59 localhost kernel: FS:  0000000000000000(0000) GS:ffff9bb5ffb00000(0000) knlGS:0000000000000000
Jun 11 12:43:59 localhost kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 11 12:43:59 localhost kernel: CR2: 00007ffb516a45c0 CR3: 0000007072810000 CR4: 00000000007607e0
Jun 11 12:43:59 localhost kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jun 11 12:43:59 localhost kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Jun 11 12:43:59 localhost kernel: PKRU: 00000000
Jun 11 12:43:59 localhost kernel: Call Trace:
Jun 11 12:43:59 localhost kernel: [<ffffffffc06cde34>] ? xs_tcp_data_recv+0x1f4/0xad0 [sunrpc]
Jun 11 12:43:59 localhost kernel: [<ffffffffa5e26642>] ? kmem_cache_free+0x1e2/0x200
Jun 11 12:43:59 localhost kernel: [<ffffffffc06cdc40>] ? xs_tcp_setup_socket+0x490/0x490 [sunrpc]
Jun 11 12:43:59 localhost kernel: [<ffffffffa62ad6fb>] tcp_read_sock+0xab/0x1f0
Jun 11 12:43:59 localhost kernel: [<ffffffffc06cac03>] xs_tcp_data_receive_workfn+0xb3/0x140 [sunrpc]
Jun 11 12:43:59 localhost kernel: [<ffffffffa5cbdc4f>] process_one_work+0x17f/0x440
Jun 11 12:43:59 localhost kernel: [<ffffffffa5cbed66>] worker_thread+0x126/0x3c0
Jun 11 12:43:59 localhost kernel: [<ffffffffa5cbec40>] ? manage_workers.isra.26+0x2a0/0x2a0
Jun 11 12:43:59 localhost kernel: [<ffffffffa5cc5c21>] kthread+0xd1/0xe0
Jun 11 12:43:59 localhost kernel: [<ffffffffa5cc5b50>] ? insert_kthread_work+0x40/0x40
Jun 11 12:43:59 localhost kernel: [<ffffffffa6393df7>] ret_from_fork_nospec_begin+0x21/0x21
Jun 11 12:43:59 localhost kernel: [<ffffffffa5cc5b50>] ? insert_kthread_work+0x40/0x40
Jun 11 12:43:59 localhost kernel: [<ffffffffa6393df7>] ret_from_fork_nospec_begin+0x21/0x21
Jun 11 12:43:59 localhost kernel: [<ffffffffa5cc5b50>] ? insert_kthread_work+0x40/0x40
Jun 11 12:43:59 localhost kernel: Code: 48 89 de 48 c7 c7 58 34 6f c0 31 c0 e8 14 0d cb e5 48 89 d8 e9 20 fe ff ff 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 0f 1f 44 00 00 <55> 48 89 e5 41 54 53 48 8b 47 10 48 89 fb 48 39 c2 48 0f 46 c2
  • A third pattern, a usercopy kernel BUG:
[84866.564809] usercopy: kernel memory exposure attempt detected from ffff959c4e6e0902 (mnt_cache) (1448 bytes)
[84866.564850] ------------[ cut here ]------------
[84866.564870] kernel BUG at mm/usercopy.c:72!
[84866.564884] invalid opcode: 0000 [#1] SMP 
[84866.564899] Modules linked in: xt_addrtype br_netfilter xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun devlink ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack bonding ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat iptable_mangle iptable_security iptable_raw nf_conntrack ip_set ebtable_filter ebtables ip6table_filter overlay(T) ip6_tables iptable_filter vfat fat skx_edac nfit libnvdimm intel_powerclamp coretemp intel_rapl iosf_mbi kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd pcspkr ipmi_ssif ses enclosure cdc_eem usbnet mii sg ipmi_si hpilo
[84866.565175]  hpwdt lpc_ich ipmi_devintf ipmi_msghandler mei_me mei wmi acpi_power_meter auth_rpcgss sunrpc ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crct10dif_pclmul crct10dif_common crc32c_intel smartpqi(OE) tg3(OE) i40e(OE) scsi_transport_sas ptp pps_core drm_panel_orientation_quirks uas usb_storage dm_mirror dm_region_hash dm_log dm_mod fuse
[84866.565332] CPU: 36 PID: 14233 Comm: msgr-worker-4 Kdump: loaded Tainted: G    B      OE  ------------ T 3.10.0-1160.31.1.el7.x86_64 #1
[84866.565364] Hardware name: HPE ProLiant XL450 Gen10/ProLiant XL450 Gen10, BIOS U40 01/23/2021
[84866.565388] task: ffff95bc4b6f2100 ti: ffff95bc4aa40000 task.ti: ffff95bc4aa40000
[84866.566283] RIP: 0010:[<ffffffffa624aae7>]  [<ffffffffa624aae7>] __check_object_size+0x87/0x250
[84866.567167] RSP: 0018:ffff95bc4aa43b98  EFLAGS: 00010246
[84866.568012] RAX: 0000000000000060 RBX: ffff959c4e6e0902 RCX: 0000000000000000
[84866.568868] RDX: 0000000000000000 RSI: ffff959c7ff138d8 RDI: ffff959c7ff138d8
[84866.569713] RBP: ffff95bc4aa43bb8 R08: 000000000000000a R09: 0000000000000000
[84866.570548] R10: 0000000000004d39 R11: ffff95bc4aa43896 R12: 00000000000005a8
[84866.571436] R13: 0000000000000001 R14: ffff959c4e6e0eaa R15: 00000000000005a8
[84866.572432] FS:  00007f0ff908e700(0000) GS:ffff959c7ff00000(0000) knlGS:0000000000000000
[84866.573307] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[84866.574167] CR2: 00005651fb3e1900 CR3: 0000003f8bdaa000 CR4: 00000000007607e0
[84866.575008] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[84866.575844] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[84866.576717] PKRU: 55555554
[84866.577562] Call Trace:
[84866.578383]  [<ffffffffa639d21d>] memcpy_toiovec+0x4d/0xb0
[84866.579223]  [<ffffffffa6647ac8>] skb_copy_datagram_iovec+0x128/0x280
[84866.580046]  [<ffffffffa66b0cca>] tcp_recvmsg+0x22a/0xb30
[84866.580888]  [<ffffffffa66e0020>] inet_recvmsg+0x80/0xb0
[84866.581696]  [<ffffffffa66356ec>] sock_aio_read.part.9+0x14c/0x170
[84866.582516]  [<ffffffffa6635731>] sock_aio_read+0x21/0x30
[84866.583319]  [<ffffffffa624d4b3>] do_sync_read+0x93/0xe0
[84866.584110]  [<ffffffffa624df95>] vfs_read+0x145/0x170
[84866.584890]  [<ffffffffa624ed6f>] SyS_read+0x7f/0xf0
[84866.585664]  [<ffffffffa6795f92>] system_call_fastpath+0x25/0x2a
[84866.586428] Code: 45 d1 48 c7 c6 64 82 a8 a6 48 c7 c1 de 1b a9 a6 48 0f 45 f1 49 89 c0 4d 89 e1 48 89 d9 48 c7 c7 78 e9 a8 a6 31 c0 e8 01 29 53 00 <0f> 0b 0f 1f 80 00 00 00 00 48 c7 c0 00 00 00 a6 4c 39 f0 73 0d 
[84866.588087] RIP  [<ffffffffa624aae7>] __check_object_size+0x87/0x250
[84866.588847]  RSP <ffff95bc4aa43b98>
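
All three traces share one detail. The `Modules linked in:` list marks `i40e(OE)` (and `tg3(OE)`), meaning an externally built (`O`) and unsigned (`E`) module was loaded. When triaging many logs, a small Python sketch like the following can extract every flagged module from such a line. It is an illustrative helper, not a Red Hat tool. The function names `flagged_modules` and `out_of_tree` are hypothetical, and the sketch assumes the `Modules linked in:` text has already been joined into a single string.

```python
import re
from typing import Dict, List

# Matches "name(FLAGS)" entries such as "i40e(OE)" or "nvidia(POE)".
# Names in this list use letters, digits and underscores.
_MODULE_RE = re.compile(r"([A-Za-z0-9_]+)\(([A-Z]+)\)")


def flagged_modules(modules_line: str) -> Dict[str, str]:
    """Return {module_name: flags} for every module that carries taint flags.

    `modules_line` is the text after "Modules linked in:" from a kernel
    oops or panic, already joined into one string.
    """
    return {name: flags for name, flags in _MODULE_RE.findall(modules_line)}


def out_of_tree(modules_line: str) -> List[str]:
    """Sorted names of modules flagged 'O' (externally built, out-of-tree)."""
    return sorted(
        name for name, flags in flagged_modules(modules_line).items() if "O" in flags
    )


if __name__ == "__main__":
    # Excerpt from the first log above.
    sample = "megaraid_sas i40e(OE) tg3(OE) ptp drm_panel_orientation_quirks pps_core"
    print(flagged_modules(sample))  # {'i40e': 'OE', 'tg3': 'OE'}
    print(out_of_tree(sample))      # ['i40e', 'tg3']
```

The sketch returns every flag string as-is rather than interpreting it, because some letters are distribution-specific. For example, `overlay(T)` in the second and third logs is a RHEL marking.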

Environment

  • Red Hat Enterprise Linux 8.2/8.3
  • Red Hat Enterprise Linux 7.8/7.9
  • 3rd party i40e driver, version 2.13.10
    • kmod-hp-i40e-2.13.10-1.rhel8u2.x86_64 (srcversion: 1D93A9E4BD7902E9D252F69)
    • kmod-hp-i40e-2.13.10-1.rhel7u9.x86_64 (srcversion: 597EBD96218776AAA546464)
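
To check whether a running system uses one of the builds listed above, compare the `srcversion` of the loaded module with the listed values. The sketch below assumes the loaded i40e module exposes `/sys/module/i40e/srcversion`; the kernel provides this attribute for modules built with a srcversion. The names `AFFECTED_SRCVERSIONS`, `read_srcversion` and `affected_package` are hypothetical.

```python
from pathlib import Path
from typing import Optional

# srcversion values of the kmod-hp-i40e 2.13.10 packages listed above.
AFFECTED_SRCVERSIONS = {
    "1D93A9E4BD7902E9D252F69": "kmod-hp-i40e-2.13.10-1.rhel8u2.x86_64",
    "597EBD96218776AAA546464": "kmod-hp-i40e-2.13.10-1.rhel7u9.x86_64",
}


def read_srcversion(module: str = "i40e", sysfs: str = "/sys/module") -> Optional[str]:
    """Return the srcversion of a loaded module, or None if it is unavailable."""
    path = Path(sysfs) / module / "srcversion"
    try:
        return path.read_text().strip()
    except OSError:
        # Module not loaded, or it exposes no srcversion attribute.
        return None


def affected_package(srcversion: Optional[str]) -> Optional[str]:
    """Return the listed package name if srcversion matches one, else None."""
    if srcversion is None:
        return None
    return AFFECTED_SRCVERSIONS.get(srcversion)


if __name__ == "__main__":
    current = read_srcversion()
    pkg = affected_package(current)
    if current is None:
        print("i40e is not loaded or exposes no srcversion")
    elif pkg is not None:
        print(f"loaded i40e srcversion {current} matches {pkg}")
    else:
        print(f"loaded i40e srcversion {current} is not one of the listed builds")
```

Reading sysfs reflects the module actually in memory. By contrast, `modinfo -F srcversion i40e` reports the module file on disk, which can differ from the loaded module after a package update without a reload or reboot.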
