System using the enic driver panics with memory corruption during a NETDEV WATCHDOG transmit queue timeout

Solution In Progress - Updated -

Issue

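  • When an interface using the enic driver hits a NETDEV WATCHDOG transmit queue timeout, the kernel may panic with "kernel BUG at mm/slub.c:3752!" in kfree(). The panic occurs in the enic_tx_hang_reset work item while it frees transmit buffers (enic_stop -> vnic_wq_clean -> enic_free_wq_buf), which points to memory corruption in the buffers being released.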

    ------------[ cut here ]------------
    kernel BUG at mm/slub.c:3752!
    invalid opcode: 0000 [#1] SMP 
    Modules linked in: macsec tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag dm_crypt drbg ansi_cprng binfmt_misc rpcrdma sunrpc ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 ipt_REJECT nf_reject_ipv4 nf_log_ipv4 nf_log_common xt_LOG xt_pkttype xt_conntrack ebtable_nat ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat iptable_mangle iptable_security iptable_raw ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter skx_edac intel_powerclamp coretemp
    intel_rapl iosf_mbi kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ib_core pcspkr ses enclosure scsi_transport_sas joydev ipmi_ssif sg mei_me lpc_ich mei wmi ipmi_si ipmi_devintf ipmi_msghandler pcc_cpufreq acpi_pad acpi_power_meter nf_conntrack_ftp nf_conntrack ip_tables xfs dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c sd_mod crc_t10dif crct10dif_generic mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm crct10dif_pclmul crct10dif_common ahci crc32c_intel drm nvme libahci megaraid_sas libata nvme_core enic drm_panel_orientation_quirks nfit libnvdimm dm_mirror dm_region_hash dm_log dm_mod
    CPU: 0 PID: 1205449 Comm: kworker/0:1 Kdump: loaded Tainted: G        W      ------------   3.10.0-1062.1.2.el7.x86_64 #1
    Hardware name: Cisco Systems Inc UCSC-C240-M5SX/UCSC-C240-M5SX, BIOS C240M5.3.1.3d.0.0312180914 03/12/2018
    Workqueue: events enic_tx_hang_reset [enic]
    task: ffff8abf24b53150 ti: ffff8adc7075c000 task.ti: ffff8adc7075c000
    RIP: 0010:[<ffffffffb8a22e2c>]  [<ffffffffb8a22e2c>] kfree+0x13c/0x140
    RSP: 0018:ffff8adc7075fce8  EFLAGS: 00010246
    RAX: 006fffff00000000 RBX: ffff8aee959c9000 RCX: 0000000000000001
    RDX: 006fffff00000000 RSI: 0000000000000001 RDI: ffff8aee959c9000
    RBP: ffff8adc7075fd00 R08: 0000000000000000 R09: 0000000180400029
    R10: 00000000cafc9001 R11: ffffcaefff567240 R12: ffff8aee959c92c0
    R13: ffffffffb8e366c5 R14: ffff8ad698658ad8 R15: ffff8ad69842b940
    FS:  0000000000000000(0000) GS:ffff8ad69f800000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007f96d3933158 CR3: 0000001ed4610000 CR4: 00000000007607f0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    PKRU: 00000000
    Call Trace:
     [<ffffffffb8e366c5>] skb_release_data+0xf5/0x140
     [<ffffffffc045b820>] ? enic_features_check+0x3f0/0x3f0 [enic]
     [<ffffffffb8e36734>] skb_release_all+0x24/0x30
     [<ffffffffb8e36c1c>] consume_skb+0x2c/0x90
     [<ffffffffb8e4bb9d>] __dev_kfree_skb_any+0x3d/0x50
     [<ffffffffc045b88b>] enic_free_wq_buf+0x6b/0x80 [enic]
     [<ffffffffc046056b>] vnic_wq_clean+0x3b/0xb0 [enic]
     [<ffffffffc045afd8>] enic_stop+0x2c8/0x460 [enic]
     [<ffffffffc045eca0>] enic_tx_hang_reset+0x40/0xd0 [enic]
     [<ffffffffb88bd0ff>] process_one_work+0x17f/0x440
     [<ffffffffb88be216>] worker_thread+0x126/0x3c0
     [<ffffffffb88be0f0>] ? manage_workers.isra.26+0x2a0/0x2a0
     [<ffffffffb88c50d1>] kthread+0xd1/0xe0
     [<ffffffffb88c5000>] ? insert_kthread_work+0x40/0x40
     [<ffffffffb8f8cd1d>] ret_from_fork_nospec_begin+0x7/0x21
     [<ffffffffb88c5000>] ? insert_kthread_work+0x40/0x40
    Code: 49 8b 03 31 f6 f6 c4 40 74 04 41 8b 73 68 4c 89 df e8 79 28 fa ff eb 84 4c 8b 58 30 48 8b 10 80 e6 80 4c 0f 44 d8 e9 28 ff ff ff <0f> 0b 66 90 0f 1f 44 00 00 55 89 f1 48 89 e5 41 57 41 56 41 55 
    RIP  [<ffffffffb8a22e2c>] kfree+0x13c/0x140
     RSP <ffff8adc7075fce8>
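
To check whether a saved kernel log shows the same failure, the distinguishing strings from the trace above can be matched mechanically. The following is a minimal sketch, not a Red Hat tool: the function names are hypothetical, and the watchdog message format ("NETDEV WATCHDOG: <interface> (enic): transmit queue <N> timed out") is an assumption, since the article does not quote that line.

```python
import re

# Markers copied from the trace above; a log must contain all of them to match.
TRACE_MARKERS = (
    "kernel BUG at mm/slub.c",
    "Workqueue: events enic_tx_hang_reset [enic]",
    "enic_free_wq_buf",
    "kfree+",
)

# Assumed format of the watchdog message; it is not quoted in the article.
WATCHDOG_RE = re.compile(
    r"NETDEV WATCHDOG: (\S+) \(enic\): transmit queue (\d+) timed out"
)


def find_watchdog_timeouts(log_text):
    """Return (interface, queue) pairs for enic transmit queue timeouts."""
    return [(m.group(1), int(m.group(2))) for m in WATCHDOG_RE.finditer(log_text)]


def matches_panic_signature(log_text):
    """Return True if every marker from the trace appears in log_text."""
    return all(marker in log_text for marker in TRACE_MARKERS)
```

A log that contains a watchdog timeout for an enic interface and also matches the panic signature is likely the same issue. A match on the markers alone is only a hint, because the source line number in mm/slub.c and the function offsets differ between kernel builds.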
    

Environment

  • Red Hat Enterprise Linux (RHEL) 8
  • Red Hat Enterprise Linux (RHEL) 7
  • Cisco VIC network adapters that use the enic driver
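
To confirm that a system falls under this environment, the driver bound to each network interface can be read from sysfs. The sketch below assumes the usual Linux layout in which /sys/class/net/<interface>/device/driver is a symbolic link to the driver directory; the function names and the sysfs_root parameter are hypothetical and exist only for illustration and testing.

```python
import os


def driver_name(iface, sysfs_root="/sys/class/net"):
    """Return the kernel driver bound to iface, or None if it has no driver link."""
    link = os.path.join(sysfs_root, iface, "device", "driver")
    if not os.path.islink(link):
        # Virtual interfaces such as lo have no backing device.
        return None
    return os.path.basename(os.readlink(link))


def enic_interfaces(sysfs_root="/sys/class/net"):
    """List, in sorted order, the interfaces whose bound driver is enic."""
    if not os.path.isdir(sysfs_root):
        return []
    return sorted(
        name
        for name in os.listdir(sysfs_root)
        if driver_name(name, sysfs_root) == "enic"
    )
```

If enic_interfaces() returns an empty list, no interface on the system is driven by enic and this issue does not apply.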
