i40e panic on link loss or after changing NIC ring buffer count

Solution Verified

Issue

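  • A system with an Intel 700 series NIC using the i40e driver logged a page allocation failure while the ring buffer descriptor counts were being changed after a link flap: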
[3408318.640794] i40e 0000:03:00.0 ens64: NIC Link is Down
[3408319.378069] i40e 0000:03:00.0 ens64: NIC Link is Up, 10 Gbps Full Duplex, Flow Control: None
[3408332.303305] i40e 0000:03:00.0 ens64: NIC Link is Down
[3408338.343124] i40e 0000:03:00.0 ens64: Changing Tx descriptor count from 8160 to 512.
[3408338.344477] i40e 0000:03:00.0 ens64: Changing Rx descriptor count from 8160 to 512
[3408761.504733] i40e 0000:03:00.0 ens64: NIC Link is Up, 10 Gbps Full Duplex, Flow Control: None
[3408761.543703] i40e 0000:03:00.0 ens64: Changing Tx descriptor count from 512 to 8160.
[3408761.554146] i40e 0000:03:00.0 ens64: Changing Rx descriptor count from 512 to 8160

[3408761.583181] NetworkManager: page allocation failure: order:6, mode:0x40dc0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO)
[3408761.583595] CPU: 7 PID: 1428 Comm: NetworkManager Kdump: loaded Not tainted 5.14.0-570.28.1.el9_6.x86_64 #1

[3408761.584360] Call Trace:
[3408761.584734]  <TASK>
[3408761.585105]  dump_stack_lvl+0x34/0x48
[3408761.585481]  warn_alloc+0x129/0x150
[3408761.585854]  ? __alloc_pages_direct_compact+0xa7/0x210
[3408761.586220]  __alloc_pages_slowpath.constprop.0+0xa73/0xb20
[3408761.586580]  ? get_page_from_freelist+0x2aa/0x590
[3408761.586936]  __alloc_pages+0x21d/0x250
[3408761.587287]  __kmalloc_large_node+0x79/0x110
[3408761.587636]  __kmalloc+0x322/0x440
[3408761.587982]  ? i40e_setup_rx_descriptors+0x91/0xb0 [i40e]
[3408761.588363]  ? i40e_setup_rx_descriptors+0x91/0xb0 [i40e]
[3408761.588717]  i40e_setup_rx_descriptors+0x91/0xb0 [i40e]
[3408761.589075]  i40e_set_ringparam.cold+0x11d/0x56e [i40e]
[3408761.589425]  __dev_ethtool+0x8fc/0x1b40
[3408761.589752]  ? __sk_destruct+0x155/0x230
[3408761.590073]  ? kmem_cache_free+0x3f1/0x420
[3408761.590390]  ? kmalloc_trace+0x176/0x330
[3408761.590704]  dev_ethtool+0xa8/0x170
[3408761.591013]  dev_ioctl+0x1b5/0x580
[3408761.591317]  ? sk_ioctl+0x4a/0x110
[3408761.591616]  sock_do_ioctl+0xab/0xf0
[3408761.591919]  sock_ioctl+0x1ce/0x2e0
[3408761.592210]  ? auditd_test_task+0x3c/0x50
[3408761.592498]  ? __audit_syscall_entry+0xef/0x140
[3408761.592782]  __x64_sys_ioctl+0x87/0xc0
[3408761.593062]  do_syscall_64+0x5c/0xe0
[3408761.593336]  ? __irq_exit_rcu+0x46/0xc0
[3408761.593608]  ? common_interrupt+0x43/0xa0
[3408761.593873]  entry_SYSCALL_64_after_hwframe+0x78/0x80
  • The system panicked shortly afterward with a NULL pointer dereference in i40e_xmit_frame_ring:
[3408762.143799] BUG: kernel NULL pointer dereference, address: 0000000000000008
[3408762.143802] #PF: supervisor write access in kernel mode
[3408762.143803] #PF: error_code(0x0002) - not-present page
[3408762.143804] PGD 0 P4D 0 
[3408762.143806] Oops: 0002 [#1] PREEMPT SMP NOPTI
[3408762.143811] RIP: 0010:i40e_xmit_frame_ring+0xff/0x500 [i40e]

[3408762.143878] Call Trace:
[3408762.143880]  <TASK>
[3408762.143881]  ? show_trace_log_lvl+0x1c4/0x2df
[3408762.143885]  ? show_trace_log_lvl+0x1c4/0x2df
[3408762.143887]  ? dev_hard_start_xmit+0x85/0x1d0
[3408762.143890]  ? __die_body.cold+0x8/0xd
[3408762.143892]  ? page_fault_oops+0x134/0x170
[3408762.143895]  ? _copy_to_iter+0x61/0x570
[3408762.143900]  ? exc_page_fault+0x62/0x150
[3408762.143903]  ? asm_exc_page_fault+0x22/0x30
[3408762.143907]  ? i40e_xmit_frame_ring+0xff/0x500 [i40e]
[3408762.143928]  dev_hard_start_xmit+0x85/0x1d0
[3408762.143930]  sch_direct_xmit+0x9b/0x360
[3408762.143934]  __dev_xmit_skb+0x22a/0x570
[3408762.143937]  __dev_queue_xmit+0x2c2/0x6b0
[3408762.143939]  ? packet_parse_headers+0x107/0x220
[3408762.143942]  ? packet_parse_headers+0x107/0x220
[3408762.143944]  packet_snd+0x382/0x760
[3408762.143946]  __sys_sendto+0x1dc/0x1f0
[3408762.143950]  ? syscall_exit_to_user_mode+0x19/0x40
[3408762.143953]  ? auditd_test_task+0x3c/0x50
[3408762.143956]  ? __audit_syscall_entry+0xef/0x140
[3408762.143959]  __x64_sys_sendto+0x20/0x30
[3408762.143961]  do_syscall_64+0x5c/0xe0
[3408762.143963]  ? audit_reset_context.part.0.constprop.0+0x273/0x2e0
[3408762.143965]  ? syscall_exit_work+0x103/0x130
[3408762.143967]  ? syscall_exit_to_user_mode+0x19/0x40
[3408762.143970]  ? do_syscall_64+0x6b/0xe0
[3408762.143971]  ? syscall_exit_work+0x103/0x130
[3408762.143972]  ? syscall_exit_to_user_mode+0x19/0x40
[3408762.143975]  ? do_syscall_64+0x6b/0xe0
[3408762.143976]  entry_SYSCALL_64_after_hwframe+0x78/0x80
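
The "order:6" field in the page allocation failure above can be turned into a byte count to show how much physically contiguous memory the kernel was asked for. The following is a minimal sketch, not part of the original solution. The helper name alloc_failure_bytes is hypothetical, and the 4 KiB page size is an assumption (the x86_64 base page size). An order-N request asks for 2^N contiguous pages.

```python
import re

# Assumption: 4 KiB base pages, the default on x86_64.
PAGE_SIZE = 4096


def alloc_failure_bytes(log_line, page_size=PAGE_SIZE):
    """Return the size in bytes of a failed page allocation.

    Hypothetical helper: looks for 'page allocation failure: order:N'
    and returns 2**N pages expressed in bytes, or None if the line
    does not contain that message.
    """
    match = re.search(r"page allocation failure: order:(\d+)", log_line)
    if match is None:
        return None
    order = int(match.group(1))
    return (1 << order) * page_size


# Log line taken from the Issue section of this article.
line = (
    "[3408761.583181] NetworkManager: page allocation failure: "
    "order:6, mode:0x40dc0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO)"
)

size = alloc_failure_bytes(line)
print(f"{size} bytes ({size // 1024} KiB)")  # prints "262144 bytes (256 KiB)"
```

Under these assumptions, the failed request was for 64 contiguous pages (256 KiB). Requests of that order can fail on a long-running system with fragmented memory even when plenty of memory is free in total.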

Environment

  • Red Hat Enterprise Linux 9
  • Intel 700 series NIC with i40e driver
  • NetworkManager managing the NIC ring buffer sizes through the ethtool.ring-rx and ethtool.ring-tx connection properties
