Kernel panic during boot on a system with 256 CPUs and an Intel E810 series adapter

Solution Verified - Updated -

Environment

  • System with 256 CPUs (2 × 64-core AMD EPYC processors with SMT enabled, for example)
  • Intel E810 series ("Columbiaville") network adapter
  • Red Hat Enterprise Linux (RHEL) 8.3 GA, or RHEL 8.2 with the ice driver from the Driver Update Program (DUP)

Issue

  • Kernel panic during boot on a system with 256 CPUs and an Intel E810 series adapter
  • The error messages "kernel BUG at kernel/time/timer.c:964!" and "failed to get 256 MSI-X vectors" appear in the system log.

Resolution

This issue will be resolved in a future update to Red Hat Enterprise Linux 8.

Workaround

As a workaround, boot with the kernel parameter nr_cpus=255 (or any other value less than 256) so that the kernel brings up fewer than 256 CPUs. Alternatively, disable SMT in the system's firmware or with the nosmt kernel parameter.
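To confirm after a reboot that one of these workarounds is in effect, the booted kernel command line can be inspected. The following is a minimal sketch and not a Red Hat tool: the helper name workaround_active and its limit argument are hypothetical, and it only looks for the two parameters named in this section.

```python
import re


def workaround_active(cmdline: str, limit: int = 256) -> bool:
    """Return True if the command line caps CPUs below `limit` or disables SMT.

    Hypothetical helper for illustration only; it checks for the two
    parameters named in the workaround (nr_cpus=N and nosmt).
    """
    params = cmdline.split()
    # "nosmt" may appear bare or with a value such as "nosmt=force".
    if any(p == "nosmt" or p.startswith("nosmt=") for p in params):
        return True
    for p in params:
        m = re.fullmatch(r"nr_cpus=(\d+)", p)
        if m and int(m.group(1)) < limit:
            return True
    return False
```

On a running system the input would typically be the contents of /proc/cmdline, for example workaround_active(open("/proc/cmdline").read()). The check does not cover SMT being disabled in firmware, since that leaves no trace on the command line.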

Root Cause

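The kernel is missing commit 135f4b9e9340 ("ice: fix memory leak if register_netdev_fails"). As the traces under Diagnostic Steps show, the driver obtains fewer MSI-X vectors than it requests, ice_probe fails, and its cleanup path (ice_devlink_destroy_port) triggers first the devlink warning and then the timer BUG.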

Diagnostic Steps

A kernel panic with the following trace is observed when the ice kernel module is loaded (during boot, for example):

[   21.716742] kernel BUG at kernel/time/timer.c:964!
[   21.728933] invalid opcode: 0000 [#1] SMP NOPTI
[   21.741065] CPU: 32 PID: 1986 Comm: kworker/32:3 Tainted: G        W        --------- -  - 4.18.0-234.el8.x86_64 #1
[   21.775962] Workqueue: events work_for_cpu_fn
[   21.787924] RIP: 0010:add_timer+0x183/0x1f0
[   21.963286] Call Trace:
[   21.971911]  queue_delayed_work_on+0x36/0x40
[   21.982294]  ice_devlink_destroy_port+0x12/0x20 [ice]
[   21.993496]  ice_probe+0x93b/0x1100 [ice]
[   22.003477]  local_pci_probe+0x41/0x90
[   22.013906]  work_for_cpu_fn+0x16/0x20
[   22.024373]  process_one_work+0x1a7/0x360
[   22.035016]  worker_thread+0x1cf/0x390
[   22.045247]  ? create_worker+0x1a0/0x1a0
[   22.054817]  kthread+0x112/0x130
[   22.063675]  ? kthread_flush_work_fn+0x10/0x10
[   22.073671]  ret_from_fork+0x22/0x40

The following messages are logged before the panic:

[   20.187939] ice 0000:01:00.0: not enough OS MSI-X vectors. requested = 258, obtained = 256
[   20.203763] ice 0000:01:00.0: 255 MSI-X interrupts available. ICE_VSI_PF 0 failed to get 256 MSI-X vectors
[   20.221245] ice 0000:01:00.0: probe failed due to setup PF switch: -12
[   20.233270] WARNING: CPU: 32 PID: 1986 at net/core/devlink.c:7336 __devlink_port_type_set+0x60/0x70
[   20.329108] CPU: 32 PID: 1986 Comm: kworker/32:3 Not tainted 4.18.0-234.el8.x86_64 #1
[   20.359189] Workqueue: events work_for_cpu_fn
[   20.370938] RIP: 0010:__devlink_port_type_set+0x60/0x70
[   20.555250] Call Trace:
[   20.565422]  devlink_port_type_clear+0x12/0x40
[   20.577769]  ice_devlink_destroy_port+0x12/0x20 [ice]
[   20.590578]  ice_probe+0x93b/0x1100 [ice]
[   20.602332]  local_pci_probe+0x41/0x90
[   20.613709]  work_for_cpu_fn+0x16/0x20
[   20.625790]  process_one_work+0x1a7/0x360
[   20.638148]  worker_thread+0x1cf/0x390
[   20.649558]  ? create_worker+0x1a0/0x1a0
[   20.661034]  kthread+0x112/0x130
[   20.671720]  ? kthread_flush_work_fn+0x10/0x10
[   20.683614]  ret_from_fork+0x22/0x40
[   20.694518] ---[ end trace 7fbba83dd7965bb6 ]---
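To check a saved log against the messages above, a small script can search for the distinctive lines. This is a minimal sketch under stated assumptions: the names SIGNATURES and match_signatures are hypothetical, the patterns are copied from the excerpts in this article, and the log text is assumed to come from a captured console log or similar file, since the system panics during boot.

```python
import re

# Patterns taken from the log excerpts in this article.
SIGNATURES = {
    "timer_bug": re.compile(r"kernel BUG at kernel/time/timer\.c:964!"),
    "msix_shortfall": re.compile(r"ice \S+: .*failed to get \d+ MSI-X vectors"),
    "probe_failed": re.compile(r"ice \S+: probe failed due to setup PF switch"),
}


def match_signatures(log_text: str) -> set:
    """Return the names of all signatures found anywhere in log_text."""
    return {name for name, rx in SIGNATURES.items() if rx.search(log_text)}
```

A log that matches all three signatures is consistent with this issue. A partial match is not conclusive, because the line number in timer.c may differ between kernel builds.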

