Firmware Fails to Load for One Device, But Loads Cleanly for the Next Device on System Startup

Solution Unverified - Updated -

Environment

  • Red Hat Enterprise Linux 6
  • udev versions earlier than 147-2.51.el6

Issue

  • During startup there is an error generated in the log files about failing to load firmware for eth0
  • Later in the startup process its sets up eth1 using the same firmware without issue.
kernel: power_meter ACPI000D:00: Found ACPI power meter.
kernel: bnx2: Broadcom NetXtreme II Gigabit Ethernet Driver bnx2 v2.1.11 (July 20, 2011)
kernel: bnx2 0000:0f:00.0: PCI INT A -> GSI 28 (level, low) -> IRQ 28
kernel: bnx2 0000:0f:00.0: firmware: requesting bnx2/bnx2-mips-06-6.2.1.fw
kernel: bnx2: Can't load firmware file "bnx2/bnx2-mips-06-6.2.1.fw"
kernel: bnx2 0000:0f:00.0: PCI INT A disabled
kernel: bnx2: probe of 0000:0f:00.0 failed with error -2
kernel: power_meter ACPI000D:00: Found ACPI power meter.
kernel: bnx2: Broadcom NetXtreme II Gigabit Ethernet Driver bnx2 v2.1.11 (July 20, 2011)
kernel: bnx2 0000:12:00.0: PCI INT A -> GSI 28 (level, low) -> IRQ 28
kernel: bnx2 0000:12:00.0: firmware: requesting bnx2/bnx2-mips-06-6.2.1.fw
kernel: bnx2 0000:12:00.0: firmware: requesting bnx2/bnx2-rv2p-06-6.0.15.fw
kernel: bnx2 0000:12:00.0: eth1: Broadcom NetXtreme II BCM5708 1000Base-SX (B2) PCI-X 64-bit 133MHz found at mem 92000000, IRQ 28, node addr e8:39:35:25:e5:f5

Resolution

  • If the number of attached storage is not so big, this issue could be resolved by updating udev to version from RHBA-2013-1675.

  • If the number of storage devices is large, it is possible that a lot of modprobes are triggered. As a consequence udev can reach the limit of worker processes and not react to firmware load requests, ultimately timing out before the firmware is loaded. A udev update to resolve this was released in RHBA-2014-1524

Workaround 1
If updating udev is not an option a workaround exists: install dracut-network and rebuild the initramfs with dracut. This will put all network modules into initramfs so that they will initialised before storage.

Workaround 2

1) confirm below packages are installed and install if not installed

kmod-bnx2.x86_64
kmod-bnx2-firmware.x86_64

2) Run lsmod | grep bnx2

3) Edit /etc/dracut.conf and include the following line:

add_drivers+="bnx2"

4) Run 'dracut -f'

5) Reboot

Root Cause

For the first problem there is an explanation exists:

You could experiment with "udevchilds=". With 51200MB RAM, udevd uses 51200/8 + 128 = 6528 child processes to work in parallel on the udev event queue. With that much storage and LVM in the game, I would guess this number exponentially grows.

So, one thing what you could try is to limit the number with for example "udevchilds=400" added to the kernel command line.

For the second situation the problem is more complex:

The failure is a result of a deadlock between using udev to load firmware, and udev's serialization when calling modprobe. In udev_event_execute_run(), udev_lock_down() is used to serialize RUN modprobe commands. One udev worker is running modprobe, and all others are waiting on the lock to run modprobe, resulting in udev not being able to get to the firmware request. The deadlock only ends when a timeout is reached, but still leaves any module such as bnx2 needing firmware unusable, unless manually loaded later after the SCSI udev rules have all finished.

Diagnostic Steps

First step is to define the actual timeout and root cause:

  1. put printk.time=1 log_buf_len=4M to kernel boot parameters, estimate the delay:
[   42.993561] bnx2 0000:01:00.0: PCI INT A -> GSI 36 (level, low) -> IRQ 36
[   42.993570] bnx2 0000:01:00.0: setting latency timer to 64
[   42.994558] bnx2 0000:01:00.0: firmware: requesting bnx2/bnx2-mips-09-6.2.1b.fw
[  102.960752] bnx2: Can't load firmware file "bnx2/bnx2-mips-09-6.2.1b.fw"
[  102.960865] bnx2 0000:01:00.0: PCI INT A disabled
[  102.960877] bnx2: probe of 0000:01:00.0 failed with error -2
  1. enable debugging in udev:

Using udevinfo and udevlog parameters to debug udev, we could see one of the udev workers trying to request the bnx2 module, and timing out.

udevd-work[14375]: seq 14279 running
udevd-work[14375]: RUN '/sbin/modprobe -b $env{MODALIAS}' /lib/udev/rules.d/80-drivers.rules:5
udevd-work[14375]: RUN 'socket:@/org/freedesktop/hal/udev_event' /etc/udev/rules.d/90-hal.rules:2
...
udevd-work[14375]: '/sbin/modprobe -b pci:v000014E4d00001639sv00001028sd00001F26bc02sc00i00' started
udevd[14363]: worker [14375] exit
udevd[14363]: worker [14375] unexpectedly returned with status 0x0100
udevd[14363]: worker [14375] failed while handling '/devices/pci0000:00/0000:00:03.0/0000:02:00.0'

In the time between the RUN being prepped and modprobe actually being started, all the other udev workers also wanted to run modprobe, specifically for the sg driver. for example:

udevd-work[14405]: RUN '/sbin/modprobe -b sg' /lib/udev/rules.d/80-drivers.rules:10

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.

Comments