IPMI errors on Nehalem systems with Winbond BMC
Environment
- Red Hat Enterprise Linux 5
- Intel Nehalem CPUs
- Winbond Base-board Management Controller (BMC)
- SuperMicro motherboard
Issue
ipmitool reports an error while listing sensor information:
# ipmitool sdr
CPU1 Temp | 0 unspecified | ok
CPU2 Temp | 0 unspecified | ok
Get SDR 0004 command failed: Unspecified error
CPU2 Temp | disabled | ns
CPU1 Vcore | 0.86 Volts | ok
CPU2 Vcore | 0.87 Volts | ok
+5V | 5.12 Volts | ok
At the same time, the following message is logged to /var/log/messages:
IPMI message handler: BMC returned incorrect response, expected netfn 7 cmd 35, got netfn 5 cmd 2d
Also, the kipmiN kernel helper threads are generating high CPU load.
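Both symptoms can be checked from a shell. The following is a minimal sketch, not a supported tool: the function names count_bmc_errors and show_kipmi_load are hypothetical, and the default log path is the one named above.

```shell
#!/bin/sh
# Hypothetical helpers for checking the two symptoms described above.

# Count "incorrect response" messages from the IPMI message handler.
# Defaults to /var/log/messages; pass another file to override.
count_bmc_errors() {
    log_file="${1:-/var/log/messages}"
    grep -c 'BMC returned incorrect response' "$log_file"
}

# List the kipmiN kernel helper threads (kipmi0, kipmi1, ...) with their
# CPU usage. The first line of ps output (the header) is kept.
show_kipmi_load() {
    ps -eo pid,pcpu,comm | awk 'NR == 1 || $3 ~ /^kipmi[0-9]+$/'
}
```

Run count_bmc_errors as root (the log is usually not world-readable) to see how often the message has been logged, and show_kipmi_load to see whether a kipmiN thread is consuming CPU time.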
Resolution
Occurrences of this issue are reported to have disappeared after updating the Winbond BMC's firmware to a newer version available from the motherboard vendor, SuperMicro (at ftp.supermicro.com). Specifically, the following firmware is now in use:
# ipmitool mc info
Device ID : 32
Device Revision : 1
Firmware Revision : 1.12
IPMI Version : 2.0
Manufacturer ID : 47488
Manufacturer Name : Unknown (0xB980)
Product ID : 43707 (0xaabb)
Product Name : Unknown (0xAABB)
Device Available : yes
Provides Device SDRs : no
Root Cause
Suspected BMC firmware issue.
Diagnostic Steps
# ipmitool sdr
CPU1 Temp | 0 unspecified | ok
CPU2 Temp | 0 unspecified | ok
Get SDR 0004 command failed: Unspecified error
CPU2 Temp | disabled | ns
CPU1 Vcore | 0.86 Volts | ok
CPU2 Vcore | 0.87 Volts | ok
+5V | 5.12 Volts | ok
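To pick out only the failing requests from a long sensor listing, the output can be filtered. This is a sketch under assumptions: sdr_failures is a hypothetical helper name, and it matches the "command failed" wording shown in the output above.

```shell
#!/bin/sh
# Hypothetical filter: print only the failure lines from ipmitool output
# supplied on standard input.
sdr_failures() {
    grep 'command failed'
}
```

A possible invocation is `ipmitool sdr 2>&1 | sdr_failures`. The `2>&1` merges both output streams, in case the error line is written to standard error rather than standard output.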
Obtain BMC details:
# ipmitool mc info
Device ID : 32
Device Revision : 1
Firmware Revision : 1.9
IPMI Version : 2.0
Manufacturer ID : 47488
Manufacturer Name : Unknown (0xB980)
Product ID : 43707 (0xaabb)
Product Name : Unknown (0xAABB)
After updating to the newer firmware:
Device ID : 32
Device Revision : 1
Firmware Revision : 1.32
IPMI Version : 2.0
Manufacturer ID : 47488
Manufacturer Name : Unknown (0xB980)
Product ID : 43707 (0xaabb)
Product Name : Unknown (0xAABB)
Device Available : yes
Provides Device SDRs : no
Additional Device Support :
Sensor Device
SDR Repository Device
SEL Device
FRU Inventory Device
IPMB Event Receiver
IPMB Event Generator
Chassis Device
Aux Firmware Rev Info :
0x01
0x00
0x00
0x00
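When comparing firmware before and after the update across several machines, the revision can be extracted from the `ipmitool mc info` output. The helper below is a minimal sketch. The name firmware_revision is hypothetical, and it assumes the "Firmware Revision : <value>" line format shown above.

```shell
#!/bin/sh
# Hypothetical helper: read `ipmitool mc info` output on standard input and
# print the value of the first "Firmware Revision" line.
firmware_revision() {
    # Split each line on the colon, ignoring the spaces around it.
    awk -F' *: *' '$1 == "Firmware Revision" { print $2; exit }'
}
```

A possible invocation is `ipmitool mc info | firmware_revision`, which prints only the revision string for the local BMC.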
Comments
An upstream mailing list thread suggests this may be a hardware, BMC firmware, or BIOS issue. The "incorrect response" message indicates that the IPMI device and the IPMI handler code have gone out of synchronisation.
Refer to "kipmi kernel helper thread kipmi0 is generating high CPU load" for general information about kipmiN kernel helper threads generating high CPU load.