kipmi kernel helper thread kipmi0 is generating high CPU load
Environment
- Red Hat Enterprise Linux 4
- Red Hat Enterprise Linux 5
- Red Hat Enterprise Linux 6
- Out-of-band management hardware that supports the Intelligent Platform Management Interface (IPMI, a systems health and management standard supported by multiple vendors)
Issue
- The kipmi0 kernel helper thread sometimes goes to 100% CPU usage. Once there, it remains at 100% until the next reboot. After a reboot, things return to normal and then, at a random time later, it goes to 100% again.
Resolution
Proper resolution of this issue requires a fix at the hardware/firmware level.
- Identify the BIOS version installed on the affected system through
# dmidecode
- Identify the Baseboard Management Controller (BMC) firmware version through
# ipmitool mc info
- Check the hardware vendor's support site for newer BIOS and BMC firmware versions and update to them, if possible.
- If the issue persists with the current BIOS and BMC firmware versions, contact the hardware vendor's support organization for assistance.
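The version checks above can be narrowed down to just the fields of interest. The following is a minimal sketch, assuming a dmidecode version that supports the -s option and the usual "Firmware Revision" line in the output of ipmitool mc info. The helper name bmc_firmware_revision is illustrative and not part of ipmitool.

```shell
# Extract the BMC firmware revision from "ipmitool mc info" output on stdin.
# bmc_firmware_revision is a hypothetical helper name, not an ipmitool feature.
bmc_firmware_revision() {
    awk -F': ' '/^Firmware Revision/ { print $2; exit }'
}

# Typical use (as root, on the affected system):
#   dmidecode -s bios-version
#   dmidecode -s bios-release-date
#   ipmitool mc info | bmc_firmware_revision
```

Having both values at hand makes it easier to compare against the versions listed on the hardware vendor's support site.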
Workaround for RHEL 6
Since the ipmi_si module is built into the kernel in RHEL 6, the following can be appended to the end of the kernel line in /etc/grub.conf:
ipmi_si.kipmid_max_busy_us=<time in microseconds>
The kipmid_max_busy_us option sets the maximum amount of time, in microseconds, that kipmid will spin before sleeping for a tick. This value sets a balance between performance and CPU waste and needs to be tuned to your needs.
Unfortunately, there is no "catch-all" value that can be recommended here. Testing and iterating is the best way to determine the right value for the environment.
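For testing and iterating, it can be convenient to change the value on a running system before committing it to /etc/grub.conf. The following is a minimal sketch, assuming the parameter is exposed as a writable file under /sys/module/ipmi_si/parameters/. The function name set_kipmid_max_busy_us and the value 500 are placeholders, not recommendations.

```shell
# Hypothetical helper: set_kipmid_max_busy_us MICROSECONDS [PARAMETER_FILE]
set_kipmid_max_busy_us() {
    value=$1
    param=${2:-/sys/module/ipmi_si/parameters/kipmid_max_busy_us}
    # Accept only a non-empty string of digits.
    case $value in
        ''|*[!0-9]*) echo "usage: set_kipmid_max_busy_us MICROSECONDS" >&2; return 2 ;;
    esac
    [ -w "$param" ] || { echo "cannot write $param" >&2; return 1; }
    # Write the new value, then read it back as confirmation.
    echo "$value" > "$param" && cat "$param"
}

# Example (as root): set_kipmid_max_busy_us 500
# Then watch kipmi0 in top and time an IPMI command such as "ipmitool mc info".
```

A value changed this way is lost at reboot, so the value that works best still has to be added to the kernel line.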
Preferred Workaround for RHEL 4 and 5
As of RHEL 5.6 (kernel-2.6.18-238), a new module parameter, kipmid_max_busy_us, is available. It sets the maximum amount of time, in microseconds, that kipmid will spin before sleeping for a tick. This value sets a balance between performance and CPU waste and needs to be tuned to your needs.
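Options for the ipmi_si module are set with an options line in /etc/modprobe.conf. The following sketch adds such a line only if an identical one is not already present. The helper name add_ipmi_si_option and the value 500 are illustrative, and the setting is assumed to take effect the next time the module is loaded.

```shell
# Hypothetical helper: add_ipmi_si_option NAME=VALUE [CONFIG_FILE]
add_ipmi_si_option() {
    line="options ipmi_si $1"
    conf=${2:-/etc/modprobe.conf}
    # Append the line only if an identical line is not already there.
    # An existing line with a different value is left alone and must be
    # edited by hand.
    grep -qxF -- "$line" "$conf" 2>/dev/null || printf '%s\n' "$line" >> "$conf"
}

# Example (as root): add_ipmi_si_option kipmid_max_busy_us=500
```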
Alternative Workaround for RHEL 4 and 5
The kipmiN helper threads can be disabled by supplying the option force_kipmid=0 to the ipmi_si module, at the cost of a (possibly major) slowdown of IPMI operations.
Edit /etc/modprobe.conf and add the following entry:
options ipmi_si force_kipmid=0
Root Cause
- kipmi0 is a kernel helper thread involved in handling IPMI interfaces. Within IPMI, there are several standard classes of interfaces. Some of these classes, like KCS (Keyboard Controller Style) and SMIC (Server Management Interface Chip), do not use interrupt requests (IRQs) to signal changes, and thus require polling to obtain command results. The kipmiN kernel helper threads perform this polling. Thus, it is normal for these threads to consume significant CPU time while an IPMI operation is in progress.
- In this case, there is a problem in the interaction between the driver and the hardware/firmware which leads the driver to believe that an operation is still in progress, causing the high CPU load to continue until the system is rebooted.
- As the kipmiN kernel helper threads are executed at low priority (so as not to hog the CPU), this should not cause problems under normal usage scenarios. As the Linux kernel implements IPMI support in terms of the interface classes rather than in terms of individual IPMI chipsets, this issue cannot be worked around by the kernel effectively and should be addressed by the hardware vendor instead.
- Refer to Documentation/IPMI.txt in the kernel sources. This behavior has also been discussed on Dell's linux-poweredge mailing list.
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
2 Comments
Possible workarounds
While the only permanent fix is a firmware / hardware fix, there are a number of workarounds that could be used.
- The first thing to try is a remote cold reset of the BMC. In some situations this can bring the kipmiN kernel helper thread back to normal CPU usage levels, re-enabling local IPMI access.
- If ipmi_si is built as a separate module, unloading and reloading the module should fix the problem as well.
- The ipmi_si driver supports hot adding and removing of IPMI interfaces. This way, interfaces can be added or removed after the kernel is up and running, even when ipmi_si is built directly into the kernel. This is done using /sys/module/ipmi_si/parameters/hotmod, which is a write-only parameter.
The workflow is the following:
- Get the parameters used to create the current IPMI interface. If there is only one IPMI interface per system (the most common case), you will find the parameters under /proc/ipmi/0/params. The parameters should look similar to this: kcs,i/o,0xca2,rsp=1,rsi=1,rsh=0,irq=0,ipmb=32
- Using the driver's hotmod interface, trigger the removal of the current IPMI interface by writing the word remove, followed by a comma and the parameters obtained in the previous step, to /sys/module/ipmi_si/parameters/hotmod. This action may take a considerable amount of time, during which the echo command will appear to hang (in most cases it should finish after a delay of between 8 and 30 minutes).
- Re-add the interface in the same way as removing it above, replacing the word remove with add. This action is almost instantaneous and asynchronous. If you are scripting this and performing a check immediately afterwards, make sure to introduce a short sleep in between.
In case of one single IPMI interface per system, the following one-line command can be used to recover local IPMI access:
Please note that this is only a temporary workaround and that the issue is likely to reappear in a matter of weeks or even days. Initially, after a fresh reboot, the kipmiN kernel helper thread is called kipmi0. For every successive application of the above recipe, the number at the end increases by one.
echo 100 > /sys/module/ipmi_si/parameters/kipmid_max_busy_us