kipmi kernel helper thread kipmi0 is generating high CPU load

Solution Verified

Environment

  • Red Hat Enterprise Linux 4
  • Red Hat Enterprise Linux 5
  • Red Hat Enterprise Linux 6
  • Out-of-band management hardware that supports the Intelligent Platform Management Interface (IPMI, a systems health and management standard supported by multiple vendors)

Issue

  • The kipmi0 kernel helper thread sometimes goes to 100% CPU usage. Once there, it remains at 100% until the next reboot. After a reboot, things return to normal and then, at a random time later, it goes to 100% again.
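
To confirm the symptom, the CPU usage of the kipmiN threads can be listed with ps. The following is a minimal sketch; the function name kipmi_cpu is hypothetical. Note that the pcpu column reported by ps is averaged over the lifetime of the thread, so top may show a more current figure.

```shell
# Print the name and CPU usage of every kipmiN kernel thread.
# Reads "PID COMMAND %CPU" rows on stdin (a header row is ignored).
kipmi_cpu() {
    awk '$2 ~ /^kipmi[0-9]+$/ { print $2, $3 }'
}

# Usage on a live system:
#   ps -eo pid,comm,pcpu | kipmi_cpu
```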

Resolution

Proper resolution of this issue requires a fix at the hardware/firmware level.

  • Identify the BIOS version installed on the affected system with:
# dmidecode
  • Identify the Baseboard Management Controller (BMC) firmware version with:
# ipmitool mc info
  • Check the hardware vendor's support site for newer BIOS and BMC firmware versions and update to them, if possible.
  • If the issue persists with the current BIOS and BMC firmware versions, contact the hardware vendor's support organization for assistance.
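
When collecting these versions from several systems, the relevant fields can be extracted rather than read from the full output. This is a sketch under assumptions: the function name bmc_firmware_revision is hypothetical, it relies on the "Firmware Revision" line that ipmitool mc info normally prints, and the -s option shown in the usage comment requires a dmidecode version that supports it.

```shell
# Extract the "Firmware Revision" value from `ipmitool mc info` output on stdin.
bmc_firmware_revision() {
    awk -F: '/^Firmware Revision/ {
        gsub(/^[ \t]+|[ \t]+$/, "", $2)   # trim padding around the value
        print $2
        exit
    }'
}

# Usage on a live system (as root):
#   dmidecode -s bios-version
#   ipmitool mc info | bmc_firmware_revision
```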

Workaround for RHEL 6

Because the ipmi_si module is built into the kernel in RHEL 6, the parameter has to be passed on the kernel command line. Append the following to the end of the kernel line in /etc/grub.conf and reboot:

ipmi_si.kipmid_max_busy_us=<time in microseconds>

The kipmid_max_busy_us option sets the maximum amount of time, in microseconds, that kipmid will spin before sleeping for a tick. The value is a trade-off between IPMI performance and wasted CPU time and must be tuned for each environment.

There is no "catch-all" value that can be recommended here. Testing and iterating is the best way to determine a suitable value for the environment.
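
After rebooting, it can be useful to confirm that the option actually reached the kernel. The sketch below assumes a hypothetical helper name, cmdline_kipmid_max_busy_us, and assumes that the sysfs parameter file is readable on the running kernel. A kernel line ending in ipmi_si.kipmid_max_busy_us=500 would be reported as 500; that number is an arbitrary example, not a recommendation.

```shell
# Print the kipmid_max_busy_us value found on a kernel command line, if any.
# $1: file holding the command line (defaults to /proc/cmdline).
cmdline_kipmid_max_busy_us() {
    tr ' ' '\n' < "${1:-/proc/cmdline}" |
        sed -n 's/^ipmi_si\.kipmid_max_busy_us=//p'
}

# Usage after rebooting with the new kernel line:
#   cmdline_kipmid_max_busy_us
#   cat /sys/module/ipmi_si/parameters/kipmid_max_busy_us
```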

Preferred Workaround for RHEL 4 and 5

As of RHEL 5.6 (kernel-2.6.18-238), the ipmi_si module parameter kipmid_max_busy_us is available. It sets the maximum amount of time, in microseconds, that kipmid will spin before sleeping for a tick. The value is a trade-off between IPMI performance and wasted CPU time and must be tuned for each environment.
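
Where ipmi_si is a loadable module, the parameter can be set with an options line in /etc/modprobe.conf. The following sketch keeps that line unique when the value is changed repeatedly during tuning. The function name set_ipmi_si_option is hypothetical, the value 500 in the usage comment is an arbitrary example, and the function only handles lines that set a single option.

```shell
# Add or replace one ipmi_si option in a modprobe configuration file.
# $1: config file, $2: option name, $3: value
# Assumption: each "options ipmi_si" line sets exactly one option.
set_ipmi_si_option() {
    conf=$1 name=$2 value=$3
    tmp=$(mktemp) || return 1
    # Drop any earlier line that sets the same option, then append the new one.
    grep -v "^options ipmi_si $name=" "$conf" > "$tmp" || true
    echo "options ipmi_si $name=$value" >> "$tmp"
    # Write back in place so the file keeps its ownership and permissions.
    cat "$tmp" > "$conf" && rm -f "$tmp"
}

# Usage (as root); the setting applies the next time ipmi_si is loaded:
#   set_ipmi_si_option /etc/modprobe.conf kipmid_max_busy_us 500
```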

Alternative Workaround for RHEL 4 and 5

The kipmiN helper threads can be disabled by passing the option force_kipmid=0 to the ipmi_si module, at the cost of a (possibly major) slowdown of IPMI operations.

Edit /etc/modprobe.conf and add the following entry, which takes effect the next time the ipmi_si module is loaded:

options ipmi_si force_kipmid=0

Root Cause

  • kipmi0 is a kernel helper thread involved in handling IPMI interfaces. Within IPMI, there are several standard classes of interfaces. Some of these classes, such as KCS (Keyboard Controller Style) and SMIC (Server Management Interface Chip), do not use interrupt requests (IRQs) to signal changes and therefore require polling to obtain command results. The kipmiN kernel helper threads perform this polling, so it is normal for them to consume significant CPU time while an IPMI operation is in progress.
  • In this case, there is a problem in the interaction between the driver and the hardware/firmware which leads the driver to believe that an operation is still in progress, causing the high CPU load to continue until the system is rebooted.
  • Because the kipmiN kernel helper threads run at low priority (so as not to hog the CPU), this should not cause problems under normal usage scenarios. Because the Linux kernel implements IPMI support in terms of interface classes rather than individual IPMI chipsets, the kernel cannot work around this issue effectively; it should be addressed by the hardware vendor instead.
  • Refer to Documentation/IPMI.txt in the kernel sources. This behavior has also been discussed on Dell's linux-poweredge mailing list.


2 Comments

Possible workarounds

While the only permanent fix is a firmware / hardware fix, there are a number of workarounds that could be used.

  1. The first thing to try is a remote cold reset of the BMC. In some situations this can bring the kipmiN kernel helper thread to normal CPU usage levels, re-enabling local IPMI access.

  2. If ipmi_si is built as a loadable module, unloading and reloading the module should fix the problem as well.

  3. The ipmi_si driver supports hot adding and removing of IPMI interfaces. This way, interfaces can be added or removed after the kernel is up and running, even when ipmi_si is built directly into the kernel. This is done using /sys/module/ipmi_si/parameters/hotmod, which is a write-only parameter.

The workflow is the following:

Get the parameters used to create the current IPMI interface. In case there is only one IPMI interface per system (the most common case), you will find the parameters under /proc/ipmi/0/params. The parameters should look similar to this: kcs,i/o,0xca2,rsp=1,rsi=1,rsh=0,irq=0,ipmb=32.

Using the driver's hotmod interface, trigger the removal of the current IPMI interface by writing the word remove, followed by a comma and the parameters obtained in the previous step, to /sys/module/ipmi_si/parameters/hotmod. This action may take a considerable amount of time, during which the echo command will appear to hang (in most cases it should finish after a delay of between 8 and 30 minutes).

Re-add the interface in the same way, replacing the word remove with add. This action is almost instantaneous and asynchronous. If you are scripting this and performing a check immediately afterwards, make sure to introduce a short sleep in between.

On a system with a single IPMI interface, the following one-line command can be used to recover local IPMI access:

IPMI_PARAMS=`cat /proc/ipmi/0/params`; time echo "remove,$IPMI_PARAMS" > /sys/module/ipmi_si/parameters/hotmod; sleep 5; echo "add,$IPMI_PARAMS" > /sys/module/ipmi_si/parameters/hotmod; sleep 5; ipmitool mc info

Please note that this is only a temporary workaround and that the issue is likely to reappear in a matter of weeks or even days. Initially, after a fresh reboot, the kipmiN kernel helper thread is called kipmi0. For every successive application of the above recipe, the number at the end increases by one.
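
For scripted use, the one-liner above can be unpacked into a function with basic error checks. This is only a sketch under the same single-interface assumption; the function name ipmi_hotmod_cycle and the SETTLE_SECONDS variable are hypothetical, and the paths are parameters so that the logic can be tried against ordinary files first.

```shell
# Remove and re-add a single IPMI interface through the hotmod parameter.
# $1: params file (default /proc/ipmi/0/params)
# $2: hotmod file (default /sys/module/ipmi_si/parameters/hotmod)
# SETTLE_SECONDS: pause after each write (default 5, as in the one-liner)
ipmi_hotmod_cycle() {
    params_file=${1:-/proc/ipmi/0/params}
    hotmod=${2:-/sys/module/ipmi_si/parameters/hotmod}
    settle=${SETTLE_SECONDS:-5}

    # Save the parameters first; they are needed again for the add step.
    params=$(cat "$params_file") || return 1
    [ -n "$params" ] || return 1

    # The remove step may block for many minutes.
    echo "remove,$params" > "$hotmod" || return 1
    sleep "$settle"
    # The add step returns quickly but completes asynchronously.
    echo "add,$params" > "$hotmod" || return 1
    sleep "$settle"
}

# Usage on a live system (as root):
#   ipmi_hotmod_cycle && ipmitool mc info
```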

echo 100 > /sys/module/ipmi_si/parameters/kipmid_max_busy_us