EDAC kernel panic with 'Tmid Thermal event with intelligent throttling disabled' message

Solution Unverified

Environment

  • Red Hat Enterprise Linux
  • edac_mc kernel module

Issue

  • The system panics with the following panic string:
Kernel panic - not syncing: UE row 3, channel-a= 2 channel-b= 3 labels "-": (Branch=1 DRAM-Bank=1 RDWR=Read RAS=356 CAS=0 FATAL Err=0x4 (>Tmid Thermal event with intelligent throttling disabled))
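When triaging several crash logs, it can help to pull the row and channel numbers out of the panic string programmatically. The following is only a minimal sketch: the struct ue_panic_fields type and the parse_ue_panic() function are hypothetical names introduced for illustration, and the sketch assumes the line begins exactly with the "Kernel panic - not syncing: UE row" prefix shown above.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical container for the numeric fields at the start of the UE panic string. */
struct ue_panic_fields {
    int csrow;      /* "UE row N" */
    int channel_a;  /* "channel-a= N" */
    int channel_b;  /* "channel-b= N" */
};

/*
 * Hypothetical helper: extract row and channel numbers from a panic line.
 * Returns 0 when all three fields were parsed, -1 otherwise.
 */
int parse_ue_panic(const char *line, struct ue_panic_fields *out)
{
    assert(line != NULL && out != NULL);

    int matched = sscanf(line,
        "Kernel panic - not syncing: UE row %d, channel-a= %d channel-b= %d",
        &out->csrow, &out->channel_a, &out->channel_b);

    return matched == 3 ? 0 : -1;
}
```

Passing the panic string from this article to parse_ue_panic() should yield row 3, channel-a 2 and channel-b 3; a line that does not start with the expected prefix returns -1.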

Resolution

  • Contact the hardware vendor to determine whether the reported thermal event is genuine or a false positive

Root Cause

  • The ">Tmid Thermal event with intelligent throttling disabled" message implies:
    • Intelligent throttling is disabled and the thermal sensor has transitioned from “below Tmid” to “above Tmid”
    • In other words, the EDAC driver is reporting that the temperature has passed the "middle" threshold
    • The BIOS flags this accordingly when the temperature has passed the middle threshold
    • The Tmid thermal event is considered critical enough for the EDAC driver to intentionally panic the system
  • According to the source code, the panic() function is called because panic_on_ue is set to 1:
void edac_mc_handle_fbd_ue(struct mem_ctl_info *mci,
            unsigned int csrow,
            unsigned int channela,
            unsigned int channelb, char *msg)
{
...
    if (panic_on_ue)
        panic("UE row %d, channel-a= %d channel-b= %d "
            "labels \"%s\": %s\n", csrow, channela,
            channelb, labels, msg);
}
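
To confirm whether a given system is configured to panic on uncorrectable errors, the panic_on_ue setting can be read from sysfs. The following is a minimal sketch rather than a supported tool: read_int_param() and edac_panic_on_ue_setting() are hypothetical helper names, and the two sysfs paths are assumptions that depend on the kernel version (one or both may be absent on a particular system).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: read one integer from a sysfs-style text file.
 * Returns the value on success, or -1 if the file cannot be opened or parsed.
 */
int read_int_param(const char *path)
{
    assert(path != NULL);

    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;

    int value = -1;
    if (fscanf(fp, "%d", &value) != 1)
        value = -1;

    fclose(fp);
    return value;
}

/* Assumed locations of the setting; these vary between kernel versions. */
static const char *const panic_on_ue_paths[] = {
    "/sys/module/edac_core/parameters/edac_mc_panic_on_ue",
    "/sys/devices/system/edac/mc/panic_on_ue",
};

/*
 * Hypothetical helper: return the first readable panic_on_ue value
 * (0 or 1), or -1 if none of the assumed paths could be read.
 */
int edac_panic_on_ue_setting(void)
{
    size_t count = sizeof(panic_on_ue_paths) / sizeof(panic_on_ue_paths[0]);

    for (size_t i = 0; i < count; i++) {
        int value = read_int_param(panic_on_ue_paths[i]);
        if (value >= 0)
            return value;
    }
    return -1;
}
```

A return value of 1 from edac_panic_on_ue_setting() would be consistent with the panic path shown above being taken on an uncorrectable error; -1 only means that neither assumed path was readable, not that the feature is disabled.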

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
