HPE ProLiant Gen10, Gen9 and Gen8 Servers - Short Durations of Throttling (TCC Activation) May Cause Operating Systems to Issue Machine Check Alerts, Which Is Expected Behavior

Solution Verified - Updated -

Issue

  • The following Machine Check Exception (MCE) alerts are observed in the /var/log/messages file:

    Mar 30 13:01:01 host.example.com host kernel: CPU21: Package temperature above threshold, cpu clock throttled (total events = 7521)
    Mar 30 13:01:01 host.example.com host kernel: CPU28: Package temperature/speed normal
    Mar 30 13:01:01 host.example.com host kernel: CPU25: Package temperature/speed normal
    Mar 30 13:01:01 host.example.com host kernel: CPU28: Package temperature above threshold, cpu clock throttled (total events = 7527)
    Mar 30 13:01:01 host.example.com host kernel: CPU17: Core temperature above threshold, cpu clock throttled (total events = 1442)
    Mar 30 13:01:01 host.example.com host kernel: CPU16: Package temperature above threshold, cpu clock throttled (total events = 7514)
    Mar 30 13:01:01 host.example.com host kernel: CPU30: Package temperature/speed normal
    Mar 30 13:01:01 host.example.com host kernel: CPU24: Package temperature above threshold, cpu clock throttled (total events = 7525)
    Mar 30 13:01:01 host.example.com host kernel: CPU23: Package temperature above threshold, cpu clock throttled (total events = 7495)
    Mar 30 13:01:01 host.example.com host kernel: CPU22: Package temperature above threshold, cpu clock throttled (total events = 7523)
    Mar 30 13:01:01 host.example.com host kernel: CPU19: Package temperature/speed normal
    Mar 30 13:01:01 host.example.com host kernel: CPU27: Package temperature/speed normal
    Mar 30 13:01:01 host.example.com host kernel: CPU26: Package temperature above threshold, cpu clock throttled (total events = 7525)
    Mar 30 13:01:01 host.example.com host kernel: CPU16: Package temperature/speed normal
    Mar 30 13:01:01 host.example.com host kernel: CPU30: Package temperature above threshold, cpu clock throttled (total events = 7526)
    Mar 30 13:01:01 host.example.com host kernel: CPU18: Package temperature above threshold, cpu clock throttled (total events = 7519)
    Mar 30 13:01:01 host.example.com host kernel: CPU29: Package temperature above threshold, cpu clock throttled (total events = 7526)
    Mar 30 13:01:01 host.example.com host kernel: CPU19: Package temperature above threshold, cpu clock throttled (total events = 7504)
    Mar 30 13:01:01 host.example.com host kernel: CPU27: Package temperature above threshold, cpu clock throttled (total events = 7527)
    Mar 30 13:01:01 host.example.com host kernel: CPU22: Package temperature/speed normal
    Mar 30 13:01:01 host.example.com host kernel: CPU23: Package temperature/speed normal
    Mar 30 13:01:01 host.example.com host kernel: CPU31: Package temperature above threshold, cpu clock throttled (total events = 7526)
    Mar 30 13:01:01 host.example.com host kernel: CPU20: Package temperature/speed normal
    Mar 30 13:01:01 host.example.com host kernel: CPU26: Package temperature/speed normal
    Mar 30 13:01:01 host.example.com host kernel: CPU20: Package temperature above threshold, cpu clock throttled (total events = 7526)
    Mar 30 13:01:01 host.example.com host kernel: CPU21: Package temperature/speed normal
    Mar 30 13:01:01 host.example.com host kernel: CPU24: Package temperature/speed normal
    Mar 30 13:01:01 host.example.com host kernel: CPU29: Package temperature/speed normal
    Mar 30 13:01:01 host.example.com host kernel: CPU31: Package temperature/speed normal
    Mar 30 13:01:01 host.example.com host kernel: CPU18: Package temperature/speed normal
    Mar 30 13:01:01 host.example.com host kernel: CPU17: Core temperature/speed normal
    Mar 30 13:01:01 host.example.com host kernel: CPU25: Package temperature above threshold, cpu clock throttled (total events = 7526)
    Mar 30 13:01:01 host.example.com host mcelog: Hardware event. This is not a software error.
    Mar 30 13:01:01 host.example.com host mcelog: MCE 0
    Mar 30 13:01:01 host.example.com host mcelog: CPU 17 THERMAL EVENT TSC 13bbcd79ada298
    Mar 30 13:01:01 host.example.com host mcelog: TIME 1522382461 Fri Mar 30 13:01:01 2018
    Mar 30 13:01:01 host.example.com host mcelog: Processor 17 heated above trip temperature. Throttling enabled.
    Mar 30 13:01:01 host.example.com host mcelog: Please check your system cooling. Performance will be impacted
    Mar 30 13:01:01 host.example.com host mcelog: STATUS 880003c3 MCGSTATUS 0
    Mar 30 13:01:01 host.example.com host mcelog: MCGCAP f000814 APICID 22 SOCKETID 1
    Mar 30 13:01:01 host.example.com host mcelog: CPUID Vendor Intel Family 6 Model 85
    Mar 30 13:01:01 host.example.com host mcelog: Hardware event. This is not a software error.
    Mar 30 13:01:01 host.example.com host mcelog: MCE 1
    Mar 30 13:01:01 host.example.com host mcelog: CPU 17 THERMAL EVENT TSC 13bbcd79adc4c8
    Mar 30 13:01:01 host.example.com host mcelog: TIME 1522382461 Fri Mar 30 13:01:01 2018
    Mar 30 13:01:01 host.example.com host mcelog: Processor 17 below trip temperature. Throttling disabled
    Mar 30 13:01:01 host.example.com host mcelog: STATUS 880a0282 MCGSTATUS 0
    Mar 30 13:01:01 host.example.com host mcelog: MCGCAP f000814 APICID 22 SOCKETID 1
    Mar 30 13:01:01 host.example.com host mcelog: CPUID Vendor Intel Family 6 Model 85
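
Because these messages arrive in bursts across many CPUs, a per-CPU summary can make the log easier to read. The following is a minimal sketch, not part of the original solution. It assumes the kernel message format shown in the excerpt above; the function name summarize_throttling and the sample list are hypothetical, introduced only for illustration.

```python
import re
from collections import Counter

# Matches kernel throttle lines in the format shown in the log excerpt, e.g.
# "kernel: CPU21: Package temperature above threshold, cpu clock throttled (total events = 7521)"
THROTTLE_RE = re.compile(
    r"kernel: CPU(?P<cpu>\d+): (?P<scope>Core|Package) temperature above threshold, "
    r"cpu clock throttled \(total events = (?P<total>\d+)\)"
)


def summarize_throttling(lines):
    """Hypothetical helper: tally throttle messages per (cpu, scope).

    Returns two mappings keyed by (cpu_number, "Core" or "Package"):
    the number of throttle messages seen, and the highest
    "total events" counter reported in those messages.
    """
    seen = Counter()
    highest = {}
    for line in lines:
        match = THROTTLE_RE.search(line)
        if match is None:
            continue  # "temperature/speed normal" and mcelog lines are skipped
        key = (int(match.group("cpu")), match.group("scope"))
        seen[key] += 1
        highest[key] = max(highest.get(key, 0), int(match.group("total")))
    return seen, highest


# Sample lines copied from the log excerpt in this article.
sample = [
    "Mar 30 13:01:01 host.example.com host kernel: CPU21: Package temperature above threshold, cpu clock throttled (total events = 7521)",
    "Mar 30 13:01:01 host.example.com host kernel: CPU28: Package temperature/speed normal",
    "Mar 30 13:01:01 host.example.com host kernel: CPU17: Core temperature above threshold, cpu clock throttled (total events = 1442)",
]

seen, highest = summarize_throttling(sample)
for (cpu, scope), count in sorted(seen.items()):
    print(f"CPU{cpu} {scope}: {count} message(s), total events = {highest[(cpu, scope)]}")
```

To run the same summary against a live system, pass an open file object for /var/log/messages to summarize_throttling instead of the sample list. Reading that file typically requires root privileges.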
    

Environment

  • Red Hat Enterprise Linux (RHEL)
  • HPE ProLiant Gen10 Server
  • HPE ProLiant Gen9 Server
  • HPE ProLiant Gen8 Server
