HPE ProLiant Gen10, Gen9 and Gen8 Servers - Short Durations of Throttling (TCC Activation) May Cause Operating Systems to Issue Machine Check Alerts, Which Is Expected Behavior

Solution Verified - Updated -

Issue

  • The following Machine Check Exception (MCE) alerts are observed in the /var/log/messages file:

    Mar 30 13:01:01 host.example.com host kernel: CPU21: Package temperature above threshold, cpu clock throttled (total events = 7521)
    Mar 30 13:01:01 host.example.com host kernel: CPU28: Package temperature/speed normal
    Mar 30 13:01:01 host.example.com host kernel: CPU25: Package temperature/speed normal
    Mar 30 13:01:01 host.example.com host kernel: CPU28: Package temperature above threshold, cpu clock throttled (total events = 7527)
    Mar 30 13:01:01 host.example.com host kernel: CPU17: Core temperature above threshold, cpu clock throttled (total events = 1442)
    Mar 30 13:01:01 host.example.com host kernel: CPU16: Package temperature above threshold, cpu clock throttled (total events = 7514)
    Mar 30 13:01:01 host.example.com host kernel: CPU30: Package temperature/speed normal
    Mar 30 13:01:01 host.example.com host kernel: CPU24: Package temperature above threshold, cpu clock throttled (total events = 7525)
    Mar 30 13:01:01 host.example.com host kernel: CPU23: Package temperature above threshold, cpu clock throttled (total events = 7495)
    Mar 30 13:01:01 host.example.com host kernel: CPU22: Package temperature above threshold, cpu clock throttled (total events = 7523)
    Mar 30 13:01:01 host.example.com host kernel: CPU19: Package temperature/speed normal
    Mar 30 13:01:01 host.example.com host kernel: CPU27: Package temperature/speed normal
    Mar 30 13:01:01 host.example.com host kernel: CPU26: Package temperature above threshold, cpu clock throttled (total events = 7525)
    Mar 30 13:01:01 host.example.com host kernel: CPU16: Package temperature/speed normal
    Mar 30 13:01:01 host.example.com host kernel: CPU30: Package temperature above threshold, cpu clock throttled (total events = 7526)
    Mar 30 13:01:01 host.example.com host kernel: CPU18: Package temperature above threshold, cpu clock throttled (total events = 7519)
    Mar 30 13:01:01 host.example.com host kernel: CPU29: Package temperature above threshold, cpu clock throttled (total events = 7526)
    Mar 30 13:01:01 host.example.com host kernel: CPU19: Package temperature above threshold, cpu clock throttled (total events = 7504)
    Mar 30 13:01:01 host.example.com host kernel: CPU27: Package temperature above threshold, cpu clock throttled (total events = 7527)
    Mar 30 13:01:01 host.example.com host kernel: CPU22: Package temperature/speed normal
    Mar 30 13:01:01 host.example.com host kernel: CPU23: Package temperature/speed normal
    Mar 30 13:01:01 host.example.com host kernel: CPU31: Package temperature above threshold, cpu clock throttled (total events = 7526)
    Mar 30 13:01:01 host.example.com host kernel: CPU20: Package temperature/speed normal
    Mar 30 13:01:01 host.example.com host kernel: CPU26: Package temperature/speed normal
    Mar 30 13:01:01 host.example.com host kernel: CPU20: Package temperature above threshold, cpu clock throttled (total events = 7526)
    Mar 30 13:01:01 host.example.com host kernel: CPU21: Package temperature/speed normal
    Mar 30 13:01:01 host.example.com host kernel: CPU24: Package temperature/speed normal
    Mar 30 13:01:01 host.example.com host kernel: CPU29: Package temperature/speed normal
    Mar 30 13:01:01 host.example.com host kernel: CPU31: Package temperature/speed normal
    Mar 30 13:01:01 host.example.com host kernel: CPU18: Package temperature/speed normal
    Mar 30 13:01:01 host.example.com host kernel: CPU17: Core temperature/speed normal
    Mar 30 13:01:01 host.example.com host kernel: CPU25: Package temperature above threshold, cpu clock throttled (total events = 7526)
    Mar 30 13:01:01 host.example.com host mcelog: Hardware event. This is not a software error.
    Mar 30 13:01:01 host.example.com host mcelog: MCE 0
    Mar 30 13:01:01 host.example.com host mcelog: CPU 17 THERMAL EVENT TSC 13bbcd79ada298
    Mar 30 13:01:01 host.example.com host mcelog: TIME 1522382461 Fri Mar 30 13:01:01 2018
    Mar 30 13:01:01 host.example.com host mcelog: Processor 17 heated above trip temperature. Throttling enabled.
    Mar 30 13:01:01 host.example.com host mcelog: Please check your system cooling. Performance will be impacted
    Mar 30 13:01:01 host.example.com host mcelog: STATUS 880003c3 MCGSTATUS 0
    Mar 30 13:01:01 host.example.com host mcelog: MCGCAP f000814 APICID 22 SOCKETID 1
    Mar 30 13:01:01 host.example.com host mcelog: CPUID Vendor Intel Family 6 Model 85
    Mar 30 13:01:01 host.example.com host mcelog: Hardware event. This is not a software error.
    Mar 30 13:01:01 host.example.com host mcelog: MCE 1
    Mar 30 13:01:01 host.example.com host mcelog: CPU 17 THERMAL EVENT TSC 13bbcd79adc4c8
    Mar 30 13:01:01 host.example.com host mcelog: TIME 1522382461 Fri Mar 30 13:01:01 2018
    Mar 30 13:01:01 host.example.com host mcelog: Processor 17 below trip temperature. Throttling disabled
    Mar 30 13:01:01 host.example.com host mcelog: STATUS 880a0282 MCGSTATUS 0
    Mar 30 13:01:01 host.example.com host mcelog: MCGCAP f000814 APICID 22 SOCKETID 1
    Mar 30 13:01:01 host.example.com host mcelog: CPUID Vendor Intel Family 6 Model 85
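
To gauge how widespread the throttling is, the kernel messages in the excerpt above can be summarized per CPU. The following is a minimal sketch, not a supported tool: the function name `summarize_throttling` and the `SAMPLE` list are hypothetical, and the regular expression assumes the exact message wording shown in this excerpt, which may differ between kernel versions.

```python
import re
from collections import Counter

# Matches the "temperature above threshold" kernel messages as they appear
# in the excerpt above (assumption: wording may vary across kernel versions).
THROTTLE_RE = re.compile(
    r"kernel: CPU(?P<cpu>\d+): (?P<scope>Core|Package) temperature above threshold, "
    r"cpu clock throttled \(total events = (?P<total>\d+)\)"
)


def summarize_throttling(lines):
    """Return (throttle messages seen per CPU,
    highest reported 'total events' per (CPU, scope))."""
    messages = Counter()
    max_total = {}
    for line in lines:
        match = THROTTLE_RE.search(line)
        if match is None:
            continue  # skip "temperature/speed normal" and unrelated lines
        cpu = int(match.group("cpu"))
        key = (cpu, match.group("scope"))
        messages[cpu] += 1
        max_total[key] = max(max_total.get(key, 0), int(match.group("total")))
    return messages, max_total


# Hypothetical sample: three lines copied from the excerpt above.
SAMPLE = [
    "Mar 30 13:01:01 host.example.com host kernel: CPU21: Package temperature above threshold, cpu clock throttled (total events = 7521)",
    "Mar 30 13:01:01 host.example.com host kernel: CPU28: Package temperature/speed normal",
    "Mar 30 13:01:01 host.example.com host kernel: CPU17: Core temperature above threshold, cpu clock throttled (total events = 1442)",
]

messages, max_total = summarize_throttling(SAMPLE)
for (cpu, scope), total in sorted(max_total.items()):
    print(f"CPU{cpu} {scope}: total events = {total}")
```

To run it against a real log, pass an open file object instead of `SAMPLE`, for example `summarize_throttling(open("/var/log/messages", errors="replace"))`. Counts that are similar across all CPUs of one socket, as in the excerpt, suggest package-level rather than per-core throttling.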
    

Environment

  • Red Hat Enterprise Linux (RHEL)
  • HPE ProLiant Gen10 Server
  • HPE ProLiant Gen9 Server
  • HPE ProLiant Gen8 Server
