Prometheus pod logs show SIGBUS error code=0x2 in RHOCP 4

Solution Verified - Updated -

Issue

  • After upgrading from 4.5 to 4.6, Prometheus alerts are triggered with the message "has disappeared from Prometheus target discovery".
  • Prometheus pods are in CrashLoopBackOff with a SIGBUS error code=0x2:

    unexpected fault address 0x7f63860e2000
    fatal error: fault
    [signal SIGBUS: bus error code=0x2 addr=0x7f63860e2000 pc=0x470e08]
    
  • The Prometheus volume has run out of space. Logs such as the following indicate this condition, although in some cases they do not appear even when the volume is full:

    2021-04-12T15:38:26.955695412Z level=error ts=2021-04-12T15:38:26.955Z caller=scrape.go:1088 component="scrape manager" scrape_pool=openshift-multus/monitor-network/0 target=https://10.0.0.3:8443/metrics msg="Scrape commit failed" err="write to WAL: log samples: write /prometheus/wal/00000588: no space left on device"
    2021-04-12T15:38:27.048922251Z level=error ts=2021-04-12T15:38:27.048Z caller=scrape.go:1088 component="scrape manager" scrape_pool=openshift-multus/monitor-network/0 target=https://10.0.0.2:8443/metrics msg="Scrape commit failed" err="write to WAL: log samples: write /prometheus/wal/00000588: no space left on device"
    2021-04-12T15:38:27.284125345Z level=error ts=2021-04-12T15:38:27.284Z caller=scrape.go:1088 component="scrape manager" scrape_pool=openshift-multus/monitor-network/0 target=https://10.0.0.4:8443/metrics msg="Scrape commit failed" err="write to WAL: log samples: write /prometheus/wal/00000588: no space left on device"
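To triage saved pod logs, the two symptoms above can be told apart with simple pattern matching. The sketch below is a minimal example, not a Red Hat tool. The regular expressions are copied from the log excerpts in this article and are not a complete list of Prometheus error formats. The `classify`, `summarize`, and `free_fraction` helpers are hypothetical names. The assumption that the data volume is mounted at `/prometheus` is inferred from the WAL path in the logs.

```python
import re
import shutil
from typing import Iterable, Optional

# Patterns taken from the log excerpts above (illustrative, not exhaustive).
SIGBUS_RE = re.compile(r"signal SIGBUS: bus error code=0x2\b")
WAL_ENOSPC_RE = re.compile(r"write to WAL: .*no space left on device")


def classify(line: str) -> Optional[str]:
    """Return a symptom label for one log line, or None if it matches neither pattern."""
    if SIGBUS_RE.search(line):
        return "sigbus"
    if WAL_ENOSPC_RE.search(line):
        return "wal_no_space"
    return None


def summarize(lines: Iterable[str]) -> dict:
    """Count how many lines match each symptom."""
    counts = {"sigbus": 0, "wal_no_space": 0}
    for line in lines:
        kind = classify(line)
        if kind is not None:
            counts[kind] += 1
    return counts


def free_fraction(path: str) -> float:
    """Return the fraction of the filesystem containing `path` that is still free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


if __name__ == "__main__":
    sample = [
        "unexpected fault address 0x7f63860e2000",
        "fatal error: fault",
        "[signal SIGBUS: bus error code=0x2 addr=0x7f63860e2000 pc=0x470e08]",
        'msg="Scrape commit failed" err="write to WAL: log samples: '
        'write /prometheus/wal/00000588: no space left on device"',
    ]
    print(summarize(sample))  # {'sigbus': 1, 'wal_no_space': 1}
```

Because the WAL errors do not always appear, a log match is not conclusive on its own. Run from inside the Prometheus container (assuming the mount path above), a `free_fraction("/prometheus")` value close to 0 is a more direct check for the out-of-space condition.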
    

Environment

  • Red Hat OpenShift Container Platform (RHOCP)
    • 4
