Prometheus pod logs show SIGBUS error code=0x2 in RHOCP 4
Issue
- After an upgrade from 4.5 to 4.6, Prometheus alerts fire with the message "has disappeared from Prometheus target discovery".
- Prometheus pods are in CrashLoopBackOff with a SIGBUS error code=0x2:

    unexpected fault address 0x7f63860e2000
    fatal error: fault
    [signal SIGBUS: bus error code=0x2 addr=0x7f63860e2000 pc=0x470e08]

- The Prometheus volume has run out of space. Note that in some cases the logs below do not appear even when the out-of-space condition is present:

    2021-04-12T15:38:26.955695412Z level=error ts=2021-04-12T15:38:26.955Z caller=scrape.go:1088 component="scrape manager" scrape_pool=openshift-multus/monitor-network/0 target=https://10.0.0.3:8443/metrics msg="Scrape commit failed" err="write to WAL: log samples: write /prometheus/wal/00000588: no space left on device"
    2021-04-12T15:38:27.048922251Z level=error ts=2021-04-12T15:38:27.048Z caller=scrape.go:1088 component="scrape manager" scrape_pool=openshift-multus/monitor-network/0 target=https://10.0.0.2:8443/metrics msg="Scrape commit failed" err="write to WAL: log samples: write /prometheus/wal/00000588: no space left on device"
    2021-04-12T15:38:27.284125345Z level=error ts=2021-04-12T15:38:27.284Z caller=scrape.go:1088 component="scrape manager" scrape_pool=openshift-multus/monitor-network/0 target=https://10.0.0.4:8443/metrics msg="Scrape commit failed" err="write to WAL: log samples: write /prometheus/wal/00000588: no space left on device"
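When triaging saved pod logs, it can help to check mechanically for the two signatures quoted above. The following is a minimal sketch, not an official tool: the function name `classify_prometheus_logs` and the shape of its return value are assumptions made for illustration, and the patterns are taken only from the log excerpts in this article.

```python
import re

# Patterns copied from the log excerpts in this article.
# The function name and return shape below are illustrative assumptions.
SIGBUS_RE = re.compile(r"signal SIGBUS: bus error code=0x2")
NO_SPACE_RE = re.compile(r"no space left on device")
WAL_PATH_RE = re.compile(r"write (/prometheus/wal/\d+): no space left on device")


def classify_prometheus_logs(log_text):
    """Report which of the two symptoms appear in the given log text."""
    # Collect the distinct WAL segment paths named in failed writes.
    wal_segments = sorted(set(WAL_PATH_RE.findall(log_text)))
    return {
        "sigbus": bool(SIGBUS_RE.search(log_text)),
        "no_space": bool(NO_SPACE_RE.search(log_text)),
        "wal_segments": wal_segments,
    }


if __name__ == "__main__":
    sample = (
        'level=error caller=scrape.go:1088 msg="Scrape commit failed" '
        'err="write to WAL: log samples: write /prometheus/wal/00000588: '
        'no space left on device"\n'
        "[signal SIGBUS: bus error code=0x2 addr=0x7f63860e2000 pc=0x470e08]\n"
    )
    print(classify_prometheus_logs(sample))
```

Because the out-of-space messages do not always appear, a result with `sigbus` true and `no_space` false does not rule out a full volume; the volume usage still needs to be checked directly.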
Environment
- Red Hat OpenShift Container Platform (RHOCP) 4