One of the compute nodes rebooted and we need an RCA for this.
Issue
- One of the compute nodes rebooted and we need an RCA for this.
- The following is seen in the logs:
Mar 30 05:28:42 overcloud-compute-13 systemd: Failed to ping hardware watchdog: Invalid argument
Mar 30 05:28:42 overcloud-compute-13 kernel: IPMI Watchdog: response: Error ff on cmd 22
Mar 30 05:28:47 overcloud-compute-13 os-collect-config: /var/lib/os-collect-config/local-data not found. Skipping
Mar 30 05:28:47 overcloud-compute-13 os-collect-config: No local metadata found (['/var/lib/os-collect-config/local-data'])
Mar 30 05:28:51 overcloud-compute-13 systemd-logind: Failed to start session scope session-405761.scope: Connection timed out
Mar 30 05:29:17 overcloud-compute-13 os-collect-config: /var/lib/os-collect-config/local-data not found. Skipping
Mar 30 05:29:17 overcloud-compute-13 os-collect-config: No local metadata found (['/var/lib/os-collect-config/local-data'])
Mar 30 05:29:26 overcloud-compute-13 systemd-logind: Failed to start user slice user-0.slice, ignoring: Connection timed out ((null))
Mar 30 05:28:42 overcloud-compute-13 kernel: IPMI Watchdog: response: Error ff on cmd 22
- Then it booted around 6 minutes later:
Mar 30 05:35:27 overcloud-compute-13 journal: Runtime journal is using 8.0M (max allowed 4.0G, trying to leave 4.0G free of 188.2G available → current limit 4.0G).
Mar 30 05:35:27 overcloud-compute-13 kernel: Initializing cgroup subsys cpuset
Mar 30 05:35:27 overcloud-compute-13 kernel: Initializing cgroup subsys cpu
Mar 30 05:35:27 overcloud-compute-13 kernel: Initializing cgroup subsys cpuacct
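The "around 6 minutes" gap can be checked directly from the timestamps above. The following is a minimal sketch, not part of the original analysis: it takes the last message logged before the reboot and the first kernel message of the new boot, and computes the difference. The helper names `parse_syslog_time` and `reboot_gap` are hypothetical, and because these syslog timestamps carry no year, the `year` default is an arbitrary assumption that only matters if the logs span a year boundary.

```python
from datetime import datetime

# Last line logged before the reboot and first kernel line of the new boot,
# copied from the log excerpt above.
LAST_BEFORE_REBOOT = (
    "Mar 30 05:29:26 overcloud-compute-13 systemd-logind: "
    "Failed to start user slice user-0.slice, ignoring: Connection timed out ((null))"
)
FIRST_AFTER_REBOOT = (
    "Mar 30 05:35:27 overcloud-compute-13 kernel: Initializing cgroup subsys cpuset"
)


def parse_syslog_time(line, year=2000):
    """Parse the leading 'Mon DD HH:MM:SS' stamp of a syslog line.

    The year is not present in the log, so the default is an assumption.
    """
    stamp = " ".join(line.split()[:3])
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def reboot_gap(last_line, first_boot_line):
    """Return the time between the last pre-reboot line and the first boot line."""
    return parse_syslog_time(first_boot_line) - parse_syslog_time(last_line)


gap = reboot_gap(LAST_BEFORE_REBOOT, FIRST_AFTER_REBOOT)
print(gap)  # 0:06:01
```

For this excerpt the gap is 6 minutes and 1 second, which is consistent with the node being reset and then going through a full boot.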
Environment
- Red Hat OpenStack Platform 11.0 (RHOSP)
- Red Hat Enterprise Linux 7 (RHEL)