One of the compute nodes rebooted and we need an RCA for this.

Solution In Progress

Issue

  • One of the compute nodes (overcloud-compute-13) rebooted, and we need a root cause analysis (RCA) for it.

  • The following messages appear in the logs shortly before the reboot:

Mar 30 05:28:42 overcloud-compute-13 systemd: Failed to ping hardware watchdog: Invalid argument
Mar 30 05:28:42 overcloud-compute-13 kernel: IPMI Watchdog: response: Error ff on cmd 22
Mar 30 05:28:47 overcloud-compute-13 os-collect-config: /var/lib/os-collect-config/local-data not found. Skipping
Mar 30 05:28:47 overcloud-compute-13 os-collect-config: No local metadata found (['/var/lib/os-collect-config/local-data'])
Mar 30 05:28:51 overcloud-compute-13 systemd-logind: Failed to start session scope session-405761.scope: Connection timed out
Mar 30 05:29:17 overcloud-compute-13 os-collect-config: /var/lib/os-collect-config/local-data not found. Skipping
Mar 30 05:29:17 overcloud-compute-13 os-collect-config: No local metadata found (['/var/lib/os-collect-config/local-data'])
Mar 30 05:29:26 overcloud-compute-13 systemd-logind: Failed to start user slice user-0.slice, ignoring: Connection timed out ((null))

  • The node then came back up about 6 minutes later:

Mar 30 05:35:27 overcloud-compute-13 journal: Runtime journal is using 8.0M (max allowed 4.0G, trying to leave 4.0G free of 188.2G available → current limit 4.0G).
Mar 30 05:35:27 overcloud-compute-13 kernel: Initializing cgroup subsys cpuset
Mar 30 05:35:27 overcloud-compute-13 kernel: Initializing cgroup subsys cpu
Mar 30 05:35:27 overcloud-compute-13 kernel: Initializing cgroup subsys cpuacct
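The "about 6 minutes" figure comes from comparing the last timestamp logged before the reboot with the first one logged after it. A minimal Python sketch of that check follows, assuming classic syslog timestamps ("Mon DD HH:MM:SS", no year) and that all lines fall within one calendar year. `largest_gap` and `ASSUMED_YEAR` are hypothetical helpers for illustration, not part of any Red Hat tooling. Note that the result is an upper bound on downtime: the node may have stopped logging some time before it actually went down.

```python
from datetime import datetime, timedelta

# Assumption: syslog lines start with "Mon DD HH:MM:SS" and carry no year.
# Hypothetical placeholder year; only differences between timestamps matter.
ASSUMED_YEAR = 2000


def parse_ts(line: str) -> datetime:
    """Parse the leading 'Mon DD HH:MM:SS' timestamp of a syslog line."""
    stamp = " ".join(line.split()[:3])  # split() also handles padded days like "Mar  3"
    return datetime.strptime(f"{ASSUMED_YEAR} {stamp}", "%Y %b %d %H:%M:%S")


def largest_gap(lines):
    """Return (gap, line_before, line_after) for the largest forward jump in time.

    Out-of-order lines (timestamps earlier than the latest one seen) are
    skipped so that they cannot inflate the measured gap.
    """
    best_gap, best_before, best_after = timedelta(0), None, None
    latest_ts, latest_line = None, None
    for line in lines:
        if not line.strip():
            continue
        ts = parse_ts(line)
        if latest_ts is None:
            latest_ts, latest_line = ts, line
            continue
        if ts > latest_ts:
            gap = ts - latest_ts
            if gap > best_gap:
                best_gap, best_before, best_after = gap, latest_line, line
            latest_ts, latest_line = ts, line
    return best_gap, best_before, best_after


if __name__ == "__main__":
    sample = [
        "Mar 30 05:29:17 overcloud-compute-13 os-collect-config: No local metadata found",
        "Mar 30 05:29:26 overcloud-compute-13 systemd-logind: Failed to start user slice user-0.slice",
        "Mar 30 05:35:27 overcloud-compute-13 kernel: Initializing cgroup subsys cpuset",
    ]
    gap, before, after = largest_gap(sample)
    print(gap)  # 0:06:01
```

Run against a full /var/log/messages excerpt, the two lines returned bracket the window in which the node stopped logging, which is the window to focus on when correlating with IPMI/BMC event logs.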

Environment

  • Red Hat OpenStack Platform 11.0 (RHOSP)
  • Red Hat Enterprise Linux 7 (RHEL)
