Overcloud deploy fails because ceph mons are out of time sync
Issue
- No NTP server was set for the deployment.
- The Ceph monitors report a time sync problem (clock skew).
- heat resource-list output for the failed resources:
+---------------+--------------------------------------+--------------------------------+-----------------+----------------------+
| resource_name | physical_resource_id | resource_type | resource_status | updated_time |
+---------------+--------------------------------------+--------------------------------+-----------------+----------------------+
| 0 | 2dbb4474-1e19-4cc5-ba6e-de902e69e9e5 | OS::Heat::StructuredDeployment | CREATE_FAILED | 2017-02-13T19:06:15Z |
| 1 | c242ff19-2511-4d32-af07-8e91b1472527 | OS::Heat::StructuredDeployment | CREATE_FAILED | 2017-02-13T19:06:15Z |
| 2 | 3439ea94-f2c7-4e6e-8c2e-16e7ca6a4cd2 | OS::Heat::StructuredDeployment | CREATE_FAILED | 2017-02-13T19:06:15Z |
+---------------+--------------------------------------+--------------------------------+-----------------+----------------------+
- heat deployment-show snippet
/Stage[main]/Tripleo::Profile::Base::Ceph::Mon/Ceph::Pool[images]/Exec[create-images]/returns: 2017-02-13 19:16:44.401423 7fba50478700 0 -- 11.120.0.10:0/1929916203 >> 11.120.0.11:6789/0 pipe(0x7fba3c016780 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7fba3c00b0e0).fault\u001b[0m\n\u001b[mNotice: /Stage[main]/Tripleo::Profile::Base::Ceph::Mon/Ceph::Pool[images]/Exec[create-images]/returns: 2017-02-13 19:16:53.000738 7fba595b1700 0 monclient(hunting): authenticate timed out after 300\u001b[0m\n\u001b[mNotice: /Stage[main]/Tripleo::Profile::Base::Ceph::Mon/Ceph::Pool[images]/Exec[create-images]/returns: 2017-02-13 19:16:53.000781 7fba595b1700 0 librados: client.admin authentication error (110) Connection timed out\u001b[0m\n\u001b[mNotice: /Stage[main]/Tripleo::Profile::Base::Ceph::Mon/Ceph::Pool[images]/Exec[create-images]/returns: Error connecting to cluster: TimedOut\u001b[0m\n\u001b[mNotice: /Stage[main]/Tripleo::Profile::Base::Ceph::Mon/Ceph::Pool[backups]/Exec[create-backups]/returns: executed successfully\u001b[0m\n\u001b[mNotice:
...
without storeconfigs\u001b[0m\n\u001b[1;31mError: /Stage[main]/Tripleo::Profile::Base::Ceph::Mon/Ceph::Pool[backups]/Exec[set-backups-size]/unless: Check \"/bin/true # comment to satisfy puppet syntax requirements\\nset -ex\\ntest $(ceph osd pool get backups size | sed 's/.*: *//g') -eq 3\" exceeded timeout\u001b[0m\n",
"deploy_status_code": 0
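The deployment output above is hard to read because it carries ANSI colour codes and escaped newlines. The following is a minimal Python sketch for pulling out only the puppet Error lines. It assumes the deploy_stderr string has already been JSON-decoded, so that \u001b is a real ESC character and \n is a real newline. The function name puppet_error_lines and the abbreviated sample string are illustrative and are not part of heat or puppet.

```python
import re

# Matches ANSI colour sequences such as ESC[0m, ESC[m, or ESC[1;31m.
ANSI_COLOR = re.compile(r"\x1b\[[0-9;]*m")


def puppet_error_lines(deploy_stderr):
    """Return the puppet 'Error:' lines from a decoded deploy_stderr string."""
    clean = ANSI_COLOR.sub("", deploy_stderr)
    return [line for line in clean.splitlines() if line.startswith("Error:")]


# Abbreviated sample built from the snippet above; the Check command is elided.
SAMPLE_STDERR = (
    "\x1b[mNotice: /Stage[main]/Tripleo::Profile::Base::Ceph::Mon/"
    "Ceph::Pool[images]/Exec[create-images]/returns: "
    "Error connecting to cluster: TimedOut\x1b[0m\n"
    "\x1b[1;31mError: /Stage[main]/Tripleo::Profile::Base::Ceph::Mon/"
    "Ceph::Pool[backups]/Exec[set-backups-size]/unless: "
    'Check "..." exceeded timeout\x1b[0m\n'
)

if __name__ == "__main__":
    for line in puppet_error_lines(SAMPLE_STDERR):
        print(line)
```

Filtering on the "Error:" prefix skips the Notice line even though it also contains the word "Error", which keeps the result limited to what puppet itself reported as a failure.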
[root@-osd-compute-0 heat-admin]# ceph -s
cluster 7416b11e-f01d-11e6-93fb-0025b56ec803
health HEALTH_ERR
clock skew detected on mon.-controller-2
1344 pgs are stuck inactive for more than 300 seconds
1344 pgs stuck inactive
1 mons down, quorum 0,2 -controller-0,-controller-2
Monitor clock skew detected
monmap e1: 3 mons at {-controller-0=11.120.0.10:6789/0,-controller-1=11.120.0.11:6789/0,-controller-2=11.120.0.12:6789/0}
election epoch 2490, quorum 0,2 -controller-0,-controller-2
osdmap e8: 3 osds: 0 up, 0 in
flags sortbitwise
pgmap v9: 1344 pgs, 6 pools, 0 bytes data, 0 objects
0 kB used, 0 kB / 0 kB avail
1344 creating
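The symptoms in the ceph -s output above (clock skew on one monitor, one monitor out of quorum) can be picked out programmatically. The following is a minimal Python sketch that parses the plain-text status format shown in this article. The function name summarize_ceph_status is hypothetical, and the parsing assumes the line layout seen above; other Ceph releases print the status differently.

```python
import re


def summarize_ceph_status(status_text):
    """Extract health, skewed monitors, and monitors missing from quorum."""
    health = None
    skewed = []
    all_mons = []
    in_quorum = []
    for raw in status_text.splitlines():
        line = raw.strip()
        if line.startswith("health "):
            health = line.split(None, 1)[1]
        m = re.match(r"clock skew detected on (.+)", line)
        if m:
            skewed.extend(name.strip() for name in m.group(1).split(","))
        m = re.match(r"monmap e\d+: \d+ mons at \{(.+)\}", line)
        if m:
            # Each entry looks like name=ip:port/nonce; keep only the name.
            all_mons = [entry.split("=", 1)[0] for entry in m.group(1).split(",")]
        m = re.match(r"election epoch \d+, quorum [\d,]+ (.+)", line)
        if m:
            in_quorum = m.group(1).split(",")
    down = [name for name in all_mons if name not in in_quorum]
    return {"health": health, "skewed": skewed, "down": down}


# Lines copied from the ceph -s output in this article.
SAMPLE_STATUS = """
cluster 7416b11e-f01d-11e6-93fb-0025b56ec803
health HEALTH_ERR
clock skew detected on mon.-controller-2
1 mons down, quorum 0,2 -controller-0,-controller-2
Monitor clock skew detected
monmap e1: 3 mons at {-controller-0=11.120.0.10:6789/0,-controller-1=11.120.0.11:6789/0,-controller-2=11.120.0.12:6789/0}
election epoch 2490, quorum 0,2 -controller-0,-controller-2
"""

if __name__ == "__main__":
    print(summarize_ceph_status(SAMPLE_STATUS))
```

For the output in this article, the sketch reports mon.-controller-2 as skewed and -controller-1 as missing from quorum, which matches the "1 mons down" line.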
Environment
Red Hat OpenStack 10
