Overcloud deployment fails because Ceph monitors are out of time sync
Issue
- No NTP server was set for the deployment.
- The Ceph monitors report clock skew (time sync problem).
- heat resource-list output for the failed resource
+---------------+--------------------------------------+--------------------------------+-----------------+----------------------+
| resource_name | physical_resource_id                 | resource_type                  | resource_status | updated_time         |
+---------------+--------------------------------------+--------------------------------+-----------------+----------------------+
| 0             | 2dbb4474-1e19-4cc5-ba6e-de902e69e9e5 | OS::Heat::StructuredDeployment | CREATE_FAILED   | 2017-02-13T19:06:15Z |
| 1             | c242ff19-2511-4d32-af07-8e91b1472527 | OS::Heat::StructuredDeployment | CREATE_FAILED   | 2017-02-13T19:06:15Z |
| 2             | 3439ea94-f2c7-4e6e-8c2e-16e7ca6a4cd2 | OS::Heat::StructuredDeployment | CREATE_FAILED   | 2017-02-13T19:06:15Z |
+---------------+--------------------------------------+--------------------------------+-----------------+----------------------+
- heat deployment-show snippet
/Stage[main]/Tripleo::Profile::Base::Ceph::Mon/Ceph::Pool[images]/Exec[create-images]/returns: 2017-02-13 19:16:44.401423 7fba50478700 0 -- 11.120.0.10:0/1929916203 >> 11.120.0.11:6789/0 pipe(0x7fba3c016780 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7fba3c00b0e0).fault\u001b[0m\n\u001b[mNotice: /Stage[main]/Tripleo::Profile::Base::Ceph::Mon/Ceph::Pool[images]/Exec[create-images]/returns: 2017-02-13 19:16:53.000738 7fba595b1700 0 monclient(hunting): authenticate timed out after 300\u001b[0m\n\u001b[mNotice: /Stage[main]/Tripleo::Profile::Base::Ceph::Mon/Ceph::Pool[images]/Exec[create-images]/returns: 2017-02-13 19:16:53.000781 7fba595b1700 0 librados: client.admin authentication error (110) Connection timed out\u001b[0m\n\u001b[mNotice: /Stage[main]/Tripleo::Profile::Base::Ceph::Mon/Ceph::Pool[images]/Exec[create-images]/returns: Error connecting to cluster: TimedOut\u001b[0m\n\u001b[mNotice: /Stage[main]/Tripleo::Profile::Base::Ceph::Mon/Ceph::Pool[backups]/Exec[create-backups]/returns: executed successfully\u001b[0m\n\u001b[mNotice:
...
without storeconfigs\u001b[0m\n\u001b[1;31mError: /Stage[main]/Tripleo::Profile::Base::Ceph::Mon/Ceph::Pool[backups]/Exec[set-backups-size]/unless: Check \"/bin/true # comment to satisfy puppet syntax requirements\\nset -ex\\ntest $(ceph osd pool get backups size | sed 's/.*: *//g') -eq 3\" exceeded timeout\u001b[0m\n",
"deploy_status_code": 0
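The deployment output above is hard to read because it is still a JSON-escaped string with ANSI colour codes (`\u001b[0m`, `\n`). The following is a minimal sketch, not part of the original solution, of how such a fragment could be turned into readable lines. It assumes the text is the body of a complete JSON string value; the helper name `clean_deploy_output` is hypothetical.

```python
import json
import re

# Matches ANSI colour sequences such as ESC[m, ESC[0m or ESC[1;31m.
ANSI_RE = re.compile(r"\x1b\[[0-9;]*m")


def clean_deploy_output(raw: str) -> list:
    """Decode a JSON-escaped output string and return its non-empty lines.

    Assumption: `raw` is the content of one complete JSON string value,
    without the surrounding double quotes.
    """
    # Let the JSON parser resolve \u001b, \n, \" and \\ consistently.
    text = json.loads('"' + raw + '"')
    # Drop the colour codes that Puppet wrote to the terminal.
    text = ANSI_RE.sub("", text)
    return [line.strip() for line in text.splitlines() if line.strip()]


# Two messages taken from the snippet above, still in escaped form.
sample = (
    r"\u001b[mNotice: /Stage[main]/Tripleo::Profile::Base::Ceph::Mon/"
    r"Ceph::Pool[images]/Exec[create-images]/returns: "
    r"Error connecting to cluster: TimedOut\u001b[0m\n"
    r"\u001b[mNotice: /Stage[main]/Tripleo::Profile::Base::Ceph::Mon/"
    r"Ceph::Pool[backups]/Exec[create-backups]/returns: "
    r"executed successfully\u001b[0m\n"
)

lines = clean_deploy_output(sample)
for line in lines:
    print(line)
```

A truncated fragment (for example one that ends in the middle of an escape sequence) will make `json.loads` raise an error, so the input should be a whole field value.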
- ceph -s output from an overcloud node
[root@-osd-compute-0 heat-admin]# ceph -s
    cluster 7416b11e-f01d-11e6-93fb-0025b56ec803
     health HEALTH_ERR
            clock skew detected on mon.-controller-2
            1344 pgs are stuck inactive for more than 300 seconds
            1344 pgs stuck inactive
            1 mons down, quorum 0,2 -controller-0,-controller-2
            Monitor clock skew detected
     monmap e1: 3 mons at {-controller-0=11.120.0.10:6789/0,-controller-1=11.120.0.11:6789/0,-controller-2=11.120.0.12:6789/0}
            election epoch 2490, quorum 0,2 -controller-0,-controller-2
     osdmap e8: 3 osds: 0 up, 0 in
            flags sortbitwise
      pgmap v9: 1344 pgs, 6 pools, 0 bytes data, 0 objects
            0 kB used, 0 kB / 0 kB avail
                1344 creating
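The lines that matter in the status output above are the clock skew warning and the count of monitors that are down. As a rough sketch, assuming the plain-text layout shown here (other Ceph releases may word these messages differently), they could be picked out as follows. The function name `summarize_mon_health` is hypothetical.

```python
import re


def summarize_mon_health(status_text: str) -> dict:
    """Collect clock-skew and quorum hints from plain-text `ceph -s` output.

    Assumption: the health messages are worded as in the output above.
    """
    skewed = []
    # One health line may name several monitors, separated by commas.
    for match in re.finditer(r"clock skew detected on (.*)", status_text):
        skewed.extend(re.findall(r"mon\.([^\s,]+)", match.group(1)))
    down = re.search(r"(\d+) mons down", status_text)
    return {
        "skewed_mons": skewed,
        "mons_down": int(down.group(1)) if down else 0,
    }


# Health section copied from the output above.
sample = """\
health HEALTH_ERR
clock skew detected on mon.-controller-2
1344 pgs are stuck inactive for more than 300 seconds
1344 pgs stuck inactive
1 mons down, quorum 0,2 -controller-0,-controller-2
Monitor clock skew detected
"""

summary = summarize_mon_health(sample)
print(summary)
```

Where it is available, `ceph -s --format json` gives structured output that is less fragile to parse than the plain-text form.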
Environment
Red Hat OpenStack Platform 10