Overcloud deployment fails because Ceph monitors are out of time sync

Solution In Progress - Updated -

Issue

  • No NTP server was set for the deployment.
  • The Ceph monitors report a time synchronization problem (clock skew).
  • Output of heat resource-list for the failed resource:
+---------------+--------------------------------------+--------------------------------+-----------------+----------------------+
| resource_name | physical_resource_id                 | resource_type                  | resource_status | updated_time         |
+---------------+--------------------------------------+--------------------------------+-----------------+----------------------+
| 0             | 2dbb4474-1e19-4cc5-ba6e-de902e69e9e5 | OS::Heat::StructuredDeployment | CREATE_FAILED   | 2017-02-13T19:06:15Z |
| 1             | c242ff19-2511-4d32-af07-8e91b1472527 | OS::Heat::StructuredDeployment | CREATE_FAILED   | 2017-02-13T19:06:15Z |
| 2             | 3439ea94-f2c7-4e6e-8c2e-16e7ca6a4cd2 | OS::Heat::StructuredDeployment | CREATE_FAILED   | 2017-02-13T19:06:15Z |
+---------------+--------------------------------------+--------------------------------+-----------------+----------------------+
  • Snippet from the heat deployment-show output:
/Stage[main]/Tripleo::Profile::Base::Ceph::Mon/Ceph::Pool[images]/Exec[create-images]/returns: 2017-02-13 19:16:44.401423 7fba50478700  0 -- 11.120.0.10:0/1929916203 >> 11.120.0.11:6789/0 pipe(0x7fba3c016780 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7fba3c00b0e0).fault
Notice: /Stage[main]/Tripleo::Profile::Base::Ceph::Mon/Ceph::Pool[images]/Exec[create-images]/returns: 2017-02-13 19:16:53.000738 7fba595b1700  0 monclient(hunting): authenticate timed out after 300
Notice: /Stage[main]/Tripleo::Profile::Base::Ceph::Mon/Ceph::Pool[images]/Exec[create-images]/returns: 2017-02-13 19:16:53.000781 7fba595b1700  0 librados: client.admin authentication error (110) Connection timed out
Notice: /Stage[main]/Tripleo::Profile::Base::Ceph::Mon/Ceph::Pool[images]/Exec[create-images]/returns: Error connecting to cluster: TimedOut
Notice: /Stage[main]/Tripleo::Profile::Base::Ceph::Mon/Ceph::Pool[backups]/Exec[create-backups]/returns: executed successfully
Notice:
...
without storeconfigs
Error: /Stage[main]/Tripleo::Profile::Base::Ceph::Mon/Ceph::Pool[backups]/Exec[set-backups-size]/unless: Check "/bin/true # comment to satisfy puppet syntax requirements\nset -ex\ntest $(ceph osd pool get backups size | sed 's/.*: *//g') -eq 3" exceeded timeout
    "deploy_status_code": 0
  • ceph -s output:
[root@-osd-compute-0 heat-admin]# ceph -s
    cluster 7416b11e-f01d-11e6-93fb-0025b56ec803
     health HEALTH_ERR
            clock skew detected on mon.-controller-2
            1344 pgs are stuck inactive for more than 300 seconds
            1344 pgs stuck inactive
            1 mons down, quorum 0,2 -controller-0,-controller-2
            Monitor clock skew detected 
     monmap e1: 3 mons at {-controller-0=11.120.0.10:6789/0,-controller-1=11.120.0.11:6789/0,-controller-2=11.120.0.12:6789/0}
            election epoch 2490, quorum 0,2 -controller-0,-controller-2
     osdmap e8: 3 osds: 0 up, 0 in
            flags sortbitwise
      pgmap v9: 1344 pgs, 6 pools, 0 bytes data, 0 objects
            0 kB used, 0 kB / 0 kB avail
                1344 creating
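
The clock skew lines in the health output above can be picked out mechanically when checking several clusters or re-checking after a fix. The following is a minimal sketch, assuming a POSIX shell and that the health text keeps the wording shown above ("clock skew detected on mon.<name>"). The function name skewed_mons is hypothetical and is not part of Ceph.

```shell
#!/bin/sh
# Hypothetical helper (not a Ceph command): reads `ceph -s` text on stdin
# and prints the first monitor named on each "clock skew detected on mon.X" line.
# Assumption: the health message wording matches the output shown in this article.
skewed_mons() {
    # -n suppresses default output; p prints only lines where the substitution matched.
    sed -n 's/.*clock skew detected on mon\.\([^ ,]*\).*/\1/p'
}

# Example use on a node that has an admin keyring:
#   ceph -s | skewed_mons
```

Fed the health section shown above, the helper prints the monitor name from the "clock skew detected on mon.-controller-2" line and ignores the generic "Monitor clock skew detected" summary line, because that line names no monitor.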

Environment

Red Hat OpenStack 10
