Red Hat OpenStack Platform 13 deployment with RADOS Gateway (RGW) fails on stack update at workflow step 2 with the following in the mistral logs: ["docker", "inspect", "ca02850116b0"]

Solution In Progress

Issue


Red Hat OpenStack Platform 13 deployment with RADOS Gateway (RGW) fails on stack update at workflow step 2 with the following in the mistral logs: ["docker", "inspect", "ca02850116b0"]

This is a new deployment, and the initial deployment completes successfully. On a stack update run just minutes later, ceph-ansible fails. The following appears in the mistral logs on the undercloud:

TASK [ceph-docker-common : inspect ceph rgw container] *************************
Wednesday 21 November 2018  15:46:25 +0100 (0:00:00.042)       0:01:49.177 ****
fatal: [192.168.100.16]: FAILED! => {"changed": false, "cmd": ["docker", "inspect", "ca02850116b0"], "delta": "0:00:00.051248", "end": "2018-11-21 14:45:48.515515", "failed": true, "msg": "non-zero return code", "rc": 1, "start": "2018-11-21 14:45:48.464267", "stderr": "Error: No such object: ca02850116b0", "stderr_lines": ["Error: No such object: ca02850116b0"], "stdout": "[]", "stdout_lines": ["[]"]}

PLAY RECAP *********************************************************************
192.168.100.10             : ok=2    changed=0    unreachable=0    failed=0
192.168.100.11             : ok=2    changed=0    unreachable=0    failed=0
192.168.100.12             : ok=2    changed=0    unreachable=0    failed=0
192.168.100.14             : ok=58   changed=4    unreachable=0    failed=0
192.168.100.15             : ok=2    changed=0    unreachable=0    failed=0
192.168.100.16             : ok=29   changed=0    unreachable=0    failed=1
192.168.100.23             : ok=2    changed=0    unreachable=0    failed=0
192.168.100.28             : ok=2    changed=0    unreachable=0    failed=0
192.168.100.7              : ok=56   changed=4    unreachable=0    failed=0
192.168.100.9              : ok=2    changed=0    unreachable=0    failed=0

INSTALLER STATUS ***************************************************************
Install Ceph Monitor        : In Progress (0:01:12)
    This phase can be restarted by running: roles/ceph-mon/tasks/main.yml
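The full ceph-ansible run can be reviewed on the undercloud. In Red Hat OpenStack Platform 13 the ceph-ansible log is typically written below /var/lib/mistral/ under the plan name (overcloud by default; the exact path may differ in your environment):

less /var/lib/mistral/overcloud/ceph-ansible/ceph_ansible_command.log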

The journal on the controller with IP 192.168.100.16 (the host reporting failed=1) shows:

Nov 21 14:45:38 controller-1 dockerd-current[38040]: 2018-11-21 14:45:38.545536 7fa6d2a97e80 -1 Couldn't init storage provider (RADOS)
Nov 21 14:45:38 controller-1 docker[771843]: 2018-11-21 14:45:38.545536 7fa6d2a97e80 -1 Couldn't init storage provider (RADOS)
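The "No such object" error from docker inspect indicates the RGW container has already exited and been removed by the time ceph-ansible inspects it. On the failing controller the remaining traces can be checked through docker and the journal; the unit name ceph-rgw@controller-1 follows the usual ceph-ansible naming for containerized RGW and is an assumption here:

docker ps -a | grep -i rgw
journalctl -u ceph-rgw@controller-1 --since "2018-11-21 14:45" --no-pager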

This is a lab environment with 3 Ceph OSDs and a replica count of 3:

ceph status
  cluster:
    id:     7c2cc968-e7ad-11e8-b5a8-5cf9dd285a01
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum controller-0,controller-1,controller-2
    mgr: controller-2(active), standbys: controller-0, controller-1
    osd: 3 osds: 3 up, 3 in

  data:
    pools:   7 pools, 196 pgs
    objects: 2187 objects, 12612 MB
    usage:   38228 MB used, 2197 GB / 2234 GB avail
    pgs:     196 active+clean

  io:
    client:   4722 B/s rd, 6 op/s rd, 0 op/s wr
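One thing worth noting in a cluster this small: with 3 OSDs and a pool size of 3, each of the 196 placement groups is replicated to all three OSDs, so every OSD already carries 196 PGs. That is just under the default mon_max_pg_per_osd limit of 200 in Luminous-based releases, and RGW needs to create several additional pools when it first starts. The per-OSD PG count and the active limit can be checked as follows (the monitor daemon name is an assumption; on containerized deployments the admin-socket query has to run inside the monitor container):

ceph osd df    # the PGS column shows placement groups per OSD
ceph daemon mon.controller-0 config get mon_max_pg_per_osd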

cat /etc/ceph/ceph.conf
[client.rgw.controller-0]
host = controller-0
keyring = /var/lib/ceph/radosgw/ceph-rgw.controller-0/keyring
log file = /var/log/ceph/ceph-rgw-controller-0.log
rgw frontends = civetweb port=10.20.70.58:8080 num_threads=100

[client.rgw.controller-1]
host = controller-1
keyring = /var/lib/ceph/radosgw/ceph-rgw.controller-1/keyring
log file = /var/log/ceph/ceph-rgw-controller-1.log
rgw frontends = civetweb port=10.20.70.68:8080 num_threads=100

[client.rgw.controller-2]
host = controller-2
keyring = /var/lib/ceph/radosgw/ceph-rgw.controller-2/keyring
log file = /var/log/ceph/ceph-rgw-controller-2.log
rgw frontends = civetweb port=10.20.70.71:8080 num_threads=100

# Please do not change this file directly since it is managed by Ansible and will be overwritten
[global]
# let's force the admin socket the way it was so we can properly check for existing instances
# also the line $cluster-$name.$pid.$cctid.asok is only needed when running multiple instances
# of the same daemon, thing ceph-ansible cannot do at the time of writing
admin socket = "$run_dir/$cluster-$name.asok"
cluster network = 10.20.100.0/24
fsid = 7c2cc968-e7ad-11e8-b5a8-5cf9dd285a01
journal_size = 10240
log file = /dev/null
max_open_files = 131072
mon cluster log file = /dev/null
mon host = 10.20.70.71,10.20.70.58,10.20.70.68
mon initial members = controller-2,controller-0,controller-1
osd_pool_default_pg_num = 28
osd_pool_default_pgp_num = 28
osd_pool_default_size = 3
public network = 10.20.70.0/24
rgw_keystone_accepted_roles = Member, admin
rgw_keystone_admin_domain = default
rgw_keystone_admin_password = <password>
rgw_keystone_admin_project = service
rgw_keystone_admin_user = swift
rgw_keystone_api_version = 3
rgw_keystone_implicit_tenants = true
rgw_keystone_revocation_interval = 0
rgw_keystone_url = http://10.20.60.50:5000
rgw_s3_auth_use_keystone = true
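Whether an RGW instance is answering on its civetweb port can be tested directly against the addresses from ceph.conf; a healthy radosgw normally returns an anonymous S3 ListAllMyBuckets XML document (the address below is taken from the client.rgw.controller-1 stanza above):

curl http://10.20.70.68:8080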
