Nova Compute container fails with SSL: CERTIFICATE_VERIFY_FAILED when trying to access MySQL on the controllers' VIP

Solution In Progress

Issue

  • After upgrading an OSP 13 cluster to the latest minor release, all the nova_compute containers running on the compute nodes started failing. The full traceback follows:
2020-04-16 16:52:01.829 157 ERROR oslo_service.service Traceback (most recent call last):
2020-04-16 16:52:01.829 157 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/oslo_service/service.py", line 731, in run_service
2020-04-16 16:52:01.829 157 ERROR oslo_service.service     service.start()
2020-04-16 16:52:01.829 157 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/nova/service.py", line 161, in start
2020-04-16 16:52:01.829 157 ERROR oslo_service.service     self.manager.init_host()
2020-04-16 16:52:01.829 157 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 1194, in init_host
2020-04-16 16:52:01.829 157 ERROR oslo_service.service     context, self.host, expected_attrs=['info_cache', 'metadata'])
2020-04-16 16:52:01.829 157 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/oslo_versionedobjects/base.py", line 177, in wrapper
2020-04-16 16:52:01.829 157 ERROR oslo_service.service     args, kwargs)
2020-04-16 16:52:01.829 157 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/nova/conductor/rpcapi.py", line 240, in object_class_action_versions
2020-04-16 16:52:01.829 157 ERROR oslo_service.service     args=args, kwargs=kwargs)
2020-04-16 16:52:01.829 157 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/client.py", line 174, in call
2020-04-16 16:52:01.829 157 ERROR oslo_service.service     retry=self.retry)
2020-04-16 16:52:01.829 157 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/oslo_messaging/transport.py", line 131, in _send
2020-04-16 16:52:01.829 157 ERROR oslo_service.service     timeout=timeout, retry=retry)
2020-04-16 16:52:01.829 157 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 625, in send
2020-04-16 16:52:01.829 157 ERROR oslo_service.service     retry=retry)
2020-04-16 16:52:01.829 157 ERROR oslo_service.service   File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 616, in _send
2020-04-16 16:52:01.829 157 ERROR oslo_service.service     raise result
2020-04-16 16:52:01.829 157 ERROR oslo_service.service RemoteError: Remote error: DBConnectionError (pymysql.err.OperationalError) (2003, u"Can't connect to MySQL server on 'overcloud.internalapi.localdomain.com' ([SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:618))") (Background on this error at: http://sqlalche.me/e/e3q8)
2020-04-16 16:52:01.829 157 ERROR oslo_service.service [u'Traceback (most recent call last):\n', u'  File "/usr/lib/python2.7/site-packages/nova/conductor/manager.py", line 126, in _object_dispatch\n    return getattr(target, method)(*args, **kwargs)\n', u'  File "/usr/lib/python2.7/site-packages/oslo_versionedobjects/base.py", line 184, in wrapper\n    result = fn(cls, context, *args, **kwargs)\n', u'  File "/usr/lib/python2.7/site-packages/nova/objects/instance.py", line 1310, in get_by_host\n    use_slave=use_slave)\n', u'  File "/usr/lib/python2.7/site-packages/nova/db/sqlalchemy/api.py", line 225, in wrapper\n    return f(*args, **kwargs)\n', u'  File "/usr/lib/python2.7/site-packages/nova/objects/instance.py", line 1304, in _db_instance_get_all_by_host\n    columns_to_join=columns_to_join)\n', u'  File "/usr/lib/python2.7/site-packages/nova/db/api.py", line 833, in instance_get_all_by_host\n    return IMPL.instance_get_all_by_host(context, host, columns_to_join)\n', u'  File "/usr/lib/python2.7/site-packages/nova/db/sqlalchemy/api.py", line 270, in wrapped\n    return f(context, *args, **kwargs)\n', u'  File "/usr/lib/python2.7/site-packages/nova/db/sqlalchemy/api.py", line 2610, in instance_get_all_by_host\n    query.filter_by(host=host).all(),\n', u'  File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/query.py", line 2726, in all\n    return list(self)\n', u'  File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/query.py", line 2878, in __iter__\n    return self._execute_and_instances(context)\n', u'  File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/query.py", line 2899, in _execute_and_instances\n    close_with_result=True)\n', u'  File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/query.py", line 2908, in _get_bind_args\n    **kw\n', u'  File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/query.py", line 2890, in _connection_from_session\n    conn = self.session.connection(**kw)\n', u'  File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 1035, in connection\n    execution_options=execution_options)\n', u'  File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 1040, in _connection_for_bind\n    engine, execution_options)\n', u'  File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 409, in _connection_for_bind\n    conn = bind.contextual_connect()\n', u'  File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 2123, in contextual_connect\n    self._wrap_pool_connect(self.pool.connect, None),\n', u'  File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 2162, in _wrap_pool_connect\n    e, dialect, self)\n', u'  File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1472, in _handle_dbapi_exception_noconnection\n    util.raise_from_cause(newraise, exc_info)\n', u'  File "/usr/lib64/python2.7/site-packages/sqlalchemy/util/compat.py", line 203, in raise_from_cause\n    reraise(type(exception), exception, tb=exc_tb, cause=cause)\n', u'  File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 2158, in _wrap_pool_connect\n    return fn()\n', u'  File "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 403, in connect\n    return _ConnectionFairy._checkout(self)\n', u'  File "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 782, in _checkout\n    fairy = _ConnectionRecord.checkout(pool)\n', u'  File "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 537, in checkout\n    rec.checkin()\n', u'  File "/usr/lib64/python2.7/site-packages/sqlalchemy/util/langhelpers.py", line 66, in __exit__\n    compat.reraise(exc_type, exc_value, exc_tb)\n', u'  File "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 534, in checkout\n    dbapi_connection = rec.get_connection()\n', u'  File "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 623, in get_connection\n    self.__connect()\n', u'  File "/usr/lib64/python2.7/site-packages/sqlalchemy/pool.py", line 667, in __connect\n    connection = pool._invoke_creator(self)\n', u'  File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/strategies.py", line 105, in connect\n    return dialect.connect(*cargs, **cparams)\n', u'  File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/default.py", line 410, in connect\n    return self.dbapi.connect(*cargs, **cparams)\n', u'  File "/usr/lib/python2.7/site-packages/pymysql/__init__.py", line 90, in Connect\n    return Connection(*args, **kwargs)\n', u'  File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 706, in __init__\n    self.connect()\n', u'  File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 963, in connect\n    raise exc\n', u'DBConnectionError: (pymysql.err.OperationalError) (2003, u"Can\'t connect to MySQL server on \'overcloud.internalapi.localdomain.com\' ([SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:618))") (Background on this error at: http://sqlalche.me/e/e3q8)\n'].
2020-04-16 16:52:01.829 157 ERROR oslo_service.service 
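One detail in the traceback is worth pulling out: the outer exception is an oslo.messaging RemoteError raised from nova/conductor/rpcapi.py, and the embedded traceback starts in nova/conductor/manager.py. That pattern suggests the MySQL connection that fails is opened by the service on the other end of the RPC call (nova-conductor), with the error serialized back to nova_compute. A minimal Python 3 sketch for extracting that from a log file follows; the helper name summarize_remote_error and the regular expressions are hypothetical and only assume the log layout shown above.

```python
import re

# Hypothetical helper: the patterns assume the oslo_service log layout shown above.
REMOTE_RE = re.compile(r"RemoteError: Remote error: (\w+)")
FRAME_RE = re.compile(r'File "([^"]+)", line (\d+), in (\w+)')
HOST_RE = re.compile(r"Can't connect to MySQL server on '([^']+)'")


def summarize_remote_error(log_text):
    """Return the remote exception name, first remote frame and DB host, or None."""
    remote = REMOTE_RE.search(log_text)
    if remote is None:
        return None
    # The remote traceback is printed after the RemoteError line, so only
    # frames found after that point belong to the remote service.
    remote_part = log_text[remote.end():]
    frames = FRAME_RE.findall(remote_part)
    host = HOST_RE.search(remote_part)
    return {
        "exception": remote.group(1),
        "first_remote_frame": frames[0][0] if frames else None,
        "db_host": host.group(1) if host else None,
    }
```

It can be fed the contents of the nova-compute log under /var/log/containers/nova/ (the exact file name there is an assumption); a first remote frame under nova/conductor/ points at the conductor side.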
  • We debugged the issue in depth, and I'd like to share the results of those debugging sessions to help get this case resolved quickly.
  1. There seems to be no consistent way to change the code inside the container for debugging purposes. For example, modifying /usr/lib/python2.7/site-packages/pymysql/connections.py to dump the SSL contexts has no effect: the code that runs remains the one shipped in the image, even when a debugging container is launched and kept running while the changes are made and the kolla_start binary is executed.
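Instead of patching files shipped in the image, a less invasive way to see which OpenSSL build, default CA locations and pymysql module the container's interpreter actually uses is to run a short script with that interpreter (for example through docker exec into nova_compute). This is a sketch only: it reads standard-library ssl attributes and does not reproduce how oslo.db builds its own connection. It is written to run under both the container's Python 2.7 and Python 3.

```python
import ssl


def describe_ssl_environment():
    """Collect the OpenSSL build and default CA locations seen by this interpreter."""
    paths = ssl.get_default_verify_paths()
    info = {
        "openssl_version": ssl.OPENSSL_VERSION,
        "default_cafile": paths.cafile,
        "default_capath": paths.capath,
        "openssl_cafile_env": paths.openssl_cafile_env,
    }
    try:
        import pymysql  # the DB driver that appears in the traceback above
        info["pymysql_file"] = pymysql.__file__
        info["pymysql_version"] = getattr(pymysql, "__version__", "unknown")
    except ImportError:
        info["pymysql_file"] = None
    return info


if __name__ == "__main__":
    for key, value in sorted(describe_ssl_environment().items()):
        print("%s: %s" % (key, value))
```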

  2. Modifying /var/lib/config-data/puppet-generated/nova_libvirt/etc/nova/nova.conf to introduce a typo in the overcloud.internalapi FQDN does not change anything: the CERTIFICATE_VERIFY_FAILED error persists. And yes, the debugging container was removed and re-created so that the bind mount came up fresh.
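To confirm which database endpoints a given nova.conf actually points at (for example the puppet-generated file above), its connection options can be parsed with the standard-library configparser. A minimal Python 3 sketch; the list of sections checked is an assumption about which oslo.db sections matter, and the code only reads the file, so it says nothing about which process ends up opening the connection.

```python
import configparser
from urllib.parse import parse_qs, urlsplit

# Assumed set of oslo.db sections that carry a SQLAlchemy "connection" URL.
DB_SECTIONS = ("database", "api_database", "placement_database")


def db_targets(conf_text):
    """Return {section: (hostname, query parameters)} for each connection URL."""
    # nova.conf may repeat keys and contain '%' characters, so relax strict
    # duplicate checking and turn off interpolation.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(conf_text)
    targets = {}
    for section in DB_SECTIONS:
        if parser.has_option(section, "connection"):
            url = urlsplit(parser.get(section, "connection"))
            targets[section] = (url.hostname, parse_qs(url.query))
    return targets
```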

  3. Running tcpdump filtered on net 172.17.8.0/24 (the internalapi network) or on dst port 3306 shows no packets being sent to the controllers at all. This suggests something is off with the target host this container is trying to reach, since there is no evidence that the code ever tries to connect to the overcloud.internalapi FQDN. I also tried other filters to catch a connection attempt or an SSL stream, such as:

tcpdump -i any "tcp port 3306 and tcp[13] == 2"
tcpdump -i any "tcp port 3306 and (tcp[((tcp[12] & 0xf0) >> 2)] = 0x16)"
  4. Other nova containers on the same hosts are up:
a66db1561ae8        registry.access.redhat.com/rhosp13/openstack-nova-compute:latest                "dumb-init --singl..."   21 hours ago        Up 19 hours (healthy)                                   nova_migration_target
ed29d3ef8d6d        registry.access.redhat.com/rhosp13/openstack-nova-libvirt:latest                "kolla_start"            21 hours ago        Up 19 hours                                             nova_libvirt
cdcffc51c3b9        registry.access.redhat.com/rhosp13/openstack-nova-libvirt:latest                "kolla_start"            21 hours ago        Up 19 hours                                             nova_virtlogd
  5. We're using a TLS everywhere setup.

  6. The debugging container was created via:

paunch debug --file nova_compute.json --container nova_compute --action print-cmd

then:

docker run --name nova_compute-os99dssi --detach=true --env=KOLLA_CONFIG_STRATEGY=COPY_ALWAYS --env=TRIPLEO_CONFIG_HASH=8e529ec296946b46aca006896004f058 --net=host --ipc=host --ulimit=nofile=131072 --ulimit=memlock=67108864 --health-cmd="/openstack/healthcheck 5672" --privileged=true --user=nova --volume=/etc/hosts:/etc/hosts:ro --volume=/etc/localtime:/etc/localtime:ro --volume=/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro --volume=/etc/pki/ca-trust/source/anchors:/etc/pki/ca-trust/source/anchors:ro --volume=/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro --volume=/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro --volume=/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro --volume=/dev/log:/dev/log --volume=/etc/ipa/ca.crt:/etc/ipa/ca.crt:ro --volume=/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro --volume=/etc/puppet:/etc/puppet:ro --volume=/var/log/containers/nova:/var/log/nova --volume=/var/lib/kolla/config_files/nova_compute.json:/var/lib/kolla/config_files/config.json:ro --volume=/var/lib/config-data/puppet-generated/nova_libvirt/:/var/lib/kolla/config_files/src:ro --volume=/etc/iscsi:/var/lib/kolla/config_files/src-iscsid:ro --volume=/etc/ceph:/var/lib/kolla/config_files/src-ceph:ro --volume=/dev:/dev --volume=/lib/modules:/lib/modules:ro --volume=/run:/run --volume=/var/lib/iscsi:/var/lib/iscsi --volume=/var/lib/nova:/var/lib/nova:shared --volume=/var/lib/libvirt:/var/lib/libvirt --volume=/sys/class/net:/sys/class/net --volume=/sys/bus/pci:/sys/bus/pci --volume=/boot:/boot:ro --cpuset-cpus=0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71 --entrypoint="/usr/bin/sleep" registry.access.redhat.com/rhosp13/openstack-nova-compute:lates
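The debugging container gets all of its trust material from bind mounts (/etc/pki/ca-trust/..., /etc/pki/tls/..., /etc/ipa/ca.crt). To list exactly which host paths land where, the command printed by paunch debug can be split with Python's shlex. A minimal Python 3 sketch that handles only the --volume=HOST:CONTAINER[:OPTIONS] form used in the command above; the helper name is hypothetical.

```python
import shlex


def volume_mounts(docker_run_cmd):
    """Return (host_path, container_path, options) for each --volume=... argument."""
    mounts = []
    for arg in shlex.split(docker_run_cmd):
        if arg.startswith("--volume="):
            parts = arg[len("--volume="):].split(":")
            options = parts[2] if len(parts) > 2 else ""
            mounts.append((parts[0], parts[1], options))
    return mounts
```

Filtering the result for paths containing pki or ending in .crt gives the list of CA files the container can see.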
  7. The CA certificate on the compute nodes is valid and matches the one that works fine on the controllers. On top of that, running something like the following:
import pymysql

# CA file bind-mounted into the container from the host
ssl_settings = {'ca': '/etc/ipa/ca.crt'}
conn = pymysql.connect(host='overcloud.internalapi.localdomain.com',
                       port=3306,
                       user='root',
                       passwd='fooo',
                       ssl=ssl_settings)
conn.close()

Within the nova_compute container itself, this works beautifully.
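A related check that needs no database credentials is to load the CA file into a verifying SSL context and see how many CA certificates were accepted. A minimal Python 3 sketch; ssl.create_default_context(cafile=...) requires certificate verification and hostname checking, which is assumed to be close to what pymysql sets up from ssl={'ca': ...}, but it is not pymysql's own code.

```python
import ssl


def ca_file_stats(cafile):
    """Load a CA file into a verifying context and report what OpenSSL parsed."""
    # Raises OSError (or ssl.SSLError) if the file is missing or unparsable.
    ctx = ssl.create_default_context(cafile=cafile)
    # e.g. {'x509': n, 'crl': 0, 'x509_ca': n}
    return ctx.cert_store_stats()
```

Calling ca_file_stats('/etc/ipa/ca.crt') on a compute node and on a controller should report the same x509_ca count if the files really match.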

  8. I had a similar issue in the past that was caused by the ordering of the intermediate certificates and the root CA in the target file. I made sure the order matched what the Python SSL library expects, with no luck.
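As far as I know, OpenSSL generally does not care about the order of certificates in a CA file used for verification; order matters mainly in the chain a server presents (leaf first, then intermediates). Either way, listing the certificates in a bundle in file order, with a SHA-256 fingerprint for each, makes it easy to compare the compute and controller copies byte for byte. A minimal Python 3 sketch; ssl.PEM_cert_to_DER_cert only strips the PEM armour and base64-decodes, so this checks layout and identity, not certificate validity.

```python
import hashlib
import re
import ssl

PEM_BLOCK_RE = re.compile(
    r"-----BEGIN CERTIFICATE-----\s+.*?-----END CERTIFICATE-----", re.DOTALL
)


def split_pem_bundle(text):
    """Return the DER bytes of each certificate block, in file order."""
    return [ssl.PEM_cert_to_DER_cert(block) for block in PEM_BLOCK_RE.findall(text)]


def bundle_fingerprints(text):
    """SHA-256 fingerprint of each certificate, in file order."""
    return [hashlib.sha256(der).hexdigest() for der in split_pem_bundle(text)]
```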

  9. I tried stracing the kolla_start / nova-compute binaries, and there is no reference to a connection to port 3306 or to overcloud.internalapi.
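When strace follows child processes and traces only connect() (for example strace -f -e trace=connect), IPv4 connection attempts appear as connect() lines carrying sin_port=htons(...) and sin_addr=inet_addr("..."). A minimal Python 3 sketch that extracts destination addresses and ports from such output; the regular expression assumes that usual strace rendering and deliberately ignores IPv6 and UNIX sockets. Filtering on port 5672, which the container's health check probes, instead of 3306 can show where the RPC traffic goes.

```python
import re

# Assumes strace's usual rendering of IPv4 sockaddr structures.
CONNECT_RE = re.compile(
    r'connect\((\d+), \{sa_family=AF_INET, sin_port=htons\((\d+)\), '
    r'sin_addr=inet_addr\("([^"]+)"\)'
)


def inet_connects(strace_text, port=None):
    """Return (fd, address, port) for every IPv4 connect() found in strace output."""
    hits = []
    for fd, dest_port, addr in CONNECT_RE.findall(strace_text):
        if port is None or int(dest_port) == port:
            hits.append((int(fd), addr, int(dest_port)))
    return hits
```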

Package versions:

openstack-sahara-image-pack-8.0.3-1.el7ost.noarch
openstack-manila-share-6.3.2-1.el7ost.noarch
openstack-selinux-0.8.18-1.el7ost.noarch
openstack-manila-ui-2.13.1-1.el7ost.noarch
openstack-mistral-engine-6.0.6-3.el7ost.noarch
python-openstackclient-lang-3.14.3-5.el7ost.noarch
openstack-ironic-api-10.1.10-1.el7ost.noarch
openstack-neutron-linuxbridge-12.1.1-6.el7ost.noarch
openstack-mistral-executor-6.0.6-3.el7ost.noarch
openstack-glance-16.0.1-10.el7ost.noarch
openstack-swift-proxy-2.17.1-3.el7ost.noarch
openstack-ceilometer-central-10.0.1-8.el7ost.noarch
openstack-ceilometer-notification-10.0.1-8.el7ost.noarch
openstack-neutron-ml2-12.1.1-6.el7ost.noarch
openstack-heat-common-10.0.3-9.el7ost.noarch
openstack-heat-engine-10.0.3-9.el7ost.noarch
openstack-ironic-inspector-7.2.4-1.el7ost.noarch
openstack-sahara-ui-8.0.2-1.el7ost.noarch
openstack-octavia-api-2.1.2-2.el7ost.noarch
openstack-swift-object-2.17.1-3.el7ost.noarch
openstack-aodh-common-6.0.1-4.el7ost.noarch
openstack-octavia-common-2.1.2-2.el7ost.noarch
puppet-openstacklib-12.4.0-5.el7ost.noarch
openstack-dashboard-13.0.3-1.el7ost.noarch
openstack-heat-agents-1.5.4-1.el7ost.noarch
openstack-neutron-l2gw-agent-12.0.2-0.20190420004620.270972f.el7ost.noarch
openstack-ceilometer-ipmi-10.0.1-8.el7ost.noarch
openstack-aodh-api-6.0.1-4.el7ost.noarch
puppet-openstack_extras-12.4.1-0.20180831193234.7fed86a.el7ost.noarch
openstack-cinder-12.0.10-2.el7ost.noarch
openstack-neutron-sriov-nic-agent-12.1.1-6.el7ost.noarch
openstack-mistral-common-6.0.6-3.el7ost.noarch
openstack-sahara-common-8.0.3-1.el7ost.noarch
python2-openstacksdk-0.11.4-1.el7ost.noarch
openstack-sahara-engine-8.0.3-1.el7ost.noarch
openstack-swift-container-2.17.1-3.el7ost.noarch
openstack-panko-api-4.0.2-2.el7ost.noarch
openstack-aodh-listener-6.0.1-4.el7ost.noarch
openstack-swift-plugin-swift3-1.12.1-0.20180601045836.90db5d1.el7ost.noarch
openstack-zaqar-6.0.1-4.el7ost.noarch
openstack-heat-api-10.0.3-9.el7ost.noarch
openstack-ironic-conductor-10.1.10-1.el7ost.noarch
openstack-octavia-health-manager-2.1.2-2.el7ost.noarch
openstack-neutron-openvswitch-12.1.1-6.el7ost.noarch
openstack-manila-6.3.2-1.el7ost.noarch
openstack-panko-common-4.0.2-2.el7ost.noarch
openstack-sahara-api-8.0.3-1.el7ost.noarch
openstack-aodh-evaluator-6.0.1-4.el7ost.noarch
openstack-ec2-api-6.0.1-0.20190420032753.256dce9.el7ost.noarch
openstack-sahara-8.0.3-1.el7ost.noarch
openstack-octavia-housekeeping-2.1.2-2.el7ost.noarch
openstack-ceilometer-common-10.0.1-8.el7ost.noarch
openstack-neutron-common-12.1.1-6.el7ost.noarch
python2-openstackclient-3.14.3-5.el7ost.noarch
openstack-swift-account-2.17.1-3.el7ost.noarch
openstack-neutron-lbaas-ui-4.0.1-0.20190723082436.ccf8621.el7ost.noarch
openstack-mistral-api-6.0.6-3.el7ost.noarch
openstack-neutron-lbaas-12.0.1-0.20190803015156.b86fcef.el7ost.noarch
openstack-heat-api-cfn-10.0.3-9.el7ost.noarch
openstack-octavia-worker-2.1.2-2.el7ost.noarch
openstack-ceilometer-compute-10.0.1-8.el7ost.noarch
openstack-barbican-api-6.0.1-4.el7ost.noarch
openstack-keystone-13.0.4-1.el7ost.noarch
openstack-neutron-12.1.1-6.el7ost.noarch
openstack-mistral-event-engine-6.0.6-3.el7ost.noarch
openstack-ironic-common-10.1.10-1.el7ost.noarch
openstack-dashboard-theme-13.0.0-1.el7ost.noarch
openstack-neutron-metering-agent-12.1.1-6.el7ost.noarch
openstack-ceilometer-polling-10.0.1-8.el7ost.noarch
openstack-aodh-notifier-6.0.1-4.el7ost.noarch
openstack-barbican-common-6.0.1-4.el7ost.noarch

Container versions:

registry.access.redhat.com/rhosp13/openstack-nova-compute                latest              5a9b29bac238        2 weeks ago         1.74 GB
registry.access.redhat.com/rhosp13/openstack-neutron-server              latest              67e7002ae5bb        2 weeks ago         874 MB
registry.access.redhat.com/rhosp13/openstack-ceilometer-central          latest              dbfae4b9454c        2 weeks ago         723 MB
registry.access.redhat.com/rhosp13/openstack-sensu-client                latest              0a4728186ad9        2 weeks ago         678 MB
registry.access.redhat.com/rhosp13/openstack-iscsid                      latest              b110b0d3a21c        2 weeks ago         509 MB
registry.access.redhat.com/rhosp13/openstack-cron                        latest              58d7365c265e        2 weeks ago         504 MB
registry.access.redhat.com/rhosp13/openstack-fluentd                     latest              5d465fdacb4c        2 weeks ago         509 MB

Environment

  • Red Hat OpenStack Platform 13.0 (RHOSP)
