Issue affecting minor updates of Red Hat Ceph Storage 3 can cause OSD corruption
Environment
- Red Hat Ceph Storage 3
- Red Hat Enterprise Linux 7
- Red Hat OpenStack Platform 13
Issue
A known issue affecting minor updates of Red Hat Ceph Storage 3 can cause OSD corruption when OSDs are deployed in containers by using ceph-ansible or by using Red Hat OpenStack Platform 13 director:
- A missing dependency in the Ceph OSD systemd unit files causes abrupt termination of the containers on docker package updates and service restarts.
- Uncontrolled updates of the docker package on Ceph OSD nodes can lead to service disruption or even data corruption.
Resolution
Perform the following steps for your director-driven or standalone deployment.
OpenStack director-driven RHCS3 deployments
Before starting the undercloud minor update, update ceph-ansible to a version newer than 3.2.44:
# yum update ceph-ansible
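The version requirement above can be checked with a short script. This is a minimal sketch, assuming GNU `sort -V` is available (as on RHEL 7) and that its ordering matches RPM version ordering for plain dotted versions such as 3.2.44. The helper name `version_gt` is hypothetical, and the release field of the package is ignored.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when version $1 is strictly newer than $2.
# Uses GNU `sort -V`, which is an approximation of RPM version comparison.
version_gt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

# 3.2.44 is the threshold named in this article; the fix needs a newer version.
MIN_EXCLUSIVE="3.2.44"

# Query the installed version only where rpm and the package are present.
if command -v rpm >/dev/null 2>&1 && rpm -q ceph-ansible >/dev/null 2>&1; then
    installed="$(rpm -q --queryformat '%{VERSION}' ceph-ansible)"
    if version_gt "$installed" "$MIN_EXCLUSIVE"; then
        echo "ceph-ansible $installed is newer than $MIN_EXCLUSIVE"
    else
        echo "ceph-ansible $installed is too old; update it first" >&2
    fi
fi
```

Run the script on the node where ceph-ansible is installed (the undercloud for director-driven deployments) after the yum update completes.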
Run a heat stack update identical to the last execution of the overcloud deploy command, with the same arguments and the same heat environment files:
# openstack overcloud deploy --templates \
-e /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-ansible.yaml \
-e /home/stack/templates/ceph-custom-config.yaml \
-e …
After the execution, ensure that Requires=docker.service appears in the systemd units of the Ceph OSD containers, for example:
(undercloud) $ ssh heat-admin@overcloud-ceph-0
(overcloud-ceph-0) $ grep Requires /etc/systemd/system/ceph-osd\@.service
Requires=docker.service
Standalone RHCS3 deployments
Update the ceph-ansible package:
# yum update ceph-ansible
Refresh the systemd units on the cluster nodes by rerunning the site-docker.yml playbook. This assumes that the group_vars and inventory files created for the initial deployment are still available:
# ansible-playbook site-docker.yml
After the execution, ensure that Requires=docker.service appears in the systemd units of the Ceph OSD containers on the target nodes.
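One way to perform this check locally on a target node is sketched below. It assumes the unit file lives at the same path shown in the director example, /etc/systemd/system/ceph-osd@.service, which may differ on your nodes. The function name `has_docker_requires` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the unit file passed as $1 lists
# docker.service on a Requires= line (alone or among other units).
has_docker_requires() {
    grep -Eq '^Requires=(.*[[:space:]])?docker\.service([[:space:]]|$)' "$1"
}

# Default path is an assumption taken from the director example in this
# article; pass another path as the first argument to override it.
UNIT_FILE="${1:-/etc/systemd/system/ceph-osd@.service}"

# Only report when the unit file exists on this node.
if [ -f "$UNIT_FILE" ]; then
    if has_docker_requires "$UNIT_FILE"; then
        echo "OK: $UNIT_FILE requires docker.service"
    else
        echo "MISSING: $UNIT_FILE lacks Requires=docker.service" >&2
    fi
fi
```

If the script reports MISSING, rerun the playbook before updating the docker package on that node.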
Results
After these steps, and once all Ceph OSD systemd units have been updated to include the Requires=docker.service line, you can initiate the standard update process for Red Hat Ceph Storage or Red Hat OpenStack Platform.
Root Cause
Red Hat Ceph Storage 3 relies on docker for containerized deployments running on RHEL 7. The ceph-ansible fix for BZ1846830 updates the systemd units that control the Ceph containers so that they require the docker service to be up and running. This requirement is essential to implement a safe update path and to avoid service disruption or even data corruption on uncontrolled updates of the docker package.
A missing dependency in the Ceph OSD systemd unit files causes abrupt termination of the containers on docker package updates and service restarts.
Updating the ceph-ansible package is not sufficient for the fix to be effective. It is necessary to update the containers' systemd units by rerunning the deployment playbook.
Diagnostic Steps
To verify whether the Ceph cluster can be updated safely, ensure that the line Requires=docker.service appears in the systemd units of the Ceph OSD containers. For example, for director-driven deployments, log in to every node that hosts a Ceph OSD container and check that the "Requires" line is present:
(undercloud) $ ssh heat-admin@overcloud-ceph-0
(overcloud-ceph-0) $ grep Requires /etc/systemd/system/ceph-osd\@.service
Requires=docker.service
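When many nodes host OSD containers, the same check can be looped over all of them. The following is a rough sketch, not a supported tool: the `OSD_NODES` list, the `SSH_CMD` override, and the `check_nodes` function are hypothetical names, and the heat-admin user and unit file path are taken from the example above.

```shell
#!/bin/sh
# Hypothetical host list; replace with the nodes that host Ceph OSD containers.
OSD_NODES="${OSD_NODES:-overcloud-ceph-0}"
# The SSH command is overridable so the loop can be dry-run with a stub.
SSH_CMD="${SSH_CMD:-ssh}"
UNIT_FILE='/etc/systemd/system/ceph-osd@.service'

# Prints one status line per node and returns non-zero if any node is
# missing the dependency or could not be reached.
check_nodes() {
    missing=0
    for node in $OSD_NODES; do
        if $SSH_CMD "heat-admin@$node" \
            "grep -q '^Requires=.*docker\.service' '$UNIT_FILE'"; then
            echo "$node: OK"
        else
            echo "$node: MISSING or unreachable"
            missing=1
        fi
    done
    return "$missing"
}
```

After sourcing the script on the undercloud, call `check_nodes` with `OSD_NODES` set to a space-separated list of your Ceph OSD node names. A non-zero return status means at least one node still needs its units refreshed.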
If the output does not show the required dependency, update the units by following the instructions in the Resolution section.
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.