Issue affecting minor updates of Red Hat Ceph Storage 3 can cause OSD corruption
Environment
- Red Hat Ceph Storage (RHCS) 3
- Red Hat Enterprise Linux (RHEL) 7
- Red Hat OpenStack Platform (RHOSP) 13
Issue
A known issue affects minor updates of Red Hat Ceph Storage 3 and can cause OSD corruption when OSDs are deployed in containers using ceph-ansible or using Red Hat OpenStack Platform 13 director.
- A missing dependency in the Ceph OSD systemd unit files causes abrupt termination of the containers on docker package updates and service restarts.
- This leads to service disruption and potential data corruption on uncontrolled updates of the docker package on Ceph OSD nodes.
Resolution
Perform the following steps for your director-driven or standalone deployment.
RHOSP 13 director-driven RHCS 3 deployments
Run the openstack overcloud ceph-upgrade command to update the containerized RHCS 3 cluster before running the RHOSP 13 overcloud update.
- Complete the undercloud update.
- Make sure that the ceph-ansible package version on the undercloud is >= v3.2.52:
$ rpm -q ceph-ansible
- Complete all steps from Keeping Red Hat OpenStack Platform Updated up to, but not including, 4.4. Updating all Controller nodes.
- Run the Ceph Storage update command. For example:
$ openstack overcloud ceph-upgrade run \
--templates \
-e <ENVIRONMENT FILE> \
-e /home/stack/templates/overcloud_images.yaml \
-e /home/stack/templates/updates-environment.yaml
- After the execution, ensure that Requires=docker.service appears in the systemd units of the Ceph OSD containers, for example:
(undercloud) $ ssh heat-admin@overcloud-ceph-0
(overcloud-ceph-0) $ grep Requires /etc/systemd/system/ceph-osd\@.service
Requires=docker.service
- Continue with the overcloud update from step 4.4. Updating all Controller nodes.
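The ceph-ansible version check in the steps above can be sketched as a small shell helper. This is a minimal sketch, not part of the official procedure: the function names version_ge and check_ceph_ansible are hypothetical, and the comparison assumes GNU sort with the -V (version sort) option, as shipped with RHEL 7.

```shell
# Hypothetical helper: succeeds when version $1 is greater than or equal to $2.
# Relies on GNU "sort -V" to order dotted version strings.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Minimum ceph-ansible version named in this article.
MIN_VERSION="3.2.52"

# Hypothetical wrapper: queries the installed package version with rpm
# and compares it against MIN_VERSION.
check_ceph_ansible() {
    local installed
    installed="$(rpm -q --qf '%{VERSION}' ceph-ansible)" || return 2
    if version_ge "$installed" "$MIN_VERSION"; then
        echo "ceph-ansible $installed is recent enough"
    else
        echo "ceph-ansible $installed is older than $MIN_VERSION" >&2
        return 1
    fi
}
```

Running check_ceph_ansible on the undercloud before the Ceph Storage update command gives an explicit pass or fail instead of a visual comparison of the rpm -q output.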
Standalone RHCS 3 deployments
- Update the ceph-ansible package on your deployment node:
# yum update ceph-ansible
- Refresh the systemd units on the cluster nodes by re-running the site-docker.yml playbook. Ensure that the group_vars and inventory files created for the initial deployment are still available:
# ansible-playbook site-docker.yml
- After the playbook execution, ensure that Requires=docker.service appears in the systemd units of the Ceph OSD containers on the storage nodes:
$ grep Requires /etc/systemd/system/ceph-osd\@.service
Requires=docker.service
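The grep verification above can also be wrapped in a function that returns an exit status, which is convenient in scripts. This is a minimal sketch: the function name unit_requires_docker is hypothetical, and the pattern assumes the dependency appears on a line that starts with Requires=.

```shell
# Hypothetical helper: succeeds when the given unit file contains a
# Requires= line that names docker.service.
unit_requires_docker() {
    local unit_file="$1"
    grep -Eq '^Requires=(.*[[:space:]])?docker\.service([[:space:]]|$)' "$unit_file"
}

# Example invocation on a storage node (path taken from this article):
# unit_requires_docker /etc/systemd/system/ceph-osd@.service && echo "unit updated"
```

Unlike a plain grep Requires, the function does not match other directives such as After=docker.service, so a zero exit status specifically indicates that the Requires= dependency is present.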
Results
After you have executed the steps above and you have verified that all Ceph OSD systemd units have been updated to include the Requires=docker.service line, you can initiate the standard update process for Red Hat Ceph Storage.
Root Cause
RHCS 3 relies on docker for containerized deployments running on RHEL 7. The ceph-ansible fix for BZ1846830 updates the systemd units controlling Ceph containers, making the systemd units require the docker service to be up and running for execution. This requirement is essential to implement a safe update path and avoid service disruption and potential data corruption on uncontrolled updates of the docker package.
A missing dependency in the Ceph OSD systemd unit files causes abrupt termination of the containers on docker package updates and service restarts.
Updating the ceph-ansible package is not sufficient for the fix to be effective. It is necessary to update the containers' systemd units by rerunning the deployment playbook.
Diagnostic Steps
To verify if the Ceph cluster can be updated safely, ensure that the line Requires=docker.service appears in the systemd units of Ceph OSD containers. For example, for director-driven deployments, log into all nodes hosting a Ceph OSD container and inspect the systemd unit file:
(undercloud) $ ssh heat-admin@overcloud-ceph-0
(overcloud-ceph-0) $ grep Requires /etc/systemd/system/ceph-osd\@.service
(overcloud-ceph-0) $
If the output does not show the Requires=docker.service line, as in the example above, it is essential to update the systemd unit file by following the instructions in the Resolution section.
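When several nodes host Ceph OSD containers, the per-node inspection can be looped over ssh. The following is a minimal sketch under stated assumptions: the function name find_unpatched_nodes is hypothetical, the node names in the usage line are placeholders, and key-based ssh access as the given user is assumed. A node that cannot be reached is also reported, because the remote grep does not succeed.

```shell
# Hypothetical helper: prints every node whose OSD unit file lacks the
# Requires=docker.service line. Returns 1 if any node is reported, else 0.
# Usage: find_unpatched_nodes <ssh-user> <node> [<node> ...]
find_unpatched_nodes() {
    local user="$1"
    shift
    local node status=0
    for node in "$@"; do
        # The remote grep exits non-zero when the line is missing;
        # ssh itself exits non-zero when the node is unreachable.
        if ! ssh "${user}@${node}" \
            "grep -q '^Requires=.*docker.service' '/etc/systemd/system/ceph-osd@.service'" \
            </dev/null; then
            echo "$node"
            status=1
        fi
    done
    return "$status"
}

# Example invocation from the undercloud (node names are placeholders):
# find_unpatched_nodes heat-admin overcloud-ceph-0 overcloud-ceph-1
```

An empty output with a zero exit status suggests that every listed node already carries the dependency; any printed node name should be handled as described in the Resolution section.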