ceph-ansible site.yml failed at TASK inspect ceph osd container with "Error: No such object"

Solution Verified

Environment

  • Red Hat Ceph Storage 4

Issue

  • Running ceph-ansible site.yml to change the Ceph configuration fails at TASK [ceph-container-common : inspect ceph osd container] with "Error: No such object: 239c8895255c":
[admin@admin ceph-ansible]$ ansible-playbook site.yml
2020-09-14 20:55:18,975 p=1957 u=root |  TASK [ceph-container-common : inspect ceph osd container] *********************************************************
**************************************************************************************
2020-09-14 20:55:18,975 p=1957 u=root |  Monday 14 September 2020  20:55:18 +0800 (0:00:00.861)       0:02:52.847 ****** 
2020-09-14 20:55:19,891 p=1957 u=root |  ok: [osd-129]
2020-09-14 20:55:20,041 p=1957 u=root |  ok: [osd-130]
2020-09-14 20:55:20,254 p=1957 u=root |  ok: [osd-131]
2020-09-14 20:55:20,418 p=1957 u=root |  ok: [osd-132]
2020-09-14 20:55:20,519 p=1957 u=root |  ok: [osd-133]
2020-09-14 20:55:20,590 p=1957 u=root |  ok: [osd-134]
2020-09-14 20:55:20,661 p=1957 u=root |  ok: [osd-135]
2020-09-14 20:55:20,700 p=1957 u=root |  fatal: [osd-136]: FAILED! => changed=false
cmd:
  - docker
  - inspect
  - 239c8895255c
  - da7764905edb
  - de317b04efad
  - de927ac930e7
  - 66ce133bd67e
  - 99d9b319031f
  - d9aad6dec8c7
  - 9f9dfd2aee3c
  - 16b6d1b829c8
  - e2d978b49625
  - 04e102987117
  - 3ce43945d8a5
  - 64485b0e2e5c
  - 0e134fd4240a
  - 99a134a4e4ee
  - 085f97a10c74
  - 2118343922d0
  - 5cfa24551bed
  - fe7e1e86c3ff
  - 570d92e39533
  - 65400bbaf663
  delta: '0:00:00.188799'
  end: '2020-09-14 20:55:20.279985'
  msg: non-zero return code
  rc: 1
  start: '2020-09-14 20:55:20.091186'
  stderr: 'Error: No such object: 239c8895255c'
  stderr_lines: <omitted>
  stdout: '[]'
  stdout_lines: <omitted> 
2020-09-14 20:55:20,701 p=1957 u=root |  NO MORE HOSTS LEFT
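The task passes every container ID it gathered to a single docker inspect call, and docker inspect exits non-zero as soon as any one ID no longer exists, so a single stale ID (239c8895255c here) fails the whole task. A sketch for confirming which IDs are stale on the node (the ID list below is taken from the error output above; trim it as needed):

```shell
# Check each container ID from the failed task; IDs with no matching
# container are the stale ones.
for id in 239c8895255c da7764905edb de317b04efad; do
  docker inspect "$id" >/dev/null 2>&1 || echo "stale: $id"
done
```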

Resolution

Step 1. Stop and disable all obsolete OSD services on the affected OSD node osd-136 (osd.105 is used as an example below):

systemctl stop ceph-osd@105.service
systemctl disable ceph-osd@105.service

Step 2. Remove each obsolete OSD's mount point under /var/lib/ceph/osd, if it exists:

rm -rf  /var/lib/ceph/osd/ceph-105

Step 3. Remove the systemd service link, if it exists:

ls -la /etc/systemd/system/multi-user.target.wants/ceph-osd@105.service
rm -f /etc/systemd/system/multi-user.target.wants/ceph-osd@105.service
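When several OSDs are left over (the logs in the Root Cause section below show osd.105, osd.113, and osd.129 failing), Steps 1-3 can be repeated per ID. A dry-run sketch: the ID list is an example, so replace it with the obsolete IDs on your node, and drop the leading echo to actually execute; afterwards reload systemd and rerun site.yml.

```shell
# Dry run: prints the cleanup commands for each obsolete OSD ID.
# Replace the ID list with the leftover OSDs on your node and remove
# "echo" to execute for real.
for id in 105 113 129; do
  echo systemctl stop "ceph-osd@${id}.service"
  echo systemctl disable "ceph-osd@${id}.service"
  echo rm -rf "/var/lib/ceph/osd/ceph-${id}"
  echo rm -f "/etc/systemd/system/multi-user.target.wants/ceph-osd@${id}.service"
done
echo systemctl daemon-reload
```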

Root Cause

Some OSDs had not been completely removed from the affected OSD node osd-136, so systemd kept restarting those stale OSD services, and their container IDs changed on every restart:

Sep 15 14:19:03 osd-136 docker: Error response from daemon: No such container: ceph-osd-105
Sep 15 14:19:03 osd-136 systemd: Started Ceph OSD.
Sep 15 14:19:03 osd-136 dockerd-current: time="2020-09-15T14:19:03.632213247+08:00" level=error msg="Handler for DELETE /v1.26/containers/ceph-osd-113?force=1 returned error: No such container: ceph-osd-113"
Sep 15 14:19:03 osd-136 dockerd-current: time="2020-09-15T14:19:03.632424808+08:00" level=error msg="Handler for DELETE /v1.26/containers/ceph-osd-113 returned error: No such container: ceph-osd-113"
Sep 15 14:19:03 osd-136 docker: Error response from daemon: No such container: ceph-osd-113
Sep 15 14:19:03 osd-136 systemd: Started Ceph OSD.
Sep 15 14:19:03 osd-136 systemd: Started libcontainer container 71c39b415ce5387d3958889e903726722a2a90a3dee8d35c42eba230f502af5e.
Sep 15 14:19:03 osd-136 systemd: Started libcontainer container 9588d2fd8f7c037316aec53ff9db704966ce81e3e90e8463188adc735bfd006d.
Sep 15 14:19:03 osd-136 systemd: Started libcontainer container 2cf31bd41805600386255380e8baf534717f84efabfa9912a889a5fc9c9df823.
Sep 15 14:19:14 osd-136 dracut: dracut-033-568.el7
Sep 15 14:19:14 osd-136 dracut: Executing: /usr/sbin/dracut --list-modules
Sep 15 14:19:25 osd-136 dbus[2016]: [system] Activating via systemd: service name='org.freedesktop.hostname1' unit='dbus-org.freedesktop.hostname1.service'
Sep 15 14:19:25 osd-136 systemd: Starting Hostname Service...
Sep 15 14:19:25 osd-136 dbus[2016]: [system] Successfully activated service 'org.freedesktop.hostname1'
Sep 15 14:19:25 osd-136 systemd: Started Hostname Service.
Sep 15 14:19:33 osd-136 ceph-osd-run.sh: grep: /etc/ceph/osd/*.json: No such file or directory
Sep 15 14:19:33 osd-136 dockerd-current: time="2020-09-15T14:19:33.847052998+08:00" level=warning msg="71c39b415ce5387d3958889e903726722a2a90a3dee8d35c42eba230f502af5e cleanup: failed to unmount secrets: invalid argument"
Sep 15 14:19:34 osd-136 ceph-osd-run.sh: grep: /etc/ceph/osd/*.json: No such file or directory
Sep 15 14:19:34 osd-136 ceph-osd-run.sh: grep: /etc/ceph/osd/*.json: No such file or directory
Sep 15 14:19:35 osd-136 dockerd-current: time="2020-09-15T14:19:35.333033276+08:00" level=warning msg="2cf31bd41805600386255380e8baf534717f84efabfa9912a889a5fc9c9df823 cleanup: failed to unmount secrets: invalid argument"
Sep 15 14:19:35 osd-136 dockerd-current: time="2020-09-15T14:19:35.375977022+08:00" level=warning msg="9588d2fd8f7c037316aec53ff9db704966ce81e3e90e8463188adc735bfd006d cleanup: failed to unmount secrets: invalid argument"
Sep 15 14:19:35 osd-136 systemd: ceph-osd@129.service: main process exited, code=exited, status=2/INVALIDARGUMENT
Sep 15 14:19:35 osd-136 systemd: Unit ceph-osd@129.service entered failed state.
Sep 15 14:19:35 osd-136 systemd: ceph-osd@129.service failed.
Sep 15 14:19:35 osd-136 systemd: ceph-osd@113.service: main process exited, code=exited, status=2/INVALIDARGUMENT
Sep 15 14:19:35 osd-136 systemd: Unit ceph-osd@113.service entered failed state.
Sep 15 14:19:35 osd-136 systemd: ceph-osd@113.service failed.
Sep 15 14:19:35 osd-136 systemd: ceph-osd@105.service: main process exited, code=exited, status=2/INVALIDARGUMENT
Sep 15 14:19:35 osd-136 systemd: Unit ceph-osd@105.service entered failed state.
Sep 15 14:19:35 osd-136 systemd: ceph-osd@105.service failed.
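The leftovers can be spotted by comparing the ceph-osd units systemd knows about with the containers docker is actually running; units with no matching container are the ones to clean up. A sketch under that assumption (the guards keep it harmless to run on a node without docker):

```shell
# Compare enabled ceph-osd unit links against running ceph-osd containers;
# a unit with no matching container is a leftover to clean up.
units=$(ls /etc/systemd/system/multi-user.target.wants/ 2>/dev/null \
        | grep -o 'ceph-osd@[0-9]*' || true)
running=$(docker ps --format '{{.Names}}' 2>/dev/null | grep -o '[0-9]*$' || true)
for u in $units; do
  id=${u#ceph-osd@}
  echo "$running" | grep -qx "$id" || echo "no running container for $u"
done
```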

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
