Ceph - OSD will not start due to a keyring conflict or a missing entry in 'ceph auth list'

Solution Verified

Environment

  • Red Hat Ceph Storage

Issue

  • An OSD that was previously part of the cluster cannot start and rejoin the cluster because either:
  • It is missing from 'ceph auth list'.
  • Or the key shown for it in 'ceph auth list' differs from the key the OSD node holds in '/var/lib/ceph/osd/ceph-<id>/keyring'.

  • NOTE: The same concept and process apply when the OSD will not start because 'ceph auth list' shows a different key for this OSD than the one held on the OSD node. Read the "Resolution" section below to determine whether this is the case.

Resolution

  • The general method is as follows:
  1. Export the caps of any existing OSD to a file.
  2. Rename the file after the affected OSD, then edit it so that it contains that OSD's name and key.
  3. Import the updated file into the cluster.

Example following the method above:
  • From a Ceph MON, export the caps of an existing OSD to a file. In this example the osd.9 key is missing from 'ceph auth list' or is incorrect, so we use the existing osd.6 as a template:

# ceph auth export osd.6 -o osd.6.export
  • Rename the file to "osd.9.export".

  • Log in to the node hosting this OSD and cat the OSD's keyring. This key will be copied into the osd.9.export file.

# cat /var/lib/ceph/osd/ceph-9/keyring 
[osd.9]
         key = AQARFndbOFpIABAArBobpqDZB8jmwH5QEwmP8g==
  • Open the osd.9.export file in a text editor. Change the section header to [osd.9] and replace the key with the one from the OSD node output above:
[osd.9]
    key = AQARFndbOFpIABAArBobpqDZB8jmwH5QEwmP8g==
    caps mgr = "allow profile osd"
    caps mon = "allow profile osd"
    caps osd = "allow *"
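The manual edit above can also be scripted. The following is a minimal sketch, not part of the original procedure: rewrite_export is a hypothetical helper name, and it assumes the exported file contains a single section in the format shown above. The caps lines are passed through unchanged.

```shell
#!/bin/sh
# Sketch only: rewrite_export is a hypothetical helper, not a Ceph command.
# It assumes SRC_EXPORT holds exactly one [osd.N] section.
# Usage: rewrite_export SRC_EXPORT TARGET_NAME TARGET_KEY > NEW_EXPORT
rewrite_export() {
    src=$1
    name=$2
    key=$3
    awk -v name="$name" -v key="$key" '
        # Section header, e.g. [osd.6]: replace with the target OSD name.
        /^\[.*\]$/          { print "[" name "]"; next }
        # Key line: replace with the key taken from the OSD node.
        /^[ \t]*key[ \t]*=/ { print "    key = " key; next }
        # Everything else (the caps lines) is copied as-is.
                            { print }
    ' "$src"
}

# Example, using the file names from the steps above:
#   rewrite_export osd.6.export osd.9 'AQARFndbOFpIABAArBobpqDZB8jmwH5QEwmP8g==' > osd.9.export
```

awk is used rather than sed because the key is base64 and may contain "/" or "+", which would need escaping in a sed substitution.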
  • The osd.9.export file now carries the [osd.9] section name and the key from the OSD node. Import it so that the cluster's auth entry for osd.9 is updated with the contents of the file:
# ceph auth import -i osd.9.export
imported keyring
  • Verify with 'ceph auth list' that osd.9 is present and shows the key from the OSD node.

Root Cause

  • In Ceph, an OSD must authenticate with the cluster before it can be a part of the cluster and serve data.
  • The OSD cannot authenticate if its key is absent from the 'ceph auth list' output, or if the key listed there differs from the one in the keyring on the OSD node:
# cat /var/lib/ceph/osd/ceph-<id>/keyring
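The two failure conditions above can be told apart by comparing the key the cluster holds with the key in the local keyring. The sketch below is an illustration under stated assumptions: keyring_key and compare_keys are hypothetical helper names, and the commented usage assumes a host where both the 'ceph' admin command and the OSD keyring file are available. If they are not, copy one of the two values across by hand.

```shell
#!/bin/sh
# Sketch only: keyring_key and compare_keys are hypothetical helpers.

# Print the secret from a keyring file such as
# /var/lib/ceph/osd/ceph-<id>/keyring (first "key = ..." line).
keyring_key() {
    awk '/^[ \t]*key[ \t]*=/ {
        sub(/^[ \t]*key[ \t]*=[ \t]*/, "")   # drop the "key = " prefix
        sub(/[ \t\r]+$/, "")                 # drop trailing whitespace
        print
        exit
    }' "$1"
}

# Classify the state of an OSD's auth entry.
# $1 = key known to the cluster (empty if the entry does not exist)
# $2 = key from the keyring on the OSD node
compare_keys() {
    if [ -z "$1" ]; then
        echo "missing"
    elif [ "$1" != "$2" ]; then
        echo "mismatch"
    else
        echo "match"
    fi
}

# Example for osd.9 (left commented out so the sketch runs without a cluster):
#   local_key=$(keyring_key /var/lib/ceph/osd/ceph-9/keyring)
#   cluster_key=$(ceph auth get-key osd.9 2>/dev/null)
#   compare_keys "$cluster_key" "$local_key"
```

A result of "missing" or "mismatch" corresponds to the two cases this article addresses. Running the same check after the import should report "match".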
