Ceph - After adding new OSDs to a Ceph cluster, it fails to reach a HEALTH_OK state


Issue

  • New OSDs were added to an existing Ceph cluster, and several placement groups (PGs) failed to rebalance and recover. This led the cluster to report a HEALTH_WARN state, with several PGs stuck in a degraded state.
cluster xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
     health HEALTH_WARN
            2 pgs degraded
            2 pgs stuck degraded
            4 pgs stuck unclean
            2 pgs stuck undersized
            2 pgs undersized
            recovery 35/472424 objects degraded (0.007%)
            recovery 13/472424 objects misplaced (0.003%)
     monmap e3: 3 mons at {mon1=10.1.1.1:6789/0,mon2=10.1.1.2:6789/0,mon3=10.1.1.3:6789/0}
            election epoch 26, quorum 0,1,2 mon1,mon2,mon3
     osdmap e6577: 214 osds: 214 up, 214 in; 2 remapped pgs
      pgmap v8141005: 27712 pgs, 17 pools, 707 GB data, 155 kobjects
            2252 GB used, 177 TB / 179 TB avail
            35/472424 objects degraded (0.007%)
            13/472424 objects misplaced (0.003%)
               27708 active+clean
                   2 active+undersized+degraded
                   2 active+remapped
  client io 6025 B/s rd, 396 kB/s wr, 114 op/s

HEALTH_WARN 2 pgs degraded; 2 pgs stuck degraded; 4 pgs stuck unclean; 2 pgs stuck undersized; 2 pgs undersized; recovery 35/472424 objects degraded (0.007%); recovery 13/472424 objects misplaced (0.003%)
pg 3.2fb is stuck unclean for 1450.234185, current state active+undersized+degraded, last acting [209,40]
pg 1.1a40 is stuck unclean for 9917.354884, current state active+remapped, last acting [152,9,35]
pg 1.18de is stuck unclean for 1454.534147, current state active+remapped, last acting [124,184,52]
pg 2.150 is stuck unclean for 1453.461673, current state active+undersized+degraded, last acting [183,127]
pg 3.2fb is stuck undersized for 667.477688, current state active+undersized+degraded, last acting [209,40]
pg 2.150 is stuck undersized for 1453.436227, current state active+undersized+degraded, last acting [183,127]
pg 3.2fb is stuck degraded for 667.478426, current state active+undersized+degraded, last acting [209,40]
pg 2.150 is stuck degraded for 1453.436964, current state active+undersized+degraded, last acting [183,127]
pg 3.2fb is active+undersized+degraded, acting [209,40]
pg 2.150 is active+undersized+degraded, acting [183,127]
recovery 35/472424 objects degraded (0.007%)
recovery 13/472424 objects misplaced (0.003%)
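
When many PGs are listed, it can help to collapse the repeated "stuck" lines into one record per PG. The following is a minimal sketch, assuming the line format shown in the output above; the function name summarize_stuck_pgs and the SAMPLE text are illustrative, not part of any Ceph tooling.

```python
import re
from collections import defaultdict

# Matches lines in the format shown in the health detail output, e.g.:
#   pg 3.2fb is stuck unclean for 1450.234185, current state active+undersized+degraded, last acting [209,40]
STUCK_RE = re.compile(
    r"^pg (?P<pgid>\S+) is stuck (?P<kind>\w+) for (?P<secs>[\d.]+), "
    r"current state (?P<state>\S+), last acting \[(?P<acting>[\d,]*)\]$"
)


def summarize_stuck_pgs(health_detail):
    """Group 'stuck' lines by PG id (hypothetical helper, not a Ceph API).

    Returns {pgid: {"state": str, "acting": [osd ids], "stuck": {kind: seconds}}}.
    Lines that do not match the 'stuck' format are ignored.
    """
    pgs = defaultdict(lambda: {"state": None, "acting": [], "stuck": {}})
    for line in health_detail.splitlines():
        match = STUCK_RE.match(line.strip())
        if not match:
            continue
        entry = pgs[match.group("pgid")]
        entry["state"] = match.group("state")
        entry["acting"] = [int(osd) for osd in match.group("acting").split(",") if osd]
        entry["stuck"][match.group("kind")] = float(match.group("secs"))
    return dict(pgs)


# Sample input copied from the output above.
SAMPLE = """\
pg 3.2fb is stuck unclean for 1450.234185, current state active+undersized+degraded, last acting [209,40]
pg 1.1a40 is stuck unclean for 9917.354884, current state active+remapped, last acting [152,9,35]
pg 1.18de is stuck unclean for 1454.534147, current state active+remapped, last acting [124,184,52]
pg 2.150 is stuck unclean for 1453.461673, current state active+undersized+degraded, last acting [183,127]
pg 3.2fb is stuck undersized for 667.477688, current state active+undersized+degraded, last acting [209,40]
pg 2.150 is stuck undersized for 1453.436227, current state active+undersized+degraded, last acting [183,127]
pg 3.2fb is stuck degraded for 667.478426, current state active+undersized+degraded, last acting [209,40]
pg 2.150 is stuck degraded for 1453.436964, current state active+undersized+degraded, last acting [183,127]
"""

if __name__ == "__main__":
    for pgid, info in sorted(summarize_stuck_pgs(SAMPLE).items()):
        # One line per PG: id, state, number of acting OSDs, and stuck categories.
        print(pgid, info["state"], len(info["acting"]), sorted(info["stuck"]))
```

In this sample, the two undersized PGs (3.2fb and 2.150) each list only two OSDs in their acting set, while the two remapped PGs list three.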

Environment

  • Red Hat Enterprise Linux 7
  • Red Hat Ceph Storage 1.3.x
  • Red Hat Ceph Storage 2.x
  • Red Hat Ceph Storage 3.x
