Ceph - After adding new OSDs to a Ceph cluster, it fails to reach a HEALTH_OK state
Issue
- New OSDs were added to an existing Ceph cluster, and several placement groups (PGs) failed to rebalance and recover. As a result, the cluster reports a HEALTH_WARN state and several PGs are stuck in a degraded state.
cluster xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
health HEALTH_WARN
2 pgs degraded
2 pgs stuck degraded
4 pgs stuck unclean
2 pgs stuck undersized
2 pgs undersized
recovery 35/472424 objects degraded (0.007%)
recovery 13/472424 objects misplaced (0.003%)
monmap e3: 3 mons at {mon1=10.1.1.1:6789/0,mon2=10.1.1.2:6789/0,mon3=10.1.1.3:6789/0}
election epoch 26, quorum 0,1,2 mon1,mon2,mon3
osdmap e6577: 214 osds: 214 up, 214 in; 2 remapped pgs
pgmap v8141005: 27712 pgs, 17 pools, 707 GB data, 155 kobjects
2252 GB used, 177 TB / 179 TB avail
35/472424 objects degraded (0.007%)
13/472424 objects misplaced (0.003%)
27708 active+clean
2 active+undersized+degraded
2 active+remapped
client io 6025 B/s rd, 396 kB/s wr, 114 op/s
HEALTH_WARN 2 pgs degraded; 2 pgs stuck degraded; 4 pgs stuck unclean; 2 pgs stuck undersized; 2 pgs undersized; recovery 35/472424 objects degraded (0.007%); recovery 13/472424 objects misplaced (0.003%)
pg 3.2fb is stuck unclean for 1450.234185, current state active+undersized+degraded, last acting [209,40]
pg 1.1a40 is stuck unclean for 9917.354884, current state active+remapped, last acting [152,9,35]
pg 1.18de is stuck unclean for 1454.534147, current state active+remapped, last acting [124,184,52]
pg 2.150 is stuck unclean for 1453.461673, current state active+undersized+degraded, last acting [183,127]
pg 3.2fb is stuck undersized for 667.477688, current state active+undersized+degraded, last acting [209,40]
pg 2.150 is stuck undersized for 1453.436227, current state active+undersized+degraded, last acting [183,127]
pg 3.2fb is stuck degraded for 667.478426, current state active+undersized+degraded, last acting [209,40]
pg 2.150 is stuck degraded for 1453.436964, current state active+undersized+degraded, last acting [183,127]
pg 3.2fb is active+undersized+degraded, acting [209,40]
pg 2.150 is active+undersized+degraded, acting [183,127]
recovery 35/472424 objects degraded (0.007%)
recovery 13/472424 objects misplaced (0.003%)
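The stuck-PG lines above repeat the same placement group once per condition (unclean, undersized, degraded), which makes them hard to read at a glance. The following is a minimal sketch, not part of the original article, that groups those lines by PG. It assumes the text is the plain output of `ceph health detail` in the format shown above; the names `STUCK_RE`, `parse_stuck_pgs` and `SAMPLE` are hypothetical and chosen for illustration only.

```python
import re

# Matches lines in the format shown above, e.g.:
#   pg 3.2fb is stuck unclean for 1450.234185, current state active+undersized+degraded, last acting [209,40]
# The pattern is an assumption based on this article's sample output only.
STUCK_RE = re.compile(
    r"^pg (?P<pgid>\S+) is stuck (?P<kind>\w+) for (?P<secs>[\d.]+), "
    r"current state (?P<state>[^,\s]+), last acting \[(?P<acting>[\d,]*)\]$"
)


def parse_stuck_pgs(text):
    """Group 'is stuck' lines by PG id.

    Returns {pgid: {"state": str, "acting": [int, ...], "stuck": {kind: seconds}}}.
    Lines that do not match the pattern are ignored.
    """
    pgs = {}
    for line in text.splitlines():
        m = STUCK_RE.match(line.strip())
        if m is None:
            continue
        entry = pgs.setdefault(
            m.group("pgid"),
            {
                "state": m.group("state"),
                "acting": [int(x) for x in m.group("acting").split(",") if x],
                "stuck": {},
            },
        )
        entry["stuck"][m.group("kind")] = float(m.group("secs"))
    return pgs


# A subset of the output shown above, copied verbatim.
SAMPLE = """\
pg 3.2fb is stuck unclean for 1450.234185, current state active+undersized+degraded, last acting [209,40]
pg 1.1a40 is stuck unclean for 9917.354884, current state active+remapped, last acting [152,9,35]
pg 1.18de is stuck unclean for 1454.534147, current state active+remapped, last acting [124,184,52]
pg 2.150 is stuck unclean for 1453.461673, current state active+undersized+degraded, last acting [183,127]
pg 3.2fb is stuck undersized for 667.477688, current state active+undersized+degraded, last acting [209,40]
pg 3.2fb is stuck degraded for 667.478426, current state active+undersized+degraded, last acting [209,40]
recovery 35/472424 objects degraded (0.007%)
"""

if __name__ == "__main__":
    for pgid, info in sorted(parse_stuck_pgs(SAMPLE).items()):
        kinds = ",".join(sorted(info["stuck"]))
        print(f"{pgid}: state={info['state']} acting={info['acting']} stuck={kinds}")
```

Grouped this way, the sample shows that the two undersized PGs (3.2fb and 2.150) each list two OSDs in their acting set, while the two remapped PGs list three.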
Environment
- Red Hat Enterprise Linux 7
- Red Hat Ceph Storage 1.3.x
- Red Hat Ceph Storage 2.x
- Red Hat Ceph Storage 3.x