A degraded Ceph cluster (on Firefly) stops recovering and gets stuck with degraded PGs after an OSD goes down. Why?
Issue
-
A degraded Ceph cluster (on Firefly) stops recovering and gets stuck with degraded PGs after an OSD goes down. Why?
-
After a failed OSD was removed from a three-node Ceph cluster, data rebalancing started between the remaining OSDs but then stalled, leaving the cluster stuck with degraded PGs.
-
The 'osd_pool_default_size' is set to 3 and 'osd_pool_default_min_size' to 2.
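As a rough illustration of what these two settings mean for a placement group, the sketch below models a replicated pool with size 3 and min_size 2. It is a simplified model, not Ceph code: the function names `is_degraded` and `serves_io` are hypothetical, and the only behavior assumed is that a PG with fewer than `size` replicas is degraded and that client I/O needs at least `min_size` replicas.

```python
# Simplified model of replicated-pool settings; not part of Ceph itself.
SIZE = 3      # osd_pool_default_size from the report above
MIN_SIZE = 2  # osd_pool_default_min_size from the report above


def is_degraded(replicas_available: int, size: int = SIZE) -> bool:
    """A PG holding fewer replicas than the pool size is degraded."""
    return replicas_available < size


def serves_io(replicas_available: int, min_size: int = MIN_SIZE) -> bool:
    """Client I/O is allowed only with at least min_size replicas."""
    return replicas_available >= min_size


if __name__ == "__main__":
    for replicas in (3, 2, 1):
        print(replicas, is_degraded(replicas), serves_io(replicas))
```

Under this model a PG with two of its three replicas is degraded but still serves I/O, which is consistent with PGs being reported as active+degraded rather than inactive.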
-
A 'ceph -s' shows the following:
# ceph -s
cluster 16ce9ce1-aa5f-445f-b994-5699730f364a
health HEALTH_WARN 326 pgs degraded; 366 pgs stuck unclean; recovery 975/83301 objects degraded (1.170%)
monmap e1: 3 mons at {mon-01=172.28.225.72:6789/0,mon-02=172.28.225.73:6789/0,mon-03=172.28.225.74:6789/0}, election epoch 18, quorum 0,1,2 mon-01,mon-02,mon-03
osdmap e540: 29 osds: 29 up, 29 in
pgmap v2727568: 9408 pgs, 19 pools, 135 GB data, 27767 objects
403 GB used, 80437 GB / 80840 GB avail
975/83301 objects degraded (1.170%)
9042 active+clean
326 active+degraded
40 active+remapped
client io 20363 B/s wr, 1 op/s
-
This is the current state of the cluster; no further recovery is occurring.
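The figures in the status output can be cross-checked with a short script. The sketch below is only an illustration: it parses a few lines copied from the output above (it does not query a cluster), and the helper name `summarize` is hypothetical. It confirms that the per-state PG counts add up to the 9408 total, that the degraded and remapped PGs together account for the 366 PGs reported as stuck unclean, and that the denominator 83301 equals 27767 objects times 3 copies, which matches a pool size of 3.

```python
import re

# Lines copied from the `ceph -s` output shown above.
STATUS = """\
pgmap v2727568: 9408 pgs, 19 pools, 135 GB data, 27767 objects
975/83301 objects degraded (1.170%)
9042 active+clean
326 active+degraded
40 active+remapped
"""


def summarize(status: str) -> dict:
    """Recompute the totals reported in the pasted status text."""
    total_pgs = int(re.search(r"(\d+) pgs", status).group(1))
    objects = int(re.search(r"data, (\d+) objects", status).group(1))
    degraded, copies = map(
        int, re.search(r"(\d+)/(\d+) objects degraded", status).groups()
    )
    # Lines of the form "<count> <state>", e.g. "326 active+degraded".
    states = {
        m.group(2): int(m.group(1))
        for m in re.finditer(r"^[ \t]*(\d+) ([a-z+_]+)$", status, re.MULTILINE)
    }
    return {
        "total_pgs": total_pgs,
        "pgs_by_state_sum": sum(states.values()),
        "not_clean": sum(n for s, n in states.items() if s != "active+clean"),
        "degraded_pct": round(100 * degraded / copies, 3),
        "copies_per_object": copies / objects,
    }


if __name__ == "__main__":
    for key, value in summarize(STATUS).items():
        print(f"{key}: {value}")
```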
-
A 'ceph osd tree' shows:
# ceph osd tree
# id weight type name up/down reweight
-1 81.6 root default
-2 27.2 host node-c01
0 2.72 osd.0 DNE
1 2.72 osd.1 up 1
2 2.72 osd.2 up 1
3 2.72 osd.3 up 1
4 2.72 osd.4 up 1
5 2.72 osd.5 up 1
6 2.72 osd.6 up 1
7 2.72 osd.7 up 1
8 2.72 osd.8 up 1
9 2.72 osd.9 up 1
-3 27.2 host node-02
10 2.72 osd.10 up 1
11 2.72 osd.11 up 1
12 2.72 osd.12 up 1
13 2.72 osd.13 up 1
14 2.72 osd.14 up 1
15 2.72 osd.15 up 1
16 2.72 osd.16 up 1
17 2.72 osd.17 up 1
18 2.72 osd.18 up 1
19 2.72 osd.19 up 1
-4 27.2 host node3-03
20 2.72 osd.20 up 1
21 2.72 osd.21 up 1
22 2.72 osd.22 up 1
23 2.72 osd.23 up 1
24 2.72 osd.24 up 1
25 2.72 osd.25 up 1
26 2.72 osd.26 up 1
27 2.72 osd.27 up 1
28 2.72 osd.28 up 1
29 2.72 osd.29 up 1
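The weights in the tree can be checked the same way. The sketch below hard-codes the values from the listing above (it does not talk to a cluster) and uses a hypothetical helper, `host_weight`. It shows that the 27.2 weight listed for node-c01 equals ten entries of 2.72, so the osd.0 entry marked DNE is still counted in that figure; with osd.0 left out, the host would sum to 24.48. This checks only the arithmetic of the listing.

```python
from decimal import Decimal

# Values taken from the `ceph osd tree` listing above.
OSD_WEIGHT = Decimal("2.72")
HOSTS = {
    "node-c01": range(0, 10),   # osd.0 .. osd.9 (osd.0 is listed as DNE)
    "node-02": range(10, 20),   # osd.10 .. osd.19
    "node3-03": range(20, 30),  # osd.20 .. osd.29
}
LISTED_HOST_WEIGHT = Decimal("27.2")
LISTED_ROOT_WEIGHT = Decimal("81.6")
DNE_OSDS = frozenset({0})


def host_weight(osd_ids, skip=frozenset()) -> Decimal:
    """Sum the per-OSD weights of one host, optionally skipping some ids."""
    return sum((OSD_WEIGHT for i in osd_ids if i not in skip), Decimal("0"))


if __name__ == "__main__":
    for name, ids in HOSTS.items():
        print(name, host_weight(ids), host_weight(ids, skip=DNE_OSDS))
    print("root", sum(host_weight(ids) for ids in HOSTS.values()))
```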
Environment
-
Red Hat Ceph Enterprise 1.2.3
-
Inktank Ceph Enterprise 1.2
