Why do ceph commands take time to complete when one or more monitors are down?
Issue
-
On busy clusters, when one or more monitors go down or become otherwise unreachable, all ceph commands on the cluster take several seconds to complete.
-
Timing 'ceph health' on an affected cluster shows output like the following:
# time ceph health
2014-09-19 11:09:19.280505 7fdd5c7c6700 0 -- :/1014736 >> AA.BB.CC.DD:6789/0 pipe(0x7fdd58022120 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7fdd58022390).fault
2014-09-19 11:09:25.280470 7fdd5c5c4700 0 -- AA.BB.CC.DD:0/1014736 >> EE.FF.GG.HH:6789/0 pipe(0x7fdd4c001d20 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7fdd4c001f90).fault
HEALTH_WARN 10 pgs backfilling; 451 pgs peering; 465 pgs stuck inactive; 473 pgs stuck unclean; 41 requests are blocked > 32 sec; recovery 11272/7436241 objects degraded (0.152%); 1 mons down, quorum 0,1,3,4 boxen1,boxen2,boxen3,boxen4
real 0m9.380s
user 0m0.252s
sys 0m0.060s
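The '.fault' lines in the output above name the monitor addresses the client failed to connect to, and their timestamps show how long the command stalled between attempts. The following is a minimal sketch of pulling that information out of captured output. The function name parse_faults and the regular expression are illustrative assumptions based only on the sample lines shown here, not a documented Ceph log format.

```python
import re
from datetime import datetime

# Assumed pattern, derived from the sample fault lines above:
# "<timestamp> <thread> 0 -- <local> >> <peer>:<port>/<nonce> pipe(...).fault"
FAULT_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) "
    r".*?>> (?P<peer>\S+?):(?P<port>\d+)/\d+ pipe\(.*\)\.fault\s*$"
)


def parse_faults(output):
    """Return a list of (timestamp, peer_address, port) for each fault line.

    Hypothetical helper: lines that do not look like fault lines are skipped.
    """
    faults = []
    for line in output.splitlines():
        match = FAULT_RE.match(line.strip())
        if match:
            ts = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S.%f")
            faults.append((ts, match.group("peer"), int(match.group("port"))))
    return faults


if __name__ == "__main__":
    import sys

    # Read captured command output from standard input.
    faults = parse_faults(sys.stdin.read())
    for ts, peer, port in faults:
        print(f"{ts} connection fault to {peer}:{port}")
    # Show the time that elapsed between consecutive fault lines.
    for (t1, _, _), (t2, _, _) in zip(faults, faults[1:]):
        print(f"gap between faults: {(t2 - t1).total_seconds():.3f}s")
```

As a usage sketch, the combined output of the timed command could be saved to a file and piped into the script on standard input; the addresses it prints are the monitors worth checking first.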
Environment
-
Inktank Ceph Enterprise 1.2
-
Red Hat Ceph Enterprise 1.2.3