Why is fail-over taking too long when restarting one gluster server?
Issue
Fail-over time differs depending on how the gluster server is stopped or restarted.
When glusterd is stopped or restarted, the time taken is:
real 0m0.039s
user 0m0.002s
sys 0m0.004s
For a reboot:
real 0m25.954s <<<<<=====
user 0m0.000s
sys 0m0.007s
For a halt:
real 1m22.415s <<<<<=====
user 0m0.002s
sys 0m0.009s
real 1m14.180s <<<<<=====
user 0m0.002s
sys 0m0.007s
Therefore, with the node completely down, it takes around 1m14s to switch to the other node.
With a reboot, the delay depends on how long the reboot takes.
The time to switch is about the same when the NIC is taken down.
ifdown ens3:
real 1m8.210s <<<<<=====
user 0m0.002s
sys 0m0.005s
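The timings above have the format of the shell `time` output, but the article does not show which command was timed. As a rough sketch only, a comparable measurement could be taken on a client by timing a directory listing of the gluster mount right after one server is stopped, rebooted, or halted. The mount point `/mnt/glustervol` and the function name `measure_access` are hypothetical and are not taken from the article.

```shell
#!/bin/bash
# Sketch only: MOUNT_POINT is a hypothetical path, not taken from the article.
MOUNT_POINT="${MOUNT_POINT:-/mnt/glustervol}"

# Print how many whole seconds a directory listing of the given path takes.
# While the client waits to switch to the other server, the listing
# is expected to block, so the elapsed time approximates the delay.
measure_access() {
    local start end
    start=$(date +%s)
    ls "$1" > /dev/null 2>&1
    end=$(date +%s)
    echo $(( end - start ))
}

# Example: run on the client right after taking one server down.
# echo "fail-over delay: $(measure_access "$MOUNT_POINT")s"
```

This reports whole seconds only, which is coarse for the sub-second glusterd restart case but adequate for the reboot, halt, and NIC-down cases.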
Environment
RHGS 3.1*