Why are Brick Processes Not Starting After an OCS 3.x Node Reboot?
Issue

- After a restart of an OpenShift Container Storage node, the bricks in the Gluster pod running on that node do not start. The output of the gluster v status command shows the bricks hosted on the rebooted node as N/A. Taking the volume heketidbstorage as an example, this is the status observed after rebooting OCS node 10.0.0.1:

      sh-4.2# gluster volume status heketidbstorage
      Status of volume: heketidbstorage
      Gluster process                                             TCP Port  RDMA Port  Online  PID
      ------------------------------------------------------------------------------
      Brick 10.0.0.1:/var/lib/heketi/mounts/vg_XXXXXXXX/brick_XXXXXXXXX/brick  N/A    N/A    N    N/A
      Brick 10.0.0.2:/var/lib/heketi/mounts/vg_XXXXXXXX/brick_XXXXXXXXX/brick  49152  0      Y    251
      Brick 10.0.0.3:/var/lib/heketi/mounts/vg_XXXXXXXX/brick_XXXXXXXXX/brick  49152  0      Y    212

  The same is happening for the rest of the volumes.
- Reviewing the output of a ps command in the pod, the matching glusterfsd processes are not running.
- As a workaround, restarting glusterd inside the pod brings the bricks back online:

      systemctl restart glusterd

- How can OCS nodes be rebooted without any manual intervention afterwards, so that the bricks come online automatically?
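Since the affected bricks are reported with Online = N in the status output, they can be spotted before applying the workaround. A minimal sketch, assuming it is run inside the Gluster pod (reached, for example, with oc rsh from an OpenShift master); the offline_bricks helper name is hypothetical:

```shell
#!/bin/sh
# Sketch: list bricks that did not start, by parsing `gluster volume status`
# output piped on stdin. A brick line looks like:
#   Brick <host>:<path>  <tcp port>  <rdma port>  <online Y/N>  <pid>
# so the Online flag is the second-to-last field.
offline_bricks() {
  awk '/^Brick / && $(NF - 1) == "N" { print $2 }'
}

# Example usage inside the Gluster pod:
#   gluster volume status heketidbstorage | offline_bricks
# If any bricks are listed, the workaround above is:
#   systemctl restart glusterd
```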
Environment
- OpenShift Container Storage 3.x