openshift-master will not start because of a corrupted etcd WAL file
Issue
OpenShift does not start properly and logs the following error messages:
master.example.com systemd[1]: openshift-master[33889]: 2015/07/13 13:35:03 etcdserver: read wal error: unexpected EOF
master.example.com systemd[1]: openshift-master.service: main process exited, code=exited, status=1/FAILURE
The file system appears to be full, and the files under /var/lib/openshift/openshift.local.etcd/member/wal/ have become corrupted.
How can the OpenShift master be recovered?
- Why do the following messages appear when restarting the master?
janv. 06 11:10:14 masterv3ft.acs.altran.com openshift[9584]: loaded cluster information from store: <nil>
janv. 06 11:10:14 masterv3ft.acs.altran.com openshift[9584]: read wal error (walpb: crc mismatch) and cannot be repaired
janv. 06 11:10:14 masterv3ft.acs.altran.com systemd[1]: atomic-openshift-master.service: main process exited, code=exited, status=1/FAILURE
janv. 06 11:10:14 masterv3ft.acs.altran.com systemd[1]: Failed to start Atomic OpenShift Master.
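Because the report above suspects a full file system, a quick check of the volume holding the WAL directory can confirm or rule that out before attempting any recovery. The following is a minimal diagnostic sketch, not a recovery procedure: the helper name fs_usage_percent and the 95% threshold are illustrative assumptions, and the path is the one quoted in the issue description.

```shell
#!/bin/sh
# WAL directory as quoted in the issue description; override via the
# environment if the etcd data lives elsewhere on your host.
WAL_DIR="${WAL_DIR:-/var/lib/openshift/openshift.local.etcd/member/wal}"

# Hypothetical warning threshold (percent used); not taken from the article.
THRESHOLD=95

# Print the percentage of space used on the file system that holds "$1".
# df -P forces the POSIX single-line output format, so field 5 is "NN%".
fs_usage_percent() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

if [ -d "$WAL_DIR" ]; then
    used=$(fs_usage_percent "$WAL_DIR")
    echo "File system holding $WAL_DIR is ${used}% full"
    if [ "$used" -ge "$THRESHOLD" ]; then
        echo "Warning: usage is at or above ${THRESHOLD}%"
    fi
    # List the WAL segments so truncated or zero-length files stand out.
    ls -l "$WAL_DIR"
else
    echo "$WAL_DIR not found on this host"
fi
```

The script only reads state and changes nothing. Setting WAL_DIR in the environment points it at a different data directory.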
Environment
- OpenShift Enterprise 3.0.1