openshift-master server won't start because of a corrupted etcd WAL file

Solution Unverified - Updated -

Issue

OpenShift does not start properly and logs the following error messages:

master.example.com systemd[1]: openshift-master[33889]: 2015/07/13 13:35:03 etcdserver: read wal error: unexpected EOF
master.example.com systemd[1]: openshift-master.service: main process exited, code=exited, status=1/FAILURE

It seems that the file system is full and the WAL files under /var/lib/openshift/openshift.local.etcd/member/wal/ have become corrupted.
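
To confirm the "file system is full" suspicion before attempting any recovery, a read-only check along the following lines may help. This is a minimal sketch, not part of the original solution: the helper names `nearest_existing` and `wal_report` and the 5% free-space threshold are assumptions made for illustration, and only the WAL directory path comes from the issue description. It reports free space on the file system holding the WAL directory and lists the size of each file found there, without modifying anything.

```python
import os
import shutil

# Path taken from the issue description; adjust if etcd data lives elsewhere.
WAL_DIR = "/var/lib/openshift/openshift.local.etcd/member/wal"


def nearest_existing(path):
    """Walk up from path until an existing file system entry is found."""
    path = os.path.abspath(path)
    while not os.path.exists(path):
        parent = os.path.dirname(path)
        if parent == path:  # reached the root
            break
        path = parent
    return path


def wal_report(wal_dir=WAL_DIR, min_free_fraction=0.05):
    """Return free-space figures and WAL file sizes for wal_dir.

    min_free_fraction is an arbitrary illustrative threshold,
    not an etcd or OpenShift setting.
    """
    usage = shutil.disk_usage(nearest_existing(wal_dir))
    free_fraction = usage.free / usage.total if usage.total else 0.0
    files = {}
    if os.path.isdir(wal_dir):
        for name in sorted(os.listdir(wal_dir)):
            full = os.path.join(wal_dir, name)
            if os.path.isfile(full):
                files[name] = os.path.getsize(full)
    return {
        "free_bytes": usage.free,
        "free_fraction": free_fraction,
        "low_space": free_fraction < min_free_fraction,
        "wal_files": files,
    }


if __name__ == "__main__":
    report = wal_report()
    print("free bytes:", report["free_bytes"])
    print("low space:", report["low_space"])
    for name, size in report["wal_files"].items():
        print(f"{size:>12}  {name}")
```

A nearly full file system together with an unusually small or zero-length file in the WAL directory would be consistent with the "unexpected EOF" error above, although the check by itself does not prove corruption.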

How can the OpenShift master be recovered?

  • Why do we see the messages below when restarting the master?
janv. 06 11:10:14 masterv3ft.acs.altran.com openshift[9584]: loaded cluster information from store: <nil>
janv. 06 11:10:14 masterv3ft.acs.altran.com openshift[9584]: read wal error (walpb: crc mismatch) and cannot be repaired
janv. 06 11:10:14 masterv3ft.acs.altran.com systemd[1]: atomic-openshift-master.service: main process exited, code=exited, status=1/FAILURE
janv. 06 11:10:14 masterv3ft.acs.altran.com systemd[1]: Failed to start Atomic OpenShift Master.

Environment

  • OpenShift Enterprise 3.0.1
