etcd fails to start with state.commit out of range
Issue
An etcd member fails to start with a raft panic like the following:
etcd: restarting member 123456 in cluster 987654321 at commit index 3382501183
etcd: 123456 state.commit 3382501183 is out of range [3382473101, 3382485737]
bash: panic: 123456 state.commit 3382501183 is out of range [3382473101, 3382485737]
bash: goroutine 1 [running]:
bash: panic(0xcbfda0, 0xc423da31b0)
bash: /usr/lib/golang/src/runtime/panic.go:500 +0x1a1
bash: github.com/coreos/pkg/capnslog.(*PackageLogger).Panicf(0xc4201311c0, 0xe666d2, 0x2b, 0xc4249fdd80, 0x4, 0x4)
bash: /builddir/build/BUILD/etcd-21fdcc6443f3267111051240f0eca839acc96a39/Godeps/_workspace/src/github.com/coreos/pkg/capnslog/pkg_logger.go:75 +0x16a
bash: github.com/coreos/etcd/raft.(*raft).loadState(0xc4200d44b0, 0x2ff, 0x0, 0xc99cdf3f, 0x0, 0x0, 0x0)
bash: /builddir/build/BUILD/etcd-21fdcc6443f3267111051240f0eca839acc96a39/src/github.com/coreos/etcd/raft/raft.go:1196 +0x1db
bash: github.com/coreos/etcd/raft.newRaft(0xc42004e728, 0xe0000)
bash: /builddir/build/BUILD/etcd-21fdcc6443f3267111051240f0eca839acc96a39/src/github.com/coreos/etcd/raft/raft.go:303 +0xca5
bash: github.com/coreos/etcd/raft.RestartNode(0xc42004e728, 0xc42649a000, 0x315c)
bash: /builddir/build/BUILD/etcd-21fdcc6443f3267111051240f0eca839acc96a39/src/github.com/coreos/etcd/raft/node.go:215 +0x43
bash: github.com/coreos/etcd/etcdserver.restartNode(0xc420097b80, 0xc4201c2090, 0x29, 0xc42004eae0, 0x1, 0x1, 0x0, 0x0)
bash: /builddir/build/BUILD/etcd-21fdcc6443f3267111051240f0eca839acc96a39/src/github.com/coreos/etcd/etcdserver/raft.go:401 +0x5a1
bash: github.com/coreos/etcd/etcdserver.NewServer(0xc420097b80, 0x0, 0x0, 0x0)
bash: /builddir/build/BUILD/etcd-21fdcc6443f3267111051240f0eca839acc96a39/src/github.com/coreos/etcd/etcdserver/server.go:383 +0x29d5
bash: github.com/coreos/etcd/embed.StartEtcd(0xc420119180, 0xc4200d3080, 0x0, 0x0)
bash: /builddir/build/BUILD/etcd-21fdcc6443f3267111051240f0eca839acc96a39/src/github.com/coreos/etcd/embed/etcd.go:123 +0x67d
bash: github.com/coreos/etcd/etcdmain.startEtcd(0xc420119180, 0x6, 0xe44852, 0x6, 0x1)
bash: /builddir/build/BUILD/etcd-21fdcc6443f3267111051240f0eca839acc96a39/src/github.com/coreos/etcd/etcdmain/etcd.go:187 +0x47
bash: github.com/coreos/etcd/etcdmain.startEtcdOrProxyV2()
bash: /builddir/build/BUILD/etcd-21fdcc6443f3267111051240f0eca839acc96a39/src/github.com/coreos/etcd/etcdmain/etcd.go:104 +0x14e0
bash: github.com/coreos/etcd/etcdmain.Main()
bash: /builddir/build/BUILD/etcd-21fdcc6443f3267111051240f0eca839acc96a39/src/github.com/coreos/etcd/etcdmain/main.go:39 +0x61
bash: main.main()
bash: /builddir/build/BUILD/etcd-21fdcc6443f3267111051240f0eca839acc96a39/src/github.com/coreos/etcd/main.go:28 +0x14
systemd: etcd.service: main process exited, code=exited, status=2/INVALIDARGUMENT
systemd: Failed to start Etcd Server.
systemd: Unit etcd.service entered failed state.
systemd: etcd.service failed.
The important bit of the panic is:
etcd: restarting member 123456 in cluster 987654321 at commit index 3382501183
etcd: 123456 state.commit 3382501183 is out of range [3382473101, 3382485737]
bash: panic: 123456 state.commit 3382501183 is out of range [3382473101, 3382485737]
This means the etcd member with ID 123456 has a persisted state.commit (3382501183) that lies outside the range raft accepts on restart, [3382473101, 3382485737]. Here the value is above the upper bound. To avoid data corruption, raft does not allow the member to rejoin the cluster, so etcd panics and fails to start.
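When triaging several members, it can help to extract these numbers from the journal instead of reading them by eye. The following is a minimal sketch, not an etcd tool: the regular expression is modeled only on the panic line shown above (other etcd versions may word the message differently), and the names `PANIC_RE`, `CommitRange`, and `parse_panic` are hypothetical helpers introduced for illustration.

```python
import re
from typing import NamedTuple, Optional

# Assumption: the message format matches the panic line quoted in this article.
PANIC_RE = re.compile(
    r"\b(?P<member>[0-9a-f]+) state\.commit (?P<commit>\d+) "
    r"is out of range \[(?P<low>\d+), (?P<high>\d+)\]"
)


class CommitRange(NamedTuple):
    member: str  # member ID as printed in the log
    commit: int  # persisted state.commit
    low: int     # lower bound of the accepted range
    high: int    # upper bound of the accepted range

    def distance(self) -> int:
        """Signed distance from the accepted range.

        Positive: commit is above the upper bound.
        Negative: commit is below the lower bound.
        Zero: commit is inside the range.
        """
        if self.commit > self.high:
            return self.commit - self.high
        if self.commit < self.low:
            return self.commit - self.low
        return 0


def parse_panic(line: str) -> Optional[CommitRange]:
    """Return the parsed values, or None if the line is not the range panic."""
    m = PANIC_RE.search(line)
    if m is None:
        return None
    return CommitRange(m["member"], int(m["commit"]), int(m["low"]), int(m["high"]))


if __name__ == "__main__":
    line = "etcd: 123456 state.commit 3382501183 is out of range [3382473101, 3382485737]"
    info = parse_panic(line)
    if info is not None:
        print(f"member {info.member}: commit is off by {info.distance()}")
```

For the log line in this article, `distance()` is positive, which confirms that the persisted commit index is ahead of the upper bound rather than behind the lower one.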
Environment
- etcd 3.1