Cluster going partition without quorum


I have created a simple two-node cluster, but the nodes are not joining each other. Each node forms its own partition and reports the other node as UNCLEAN (offline).
RHEL Version - Red Hat Enterprise Linux Server release 7.3 Beta (Maipo)
Steps followed:
firewall-cmd --add-service=high-availability
systemctl start pcsd
pcs cluster auth rhelha1 rhelha2

pcs cluster setup --start --name abhi_cluster rhelha1 rhelha2

Please find the pcs status output from both nodes below.

[root@rhelha1 /]# pcs status
Cluster name: abhi_cluster
WARNING: no stonith devices and stonith-enabled is not false
Stack: corosync
Current DC: rhelha1 (version 1.1.15-9.el7-e174ec8) - partition WITHOUT quorum
Last updated: Thu Nov 3 08:52:44 2016 Last change: Thu Nov 3 08:27:31 2016 by hacluster via crmd on rhelha1

2 nodes and 0 resources configured

Node rhelha2: UNCLEAN (offline)
Online: [ rhelha1 ]

No resources

Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
[root@rhelha1 /]#

[root@rhelha2 ~]# pcs status
Cluster name: abhi_cluster
WARNING: no stonith devices and stonith-enabled is not false
Stack: corosync
Current DC: rhelha2 (version 1.1.15-9.el7-e174ec8) - partition WITHOUT quorum
Last updated: Thu Nov 3 08:36:23 2016 Last change: Thu Nov 3 08:27:30 2016 by hacluster via crmd on rhelha2

2 nodes and 0 resources configured

Node rhelha1: UNCLEAN (offline)
Online: [ rhelha2 ]

No resources

Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
[root@rhelha2 ~]#

Responses

Found the issue, and it was a crazy one. My /etc/hosts mapped the node's hostname to the loopback address (left over from troubleshooting another issue), as below:
127.0.0.1 rhelha1 localhost localhost.localdomain localhost4 localhost4.localdomain4

This caused corosync to pick up the loopback IP as the ring address for this node:

[root@rhelha1 ]# corosync-cfgtool -s
Printing ring status.
Local node ID 1
RING ID 0
id = 127.0.0.1
status = ring 0 active with no faults
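
For anyone who wants to test for this condition rather than read the output by eye, here is a minimal sketch. The function name ring_uses_loopback is made up for illustration, and the pattern assumes the "id = <address>" line format shown in the output above; other corosync versions may print the ring address differently.

```shell
# Hypothetical helper: reads `corosync-cfgtool -s` output on stdin and
# succeeds (exit 0) if any ring id is a 127.x.x.x loopback address.
# Assumes the "id = <address>" line format shown in this thread.
ring_uses_loopback() {
    grep -Eq '^[[:space:]]*id[[:space:]]*=[[:space:]]*127\.'
}

# Example use on a cluster node (not run here):
#   corosync-cfgtool -s | ring_uses_loopback && echo "ring is bound to loopback"
```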

Correcting this entry in /etc/hosts and restarting corosync (the cluster) solved the issue. Thanks for checking in.
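
In case it helps someone else, the hosts-file mistake described above can be spotted with a small check like the one below. This is only a sketch: loopback_entries is a hypothetical helper name, and it assumes the usual hosts layout of an address followed by one or more names.

```shell
# Hypothetical helper: print every loopback line in a hosts file that
# also lists the given hostname. No output means no such mapping.
# Usage: loopback_entries HOSTNAME [HOSTS_FILE]   (default: /etc/hosts)
loopback_entries() {
    awk -v h="$1" '
        { sub(/#.*/, "") }                  # strip trailing comments
        $1 ~ /^127\./ || $1 == "::1" {      # loopback addresses only
            for (i = 2; i <= NF; i++)
                if ($i == h) { print; break }
        }
    ' "${2:-/etc/hosts}"
}

# Example use on a cluster node (not run here):
#   loopback_entries "$(uname -n)"
```

If this prints a line for the node's own hostname, that hostname resolves to loopback locally, which matches the situation in this thread.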
