Why was my node declared "undead" before it was evicted from the cluster in RHEL 5 or 6?


Environment

  • Red Hat Enterprise Linux (RHEL) 5 with the High Availability Add On
  • Red Hat Enterprise Linux (RHEL) 6 with the High Availability Add On
  • Cluster configured to use a quorum device (<quorumd> in /etc/cluster/cluster.conf)

Issue

  • One of the nodes in my cluster was fenced. On one of the other nodes I see the following messages:

        qdiskd[57436]: Node 1 is undead.
        qdiskd[57436]: Writing eviction notice (again) for node 1
        qdiskd[57436]: Node 1 evicted
  • Why was Node 1 declared undead before it was evicted?

Resolution

Resolve whatever issue caused the node to be evicted in the first place. The fact that the node was reported as "undead" before eviction is not unexpected (see Root Cause below).

Root Cause

Here's the sequence of events:

  • Node 2 notices that node 1 has missed more than tko cycles, and so marks it as dead in its own internal state. However, we are not the master, so we do not write an eviction notice or log anything above debug level at this time:

            /*
               Case 2: Check for a heartbeat timeout.  Write an eviction
               notice if we're the master.  If this is our first notice
               of the heartbeat timeout, update our internal state
               accordingly.  When the master evicts this node, we will
               hit case 1 above.
    
               Transition from Online -> Evicted
             */
            if (ni[x].ni_misses > ctx->qc_tko &&
                 state_run(ni[x].ni_status.ps_state)) {
    
                    /*
                       Mark our internal views as dead if nodes miss too
                       many heartbeats...  This will cause a master
                       transition if no live master exists.
                     */
                    if (ni[x].ni_status.ps_state >= S_RUN &&
                        ni[x].ni_seen) {
                            logt_print(LOG_DEBUG, "Node %d DOWN\n",
                                   ni[x].ni_status.ps_nodeid);
                            ni[x].ni_seen = 0;      
                    }
    
                    ni[x].ni_state = S_EVICT;
                    ni[x].ni_status.ps_state = S_EVICT;
                    ni[x].ni_evil_incarnation = 
                            ni[x].ni_incarnation;
    
                    /*
                       Write eviction notice if we're the master.
                     */
                    if (ctx->qc_status == S_MASTER) {
                            logt_print(LOG_NOTICE,
                                   "Writing eviction notice for node %d\n",
                                   ni[x].ni_status.ps_nodeid);
                            qd_write_status(ctx, ni[x].ni_status.ps_nodeid,
                                            S_EVICT, NULL, NULL, NULL);
                            if (ctx->qc_flags & RF_ALLOW_KILL) {
                                    logt_print(LOG_DEBUG, "Telling CMAN to "
                                            "kill the node\n");
                                    cman_kill_node(ctx->qc_cman_admin,
                                            ni[x].ni_status.ps_nodeid);
                            }
                    }
    
                    /* Clear our master mask for the node after eviction */
                    if (mask)
                            clear_bit(mask, (ni[x].ni_status.ps_nodeid-1),
                                      sizeof(memb_mask_t));
                    continue;
            }
    

Node 1's ni_evil_incarnation is now set, and we proceed to the next cycle. Somewhere before the next transition check, node 1 has written to the disk again. Since we never actually wrote an eviction notice (we were not the master), this is not unexpected: whatever was blocking node 1 in the first place has finished. However, because we have already marked the node as evicted in our internal state, we report that it is undead:

            /*
               Case 3:  Check for a node who is supposed to be dead, but
               has started writing to the disk again with the same
               incarnation.

               Transition from Offline -> Undead (BAD!!!)
             */
            if (ni[x].ni_evil_incarnation &&
                (ni[x].ni_evil_incarnation == 
                 ni[x].ni_status.ps_incarnation) &&
                (ni[x].ni_status.ps_updatenode ==
                 ni[x].ni_status.ps_nodeid)) {
                    logt_print(LOG_CRIT, "Node %d is undead.\n",
                           ni[x].ni_status.ps_nodeid);

                    logt_print(LOG_ALERT,
                           "Writing eviction notice (again) for node %d\n",
                           ni[x].ni_status.ps_nodeid);
                    qd_write_status(ctx, ni[x].ni_status.ps_nodeid,
                                    S_EVICT, NULL, NULL, NULL);
                    ni[x].ni_status.ps_state = S_EVICT;

                    /* XXX Need to fence it again */
                    if (ctx->qc_flags & RF_ALLOW_KILL) {
                            logt_print(LOG_DEBUG, "Telling CMAN to "
                                    "kill the node\n");
                            cman_kill_node(ctx->qc_cman_admin,
                                    ni[x].ni_status.ps_nodeid);
                    }
                    continue;
            }

As you can see, this results in a cman_kill, which causes node 1 to die, and in turn causes us to bid for master and take over. When node 1 first came back from the dead and updated its state, it once again appeared to us to consider itself online, so we hit case 3 again (above). This time we are the master, so we officially evict it.
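
To make the sequence concrete, below is a minimal standalone sketch. It is not qdiskd source: the simplified struct node, NODE_TKO, and check_node() are made-up stand-ins for qdiskd's node_info_t, qc_tko, and transition check, but running it reproduces the first two log messages from the Issue section on a non-master:

    /*
     * Minimal sketch, NOT qdiskd source: struct node and NODE_TKO are
     * simplified stand-ins for qdiskd's node_info_t and qc_tko.
     */
    #include <stdio.h>

    #define NODE_TKO 10                 /* allowed missed cycles */

    struct node {
            int misses;                 /* missed heartbeat cycles */
            int writing;                /* wrote to the disk this cycle */
            int incarnation;            /* current incarnation number */
            int evil_incarnation;       /* set when marked dead internally */
    };

    static void check_node(struct node *n, int id, int master)
    {
            /* Case 2 analogue: heartbeat timeout -> mark dead internally */
            if (n->misses > NODE_TKO && !n->evil_incarnation) {
                    n->evil_incarnation = n->incarnation;
                    if (master)
                            printf("Writing eviction notice for node %d\n", id);
                    return;
            }

            /* Case 3 analogue: marked dead, but writing again -> undead */
            if (n->evil_incarnation &&
                n->evil_incarnation == n->incarnation && n->writing) {
                    printf("Node %d is undead.\n", id);
                    printf("Writing eviction notice (again) for node %d\n", id);
            }
    }

    int main(void)
    {
            struct node n1 = { .misses = NODE_TKO + 1, .incarnation = 7 };

            check_node(&n1, 1, 0);      /* cycle 1: non-master marks node 1 dead */
            n1.writing = 1;             /* node 1 resumes writing to the disk */
            check_node(&n1, 1, 0);      /* cycle 2: "undead" logged */
            return 0;
    }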

In summary, it is not abnormal to see a "Node X is undead" message on a non-master node before it becomes master and is able to properly evict the failed node. It simply means that the node in question failed to update the quorum device for tko*interval seconds, but shortly afterwards started updating the disk again.
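
For reference, the eviction threshold is the product of the interval and tko attributes on the <quorumd> element in /etc/cluster/cluster.conf. The snippet below is a hypothetical example (the label and values are illustrative, not taken from this case); with interval="1" and tko="10", a node would be marked dead after failing to update the quorum device for about 10 seconds:

        <quorumd interval="1" tko="10" votes="1" label="myqdisk"/>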

