A cluster node is fenced or stops responding briefly during a large cache flush by the kernel in RHEL 5
Issue
- A cluster node was rebooted unexpectedly, and a large drop in cached memory is seen during or just before the event
- There's a token loss in the cluster and the affected node still seems to be alive and logging, but just shows DLM connect messages over and over and nothing from openais in /var/log/messages:
Apr 6 12:48:33 node1 kernel: dlm: connecting to 3
Apr 6 12:48:33 node1 kernel: dlm: connecting to 1
Apr 6 12:48:34 node1 last message repeated 3 times
Apr 6 12:48:34 node1 kernel: dlm: connecting to 4
Apr 6 12:48:35 node1 kernel: dlm: connecting to 1
Apr 6 12:48:35 node1 kernel: dlm: connecting to 4
Apr 6 12:48:35 node1 last message repeated 2 times
Apr 6 12:48:35 node1 kernel: dlm: connecting to 1
Apr 6 12:48:35 node1 kernel: dlm: connecting to 4
Apr 6 12:48:35 node1 last message repeated 2 times
Apr 6 12:48:35 node1 kernel: dlm: connecting to 1
[...]
- A node stops sending its token when aisexec appears to be using close to 100% of CPU. The node does not seem to process any membership changes or send its messages, while the other nodes recognize the token loss and take action to remove that node from the cluster.
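The aisexec CPU symptom above can be checked on a suspect node. The helper below is a hypothetical sketch (not part of the article); it assumes pgrep and ps from procps are available, as they are on RHEL 5:

```shell
# Hypothetical helper, not from the article: report the %CPU of a
# process by name. On an affected node, run it against aisexec; a
# value pinned near 100 matches the symptom described above.
cpu_of() {
    # pgrep -x matches the exact process name; take the first PID
    pid=$(pgrep -x "$1" 2>/dev/null | head -n 1)
    if [ -n "$pid" ]; then
        ps -o pcpu= -p "$pid"
    else
        echo "no process named $1"
    fi
}

cpu_of aisexec
```

Note that ps reports average CPU usage over the process lifetime; repeated samples (or top in batch mode) give a better picture of a momentary spin.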
Environment
- Red Hat Enterprise Linux (RHEL) 5 with the Resilient Storage Add On
- Data shows a large drop in cached memory in vmstat, /proc/meminfo, or other sources just leading up to the unresponsiveness of the node
- /proc/meminfo shows a large amount of "Dirty" data just prior to the cache flush; a large amount might be several tens of GB
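The dirty-page build-up described above can be watched for directly. The snippet below is a minimal monitoring sketch (an assumption, not taken from the article) that reads the Dirty and Cached fields of /proc/meminfo; run it periodically (for example from a loop or cron) to catch the build-up before the flush makes the node unresponsive:

```shell
# Hypothetical monitoring sketch, not from the article: sample the
# Dirty and Cached fields (in kB) from /proc/meminfo.
dirty_kb() {
    awk '/^Dirty:/ {print $2}' /proc/meminfo
}
cached_kb() {
    awk '/^Cached:/ {print $2}' /proc/meminfo
}

echo "Dirty: $(dirty_kb) kB, Cached: $(cached_kb) kB"
```

A sudden jump in Dirty into the tens of GB, followed by a sharp drop in Cached, would match the pattern described in this article.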