Cluster failover failed due to an EXT4 filesystem error

Solution Unverified - Updated -

Environment

  • Red Hat Enterprise Linux 6.3

Issue

  • We are using Red Hat Enterprise Linux with the cluster function for Oracle 11g. When switching the Oracle service over from node A to node B, the following error messages were logged in "/var/log/messages":
Oct 8 16:05:15 node2 rgmanager[24534]: [fs] mounting /dev/dm-28 on /share
Oct 8 16:05:15 node2 rgmanager[24556]: [fs] mount -t ext4 /dev/dm-28 /share
Oct 8 16:05:15 node2 kernel: EXT4-fs (dm-28): bad geometry: block count 130809856 exceeds size of device (12845056 blocks)
Oct 8 16:05:16 node2 rgmanager[24579]: [fs] 'mount -t ext4 /dev/dm-28 /share' failed, error=32
Oct 8 16:05:19 node2 rgmanager[8188]: start on fs "shared_fs" returned 1 (generic error)
Oct 8 16:05:19 node2 rgmanager[8188]: #68: Failed to start service:sharedfs; return value: 1
Oct 8 16:05:19 node2 rgmanager[8188]: Stopping service service:sharedfs
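
The "bad geometry" line means that the ext4 superblock describes a filesystem larger than the block device presented to node2. As a rough sketch, the two block counts can be pulled out of the kernel message and compared. The function name geometry_mismatch is hypothetical, and the 4 KiB block size used for the GiB figures is an assumption, since the log does not state the block size.

```shell
# Extract both block counts from an EXT4 "bad geometry" kernel message.
# $1: the log line, $2: block size in KiB (assumed to be 4 if omitted).
geometry_mismatch() {
    local line="$1" bs_kib="${2:-4}" fs_blocks dev_blocks
    fs_blocks=$(printf '%s\n' "$line" |
        sed -n 's/.*block count \([0-9][0-9]*\) exceeds.*/\1/p')
    dev_blocks=$(printf '%s\n' "$line" |
        sed -n 's/.*size of device (\([0-9][0-9]*\) blocks).*/\1/p')
    # Fail if the line is not a "bad geometry" message.
    [ -n "$fs_blocks" ] && [ -n "$dev_blocks" ] || return 1
    # blocks * KiB per block / 1024 / 1024 = GiB (integer division)
    echo "superblock: $fs_blocks blocks (~$((fs_blocks * bs_kib / 1024 / 1024)) GiB), device: $dev_blocks blocks (~$((dev_blocks * bs_kib / 1024 / 1024)) GiB)"
}

geometry_mismatch 'Oct 8 16:05:15 node2 kernel: EXT4-fs (dm-28): bad geometry: block count 130809856 exceeds size of device (12845056 blocks)'
```

Under the 4 KiB assumption the superblock expects about 499 GiB while the device seen by node2 provides about 49 GiB, so node2 is either looking at a different device than the one the filesystem was created on, or at a stale view of it.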

Resolution

1. Schedule downtime and disable the cluster service on both cluster nodes. Back up the data on the filesystem, then ensure that the LVM and filesystem resources are unmounted on both cluster nodes and that the shares are not in use anywhere else. Run a filesystem check (e2fsck) on the filesystem to clean it.

2. It appears that an incorrect HA-LVM configuration led to this issue. Correcting the HA-LVM configuration on this cluster should resolve it. Refer to the following article, which outlines the steps to implement HA-LVM. There are two ways to do this: the clvmd method or LVM tagging.

What is a Highly Available LVM (HA-LVM) configuration and how do I implement it?

3. Enable the self_fence option for the filesystem resource. With this option set, a node that fails to unmount the filesystem reboots itself, which ensures that the service is still able to relocate.
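
The checks behind steps 1 and 2 can be sketched in shell. This is a minimal illustration, not a tested procedure: the function names are hypothetical, the device and mount point are taken from the log above, and the assumption for step 2 is that the two HA-LVM methods differ in the locking_type and volume_list settings of /etc/lvm/lvm.conf, as described in the linked article.

```shell
# Return 0 if the device or the mount point appears in the mounts table.
# $1: device, $2: mount point, $3: mounts table (defaults to /proc/mounts).
# The mount point is checked as well because the table may list the device
# under a /dev/mapper/... name rather than /dev/dm-N.
is_mounted() {
    awk -v dev="$1" -v mnt="$2" \
        '$1 == dev || $2 == mnt { found = 1 } END { exit !found }' \
        "${3:-/proc/mounts}"
}

# Run a forced check only if the filesystem is not mounted on this node.
# This covers the local node only; repeat the check on the other node.
safe_fsck() {
    local dev="$1" mnt="$2"
    if is_mounted "$dev" "$mnt" "$3"; then
        echo "refusing: $dev or $mnt is still mounted on $(uname -n)" >&2
        return 1
    fi
    e2fsck -f "$dev"
}

# Print the active (uncommented) HA-LVM related settings of an lvm.conf.
# $1: path to lvm.conf (defaults to /etc/lvm/lvm.conf).
show_halvm_settings() {
    grep -E '^[[:space:]]*(locking_type|volume_list)[[:space:]]*=' \
        "${1:-/etc/lvm/lvm.conf}" | sed 's/^[[:space:]]*//'
}

# Example, to be run on each node once the cluster service is disabled:
#   safe_fsck /dev/dm-28 /share
#   show_halvm_settings
```

Comparing the output of show_halvm_settings on both nodes is a quick way to spot a node whose settings differ from what the chosen HA-LVM method expects.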

Diagnostic Steps

  • The service failed because it could not mount the cluster filesystem resource that the cluster service requires.

  • The following messages were also logged on the other cluster node; they recommend running a filesystem check:

Oct  8 16:17:27 node1 rgmanager[20299]: [fs] mount -t ext4  /dev/dm-34 /data1
Oct  8 16:17:27 node1 kernel: EXT4-fs (dm-34): warning: maximal mount count reached, running e2fsck is recommended
Oct  8 16:17:27 node1 kernel: EXT4-fs (dm-34): mounted filesystem with ordered data mode. Opts: 
Oct  8 16:17:27 node1 rgmanager[20516]: [fs] mount -t ext4  /dev/dm-36 /data2
Oct  8 16:17:27 node1 kernel: EXT4-fs (dm-36): warning: maximal mount count reached, running e2fsck is recommended
Oct  8 16:17:27 node1 kernel: EXT4-fs (dm-36): mounted filesystem with ordered data mode. Opts: 
  • It is recommended that all filesystem resources (fs.sh) be created on an HA-LVM device. Make sure that HA-LVM has been configured in the cluster.
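
The "maximal mount count reached" warnings logged on node1 can be cross-checked against the superblock counters that tune2fs -l prints. The following is a small sketch that reads that output on standard input; the function name mount_count_status is hypothetical.

```shell
# Read `tune2fs -l` output on stdin and report whether the mount count
# has reached the configured maximum. A maximum of 0 or -1 means that
# mount-count based checking is disabled.
mount_count_status() {
    awk -F: '
        /^Mount count:/         { count = $2 + 0 }
        /^Maximum mount count:/ { max = $2 + 0 }
        END {
            if (max > 0 && count >= max)
                print "fsck due: " count "/" max
            else
                print "ok: " count "/" max
        }'
}

# Example, run as root on the node that has the device:
#   tune2fs -l /dev/dm-34 | mount_count_status
```

A "fsck due" result matches the kernel warning and is cleared by a completed e2fsck run on the unmounted filesystem.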
