5.26. cluster and gfs2-utils

Updated cluster and gfs2-utils packages that fix multiple bugs and add various enhancements are now available for Red Hat Enterprise Linux 6.
The cluster and gfs2-utils packages contain the core clustering libraries for Red Hat High Availability as well as utilities to maintain GFS2 file systems for users of Red Hat Resilient Storage.

Bug Fixes

A race condition existed when a node lost contact with the quorum device at the same time as the token timeout period expired. The nodes raced to fence, which could lead to a cluster failure. To prevent the race condition from occurring, the cman and qdiskd interaction timer has been improved.
Previously, a cluster partition and merge during startup fencing was not detected correctly. As a consequence, the DLM (Distributed Lock Manager) lockspace operations could become unresponsive. With this update, the partition and merge event is now detected and handled properly. DLM lockspace operations no longer become unresponsive in the described scenario.
Multiple ping command examples on the qdisk(5) manual page did not include the -w option. If the ping command is run without this option, the action can time out. With this update, the -w option has been added to those ping commands.
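The -w option gives ping a deadline in seconds, so a heuristic built on it cannot run indefinitely. The following minimal sketch checks a heuristic command string for such a deadline. The helpers `ping_deadline()` and `heuristic_is_bounded()` are hypothetical and are not part of qdiskd. The rule that the deadline should not exceed the heuristic's `interval` is an illustrative assumption, not a documented qdiskd requirement. The IP address is an arbitrary example.

```python
import shlex
from typing import Optional


def ping_deadline(program: str) -> Optional[int]:
    """Return the -w deadline (seconds) of a ping heuristic, or None if absent.

    Hypothetical helper for illustration only.
    """
    args = shlex.split(program)
    if not args or args[0] != "ping":
        return None
    for i, arg in enumerate(args):
        if arg == "-w" and i + 1 < len(args):  # form: -w 1
            return int(args[i + 1])
        if arg.startswith("-w") and len(arg) > 2:  # form: -w1
            return int(arg[2:])
    return None


def heuristic_is_bounded(program: str, interval: int) -> bool:
    # Assumption: the ping should give up before the next heuristic
    # check is due, so its deadline must not exceed `interval`.
    deadline = ping_deadline(program)
    return deadline is not None and deadline <= interval


print(heuristic_is_bounded("ping -c1 -w1 192.168.1.254", interval=2))  # True
print(heuristic_is_bounded("ping -c1 192.168.1.254", interval=2))      # False
```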
Due to a bug in libgfs2, sentinel directory entries were counted as if they were real entries. As a consequence, the mkfs.gfs2 utility created file systems that did not pass the fsck check when a large number of journal metadata blocks was required (for example, a file system with a block size of 512 bytes and 9 or more journals). With this update, the directory entry count is no longer incremented for sentinel entries, and GFS2 file systems created with large numbers of journal metadata blocks now pass the fsck check cleanly.
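The fix amounts to skipping sentinel entries when counting. The following minimal sketch uses a simplified directory entry model. The `Dirent` class and the `count_entries()` function are hypothetical. The sentinel test assumes the GFS2 convention that a sentinel entry carries a zero inode block address (`no_addr == 0`).

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Dirent:
    # Hypothetical, simplified stand-in for an on-disk GFS2 directory entry.
    name: str
    no_addr: int  # inode block address; 0 marks a sentinel in this model


def is_sentinel(dent: Dirent) -> bool:
    return dent.no_addr == 0


def count_entries(dents: Iterable[Dirent]) -> int:
    """Count only real entries; sentinels are placeholders, not files."""
    return sum(1 for d in dents if not is_sentinel(d))


leaf = [Dirent("", 0), Dirent("journal0", 1234), Dirent("journal1", 5678)]
print(count_entries(leaf))  # 2 (counting the sentinel as well would give 3)
```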
When a node fails and gets fenced, the node is usually rebooted and joins the cluster with a fresh state. However, if the rejoin operation is blocked, the node cannot rejoin the cluster and the attempt fails during boot. Previously, in such a case, the cman init script did not revert the actions performed during startup, and some daemons could be erroneously left running on the node. The underlying source code has been modified so that the cman init script now performs a full rollback when errors are encountered. No daemons are left running unnecessarily in this scenario.
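The general rollback pattern works as follows. Each component is started in order, and the components that started successfully are recorded. On the first failure, those components are stopped in reverse order. This is a Python sketch of that pattern, not the cman init script itself, which is a shell script. The step names `daemon_a`, `daemon_b`, and `join` are hypothetical placeholders.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], None], Callable[[], None]]  # (name, start, stop)


def start_all(steps: List[Step]) -> bool:
    """Start steps in order; on failure, undo completed steps in reverse."""
    started: List[Step] = []
    for step in steps:
        name, start, _stop = step
        try:
            start()
        except Exception as err:
            print(f"starting {name} failed: {err}; rolling back")
            for prev_name, _start, stop in reversed(started):
                stop()
                print(f"stopped {prev_name}")
            return False
        started.append(step)
    return True


# Usage: the third step fails, so the first two are undone.
log: List[str] = []


def starter(name: str) -> Callable[[], None]:
    return lambda: log.append(f"start {name}")


def stopper(name: str) -> Callable[[], None]:
    return lambda: log.append(f"stop {name}")


def fail_join() -> None:
    raise RuntimeError("cannot join cluster")


steps: List[Step] = [
    ("daemon_a", starter("daemon_a"), stopper("daemon_a")),
    ("daemon_b", starter("daemon_b"), stopper("daemon_b")),
    ("join", fail_join, lambda: None),
]
result = start_all(steps)
```

Stopping in reverse order means each component is stopped before the components it was started after. This matters when later daemons depend on earlier ones.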
The RELAX NG schema used to validate the cluster.conf file previously did not recognize the totem.miss_count_const constant as a valid option. As a consequence, users were not able to validate cluster.conf when this option was in use. This option is now recognized correctly by the RELAX NG schema, and the cluster.conf file can be validated as expected.
The cmannotifyd daemon is often started after cman, which meant that cmannotifyd did not receive or dispatch any notifications about the current cluster status at startup. This update modifies the cman connection loop to generate a notification that the configuration and membership have changed.
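The fix relies on sending a late subscriber a synthetic "something changed" notification as soon as it connects. The subscriber can then read the current state instead of waiting for the next real event. The following minimal sketch shows that pattern. `Notifier`, `connect()`, and the event names are hypothetical and do not reflect the cman or cmannotifyd API.

```python
from typing import Callable, List


class Notifier:
    """Hypothetical event source that replays state to late subscribers."""

    def __init__(self) -> None:
        self.subscribers: List[Callable[[str], None]] = []

    def connect(self, callback: Callable[[str], None]) -> None:
        self.subscribers.append(callback)
        # Without these two calls, a subscriber that connects after startup
        # hears nothing until the next real configuration or membership change.
        callback("config_changed")
        callback("membership_changed")

    def emit(self, event: str) -> None:
        for cb in self.subscribers:
            cb(event)


received: List[str] = []
Notifier().connect(received.append)
print(received)  # ['config_changed', 'membership_changed']
```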
Incorrect use of the free() function in the gfs2_edit code could lead to memory leaks, which caused various problems. For example, when the user executed the gfs2_edit savemeta command, the gfs2_edit utility could become unresponsive or terminate unexpectedly. This update applies multiple upstream patches so that the free() function is now used correctly and memory leaks no longer occur. In addition, the gfs2_edit savemeta command now reports save statistics more often, so that users can see the process is still running when saving a large dinode with a huge amount of metadata.
Previously, the gfs2_grow utility failed to expand a GFS2 file system if the file system contained only one resource group. The old code was based on GFS1, which used different fields, and calculated the size of a resource group from its distance to the previous resource group. This approach does not work when there is only one resource group. This update adds the rgrp_size() function to libgfs2, which calculates the size of the resource group itself instead of determining its distance from the previous resource group. A file system with only one resource group can now be expanded successfully.
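The following worked sketch shows why a distance-based calculation breaks with a single resource group. It assumes a simplified resource group index entry with the fields `ri_addr` (first block of the resource group), `ri_length` (header and bitmap blocks), and `ri_data` (data blocks), modelled on the GFS2 rindex. The formula `ri_length + ri_data` is an assumption for illustration and may differ from the actual libgfs2 rgrp_size() implementation. The block numbers are made up.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RindexEntry:
    # Simplified, hypothetical model of a GFS2 resource group index entry.
    ri_addr: int    # first block of the resource group
    ri_length: int  # header + bitmap blocks
    ri_data: int    # data blocks


def size_by_distance(rgs: List[RindexEntry], i: int) -> Optional[int]:
    """Old-style approach: measure the distance from the previous group.

    The first (and, here, only) resource group has no predecessor,
    so nothing can be measured.
    """
    if i == 0:
        return None
    return rgs[i].ri_addr - rgs[i - 1].ri_addr


def rgrp_size(rg: RindexEntry) -> int:
    """Size taken from the resource group's own fields (assumed formula)."""
    return rg.ri_length + rg.ri_data


single = [RindexEntry(ri_addr=17, ri_length=5, ri_data=65000)]
print(size_by_distance(single, 0))  # None
print(rgrp_size(single[0]))         # 65005
```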
Previously, the gfs2_edit utility printed unclear and confusing error messages when the underlying device did not contain a valid GFS2 file system. With this update, gfs2_edit provides users with additional information in this scenario.
Previously, the mkfs.gfs2 utility provided users with insufficient error messages when creating a GFS2 file system. The messages also contained absolute build paths and source code references, which are of no use to users. A patch has been applied so that users now receive comprehensive error messages in this scenario.
While mounting a file system, the gfs_controld daemon ignored an error that the dlm_controld daemon returned for a dlmc_fs_register() call. As a result, the mount succeeded, but recovery of the file system could not be coordinated using DLM. With this update, the mount fails under these circumstances and an error message is returned instead.
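This is the classic "check the return code" fix. In the following minimal sketch, `register_fs()` is a hypothetical stand-in for the dlmc_fs_register() call, whose real C signature is not reproduced here. A nonzero return value is assumed to mean failure. `mount_fs()` and `MountError` are also hypothetical.

```python
from typing import Callable


class MountError(Exception):
    pass


def mount_fs(name: str, register_fs: Callable[[str], int]) -> str:
    """Mount only if registration with dlm_controld succeeded (sketch)."""
    rv = register_fs(name)
    if rv != 0:
        # Previously a result like this was ignored and the mount went ahead,
        # leaving recovery uncoordinated. Now the mount fails instead.
        raise MountError(f"{name}: dlm_controld registration failed ({rv})")
    return f"{name} mounted"


print(mount_fs("gfs2_fs", lambda n: 0))  # gfs2_fs mounted
try:
    mount_fs("gfs2_fs", lambda n: -1)
except MountError as err:
    print(err)  # gfs2_fs: dlm_controld registration failed (-1)
```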

Enhancements

BZ#675723, BZ#803510
The gfs2_convert utility converts a GFS1 file system to GFS2. However, gfs2_convert required the user to run the gfs_fsck utility on the file system before conversion. Because that tool is not included in Red Hat Enterprise Linux 6, users had to use Red Hat Enterprise Linux 5 to run it. With this update, the gfs2_fsck utility allows users to perform a complete GFS1 to GFS2 conversion on Red Hat Enterprise Linux 6 systems.
Tuning a cluster that uses the qdiskd daemon together with the device-mapper-multipath utility is a complex operation. Previously, it was easy to misconfigure qdiskd in this setup, which could lead to the failure of cluster nodes. The input and output handling of the qdiskd daemon has been improved so that it automatically detects multipath-related timeouts without requiring manual configuration. Users can now deploy qdiskd with device-mapper-multipath more easily.
BZ#733298, BZ#740552
Previously, the cman utility was not able to configure the Redundant Ring Protocol (RRP) correctly in corosync, so RRP deployments did not work properly. With this update, cman has been improved to configure RRP properly and to perform extra sanity checks on user configurations. It is now easier to deploy a cluster with RRP, and users receive more extensive error reports.
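The kind of sanity check involved can be sketched with Python's standard `xml.etree.ElementTree` module. In cluster.conf, a second ring is configured by giving each `clusternode` an `altname` child element. The ring behavior is set by the `rrp_mode` attribute of `totem`, which corosync accepts as `none`, `active`, or `passive`. The function `check_rrp()` is hypothetical. The two rules it checks are illustrative assumptions, not a copy of the checks cman performs:

- `rrp_mode` must be one of the three values above.
- When `rrp_mode` is `active` or `passive`, every node must have an `altname`.

```python
import xml.etree.ElementTree as ET
from typing import List

VALID_RRP_MODES = {"none", "active", "passive"}


def check_rrp(conf_xml: str) -> List[str]:
    """Return problems found in a cluster.conf RRP setup (illustrative sketch)."""
    root = ET.fromstring(conf_xml)
    problems: List[str] = []
    totem = root.find("totem")
    mode = totem.get("rrp_mode") if totem is not None else None
    if mode is not None and mode not in VALID_RRP_MODES:
        problems.append(f"invalid rrp_mode '{mode}'")
    if mode in ("active", "passive"):
        nodes = root.findall("./clusternodes/clusternode")
        missing = [n.get("name", "?") for n in nodes if n.find("altname") is None]
        if missing:
            problems.append("no altname for: " + ", ".join(missing))
    return problems


conf = """<cluster name="example" config_version="1">
  <totem rrp_mode="passive"/>
  <clusternodes>
    <clusternode name="node1" nodeid="1"><altname name="node1-alt"/></clusternode>
    <clusternode name="node2" nodeid="2"/>
  </clusternodes>
</cluster>"""
print(check_rrp(conf))  # ['no altname for: node2']
```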
With this update, Red Hat Enterprise Linux High Availability has been validated against the VMware vSphere 5.0 release.
With this update, the fence_scsi fencing agent has been validated for use in a two-node cluster with High Availability LVM (HA-LVM).
All users of cluster and gfs2-utils are advised to upgrade to these updated packages, which fix these bugs and add these enhancements.