5.26. cluster and gfs2-utils
- A race condition existed when a node lost contact with the quorum device at the same time as the token timeout period expired. The nodes raced to fence, which could lead to a cluster failure. To prevent the race condition from occurring, the cman and qdiskd interaction timer has been improved.
- Previously, a cluster partition and merge during startup fencing was not detected correctly. As a consequence, the DLM (Distributed Lock Manager) lockspace operations could become unresponsive. With this update, the partition and merge event is now detected and handled properly. DLM lockspace operations no longer become unresponsive in the described scenario.
- Previously, the ping command examples on the qdisk(5) manual page did not include the -w option. If the ping command is run without this option, the action can time out. With this update, the -w option has been added to those examples.
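To illustrate the fix, a qdisk heuristic along the following lines (a sketch only; the label, scoring values, and target address are placeholders) bounds the total ping run time with -w so the heuristic cannot block past its interval:

```
<quorumd interval="1" tko="10" label="qdisk">
    <heuristic program="ping -c1 -w1 192.168.1.254" interval="2" score="1"/>
</quorumd>
```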
- Due to a bug in libgfs2, sentinel directory entries were counted as if they were real entries. As a consequence, the mkfs.gfs2 utility created file systems which did not pass the fsck check when a large number of journal metadata blocks were required (for example, a file system with a block size of 512, and 9 or more journals). With this update, incrementing the count of the directory entry is now avoided when dealing with sentinel entries. GFS2 file systems created with large numbers of journal metadata blocks now pass the fsck check cleanly.
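The affected case can be reproduced with a command sequence like the following sketch (device path and cluster/file-system names are placeholders); before the fix, the read-only check in the second step reported errors on a freshly created file system:

```shell
# Small block size with many journals previously tripped the libgfs2 bug.
mkfs.gfs2 -b 512 -j 9 -p lock_dlm -t mycluster:myfs /dev/vg_data/lv_gfs2
# Read-only check; now completes cleanly after the fix.
fsck.gfs2 -n /dev/vg_data/lv_gfs2
```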
- When a node fails and gets fenced, the node is usually rebooted and joins the cluster with a fresh state. However, if a block occurs during the rejoin operation, the node cannot rejoin the cluster and the attempt fails during boot. Previously, in such a case, the cman init script did not revert actions that had happened during startup and some daemons could be erroneously left running on a node. The underlying source code has been modified so that the cman init script now performs a full rollback when errors are encountered. No daemons are left running unnecessarily in this scenario.
- The RELAX NG schema used to validate the cluster.conf file previously did not recognize the totem.miss_count_const constant as a valid option. As a consequence, users were not able to validate cluster.conf when this option was in use. This option is now recognized correctly by the RELAX NG schema, and the cluster.conf file can be validated as expected.
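For example, a cluster.conf fragment such as the following (a minimal sketch; the cluster name and value are placeholders) previously failed schema validation with ccs_config_validate, and now validates as expected:

```
<cluster name="mycluster" config_version="2">
    <totem miss_count_const="10"/>
</cluster>
```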
- The cmannotifyd daemon is often started after the cman utility, which means that cmannotifyd does not receive or dispatch any notifications on the current cluster status at startup. This update modifies the cman connection loop to generate a notification that the configuration and membership have changed.
- Incorrect use of the free() function in the gfs2_edit code could lead to memory leaks and so cause various problems. For example, when the user executed the gfs2_edit savemeta command, the gfs2_edit utility could become unresponsive or even terminate unexpectedly. This update applies multiple upstream patches so that the free() function is now used correctly and memory leaks no longer occur. In addition, save statistics for the gfs2_edit savemeta command are now reported more often, so that users know that the process is still running when saving a large dinode with a huge amount of metadata.
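A typical invocation of the affected command looks like the following sketch (device and output paths are placeholders); with this update, progress statistics are printed periodically during long-running saves:

```shell
# Save all metadata from a GFS2 device into a file for offline analysis.
gfs2_edit savemeta /dev/vg_data/lv_gfs2 /tmp/lv_gfs2.meta
```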
- Previously, the gfs2_grow utility failed to expand a GFS2 file system if the file system contained only one resource group. This was because the old code, based on GFS1 (which had different fields), calculated distances between resource groups and did not work with only one resource group. This update adds the rgrp_size() function to libgfs2, which calculates the size of the resource group instead of determining its distance from the previous resource group. A file system with only one resource group can now be expanded successfully.
- Previously, the gfs2_edit utility printed unclear error messages when the underlying device did not contain a valid GFS2 file system, which could be confusing. With this update, users are provided with additional information in the aforementioned scenario.
- Previously, the mkfs utility provided users with insufficient error messages when creating a GFS2 file system. The messages also contained absolute build paths and source code references, which was unwanted. A patch has been applied to provide users with comprehensive error messages in the described scenario.
- Previously, the gfs_controld daemon ignored an error returned by the dlm_controld daemon for the dlmc_fs_register() function while mounting a file system. This resulted in a successful mount, but recovery of a GFS file system could not be coordinated using the Distributed Lock Manager (DLM). With this update, mounting a file system is not successful under these circumstances and an error message is returned instead.
- BZ#675723, BZ#803510
- The gfs2_convert utility can be used on a GFS1 file system to convert it to GFS2. However, the gfs2_convert utility required the user to run the gfs_fsck utility prior to conversion, but because this tool is not included in Red Hat Enterprise Linux 6, users had to use Red Hat Enterprise Linux 5 to run it. With this update, the gfs2_fsck utility now allows users to perform a complete GFS2 conversion on Red Hat Enterprise Linux 6 systems.
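The full conversion can now be performed on a single Red Hat Enterprise Linux 6 system, along the lines of the following sketch (the device path is a placeholder; back up the file system before converting):

```shell
# Check the GFS1 file system first, then convert it in place to GFS2.
fsck.gfs2 -y /dev/vg_data/lv_gfs1
gfs2_convert -y /dev/vg_data/lv_gfs1
```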
- Cluster tuning using the qdiskd daemon and the device-mapper-multipath utility is a very complex operation, and it was previously easy to misconfigure qdiskd in this setup, which could consequently lead to cluster node failures. Input and output operations of the qdiskd daemon have been improved to automatically detect multipath-related timeouts without requiring manual configuration. Users can now easily deploy qdiskd in such setups.
- BZ#733298, BZ#740552
- Previously, the cman utility was not able to configure Redundant Ring Protocol (RRP) correctly in corosync, resulting in RRP deployments not working properly. With this update, cman has been improved to configure RRP properly and to perform extra sanity checks on user configurations. It is now easier to deploy a cluster with RRP, and the user is provided with more extensive error reports.
- With this update, Red Hat Enterprise Linux High Availability has been validated against the VMware vSphere 5.0 release.
- With this update, the fence_scsi fencing agent has been validated for use in a two-node cluster with High Availability LVM (HA-LVM).