- Previously, rgmanager inappropriately called the rg_wait_threads() function during cluster reconfiguration. This could lead to an internal deadlock in rgmanager, which caused the cluster services to become unresponsive. This incorrect call has been removed from the code, and deadlocks no longer occur during cluster reconfiguration.
- When running a Sybase database on a cluster, the cluster defines the ASEHA (Sybase Adaptive Server Enterprise with the High Availability Option) resource agents to manage the Sybase cluster resources. The ASEHAagent resource agent previously specified all resource attributes as unique. As a consequence, it was difficult to have more than one ASEHAagent resource present in the cluster because the Resource Group Manager ignores all resources with conflicting "unique" attributes. This update removes the unique flag from all unnecessary attributes, so it is now possible to run multiple ASEHAagent resource agents on one cluster node.
- Previously, rgmanager did not handle wildcard character matching in the nfsclient.sh script correctly. Therefore, rgmanager was unable to detect the removal of an NFS export from the export table if another NFS export matched the same wildcard pattern. Consequently, rgmanager did not restart the appropriate NFS service as expected. This update corrects the wildcard matching logic so that rgmanager now correctly recognizes the removal of matched NFS exports and restarts the relevant NFS service.
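The corrected detection logic can be sketched with Python's fnmatch module. The essential point of the fix is to compare the set of exports matching the wildcard before and after the change, rather than only checking whether the pattern still matches something; the function and path names below are illustrative, not rgmanager's actual code.

```python
import fnmatch

def removed_exports(old_exports, new_exports, pattern):
    """Return exports matching `pattern` that vanished from the table.

    The buggy approach only asked whether *any* current export still
    matched the pattern, so a removal went unnoticed while a sibling
    export matched the same wildcard. Comparing the matched sets
    before and after detects each removed entry individually.
    """
    old_matched = {e for e in old_exports if fnmatch.fnmatch(e, pattern)}
    new_matched = {e for e in new_exports if fnmatch.fnmatch(e, pattern)}
    return old_matched - new_matched

# /srv/nfs/a is removed while /srv/nfs/b still matches "/srv/nfs/*":
before = ["/srv/nfs/a", "/srv/nfs/b"]
after = ["/srv/nfs/b"]
print(removed_exports(before, after, "/srv/nfs/*"))
```

With the set-difference approach, the removal of /srv/nfs/a is reported even though the wildcard as a whole still matches /srv/nfs/b.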
- Resource Group Manager did not properly handle service status reporting in certain situations within a multi-node cluster with a restricted failover domain defined. Consequently, if a service failover failed because there was an exclusive service running on the only suitable standby node, rgmanager reported the failed service as started on an offline node. This update modifies Resource Group Manager's event handling so a failed service is now correctly reported as stopped in this scenario.
- Resource Group Manager did not handle inter-service dependencies correctly. Therefore, if a service was dependent on another service that was running on the same cluster node, the dependent service became unresponsive during the service failover and remained in the recovering state. With this update, rgmanager has been modified to check a service state during failover and stop the service if it is dependent on the service that is failing over. Resource Group Manager then tries to start this dependent service on other nodes as expected.
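The dependency-aware failover described above can be modeled in a few lines. This is a simplified sketch under assumed names (`plan_failover`, the action tuples), not rgmanager's actual event-handling code: services depending on the failing service are stopped and queued for a start elsewhere instead of being left in the recovering state.

```python
def plan_failover(failing, services, depends_on):
    """Plan actions for services affected by a failover (toy model).

    `depends_on` maps a service to the service it depends on. Any
    service depending on the failing one is stopped locally and then
    queued for a start attempt on another node.
    """
    actions = []
    for svc in services:
        if depends_on.get(svc) == failing:
            actions.append(("stop", svc))
            actions.append(("start-elsewhere", svc))
    return actions

# "app" depends on "db"; when "db" fails over, "app" is stopped and
# scheduled for a start on another node rather than hanging.
print(plan_failover("db", ["db", "app"], {"app": "db"}))
```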
- A rare race condition could occur when rgmanager received a request to start a new resource group thread while another thread was exiting. This race condition could cause a Time of Check to Time of Use (TOC/TOU) bug, which under certain circumstances resulted in an attempt to access previously-freed memory. As a consequence, rgmanager terminated unexpectedly with a segmentation fault. To avoid the TOC/TOU problem, rgmanager now checks the status of the resource group thread before attempting to use the thread. This ensures that the thread is referred to correctly and Resource Group Manager thus no longer crashes in this scenario.
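The essence of a TOC/TOU fix is that the check and the use must form one atomic step. The following Python sketch (with hypothetical class and method names; rgmanager itself is written in C) illustrates the pattern: the thread's status is inspected and acted upon under a single lock, so the thread cannot exit between the check and the use.

```python
import threading

class ResourceGroupThread:
    """Toy stand-in for a resource-group thread (hypothetical names)."""

    def __init__(self):
        self.exiting = False
        self.lock = threading.Lock()

    def send_request(self, request):
        # The check and the use happen under one lock. Checking
        # `exiting` first and acting on the thread later (time of
        # check vs. time of use) would let the thread exit, and its
        # state be freed, in the gap between the two steps.
        with self.lock:
            if self.exiting:
                return False  # stale thread: caller starts a new one
            # ... safe to hand `request` to the live thread here ...
            return True

t = ResourceGroupThread()
assert t.send_request("start")  # accepted while the thread is live
```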
- Resource Group Manager failed to stop a resource if it was located on an unmounted file system. As a result of this failure, rgmanager treated the resource as missing and marked the appropriate service as failed, which prevented the cluster from recovering the service. This update allows rgmanager to ignore this error if the resource has not been previously started with a service. The service can now be properly started on a different host.
- Under certain circumstances, a stopped event could be processed after a service and its dependent services had already been restarted. This forced the dependent services to be restarted erroneously. This update allows rgmanager to ignore stopped events if the dependent services have already been started, and the services are no longer restarted unnecessarily.
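The decision described above reduces to suppressing a stale event. A minimal sketch, with assumed function and state names (not rgmanager's actual event handler):

```python
def handle_stopped_event(dependents_state):
    """Decide whether a late 'stopped' event should trigger restarts.

    If every dependent service has already been restarted since the
    stop, acting on the stale event would restart them a second time,
    so the event is ignored (a simplified model of the fixed behavior).
    """
    if all(state == "started" for state in dependents_state.values()):
        return "ignore"
    return "restart-dependents"

# Dependents already restarted -> the stale event is dropped:
print(handle_stopped_event({"web": "started", "db": "started"}))
```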
- Due to changes in the behavior of the LVM commands, failed devices could not be removed from a volume group (VG) in the same way as previously. This resulted in an inability to relocate cluster services because the affected VG and logical volumes (LVs) could not be modified while the failed device was present in the VG. This update adds an additional command that is now needed in order to remove the failed physical volume from the VG. Services running on affected LVs can now be relocated correctly.
- When running multiple oracledb resource instances at the same time, several instances could attempt to write into a shared log file at the same moment. This caused all but one resource to fail and the log file to become corrupted. With this update, rgmanager now uses a unique log file for each oracledb resource instance, and the log file no longer becomes corrupted.
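The fix amounts to deriving the log file name from the resource instance so that concurrent writers never share a file. A minimal sketch, assuming a hypothetical naming scheme rather than rgmanager's actual one:

```python
import os
import tempfile

def instance_log_path(resource_name, log_dir):
    """Build a per-instance log path (hypothetical naming scheme).

    A single file written by several instances at once can interleave
    and corrupt records; embedding the resource instance name in the
    file name gives each writer its own log file.
    """
    return os.path.join(log_dir, f"oracledb-{resource_name}.log")

log_dir = tempfile.mkdtemp()
paths = {instance_log_path(name, log_dir) for name in ("db1", "db2")}
assert len(paths) == 2  # distinct files, so no shared-write corruption
```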
- Previously, the SAPDatabase resource agent shipped with the Red Hat Enterprise Linux High Availability Add-On was out of sync with the upstream version. This could cause Resource Group Manager to fail to manage SAP instances properly. This update applies multiple upstream patches, which provide several bug fixes and enhancements, including the following:
- The scope of the rc variable has been corrected in several internal functions.
- The Oracle recovery method has been changed from recover automatic database to end backup.
- The process search pattern has been adjusted for DB2 version 9.5.
- The Oracle listener service is now started only if some database processes have been found.
- The eval command is no longer used to start a new process when unnecessary.
The updated SAPDatabase resource agent allows improved handling of SAP database instances in a Red Hat cluster environment.