Updated corosync packages that fix several bugs and add two enhancements are now available for Red Hat Enterprise Linux 6.
The corosync packages provide the Corosync Cluster Engine and C Application Programming Interfaces (APIs) for Red Hat Enterprise Linux cluster software.
- When running corosync on a faulty network with the failed_to_recv configuration option set, corosync was very often terminated with a segmentation fault after a cluster node was marked as "failed to receive". This happened because an assert condition was met during a cluster node membership determination. To fix this problem, the underlying code has been modified to ignore the assert if it was triggered by nodes marked as "failed to receive". This is safe because a single node membership is always established in this situation.
- The corosync-notifyd service was not started right after installation because the default configuration of the corosync notifier did not exist. This fix adds the default configuration for this service in the /etc/sysconfig/corosync-notifyd file so that corosync-notifyd can now be started right after installation without any additional configuration.
- Due to a bug in the underlying code, the corosync API could read uninitialized memory, and thus return incorrect values when incrementing or decrementing value of certain objects in the configuration and statistics database. This update modifies the respective code to only read 16 bits of memory instead of 32 bits when returning the [u]int16 type values. The corosync API no longer read uninitialized memory and return correct values.
- Due to a rare race condition in the corosync logging system, corosync could terminate with a segmentation fault after an attempt to dereference a NULL pointer. A pthread mutex lock has been added to a respective formatting variable so that the race condition between log-formatting and log-printing functions is now avoided.
- Previously, corosync did not support IPv6 double colon notation and did not handle correctly closing braces when parsing the corosync.conf file. As a consequence, the totem service failed to start when using IPv6. If the configuration file contained additional closing braces, no error was displayed to inform users why was the configuration file not parsed successfully. This update fixes these parsing bugs so the totem service can now be successfully started, and an error message is displayed if the corosync.conf file contains additional closing braces.
- Due to multiple bugs in the corosync code, either duplicate or no messages were delivered to applications if the corosync service was terminated on multiple cluster nodes. This update applies a series of patches correcting these bugs so that corosync no longer loses or duplicates messages in this scenario.
- The corosync-fplay utility could terminate with a segmentation fault or result in unpredictable behavior if the corosync fdata file became corrupted. With this update, corosync-fplay has been modified to detect loops in code and properly validate fdata files. To avoid another cause of fdata corruption, corosync now also prohibits its child processes from logging. As a result of these changes, corosync no longer crashes or becomes unresponsive in this situation.
- If a service section in the corosync.conf file did not contain a service name, corosync either terminated with a segmentation fault or refused to start an unknown service. With this update, corosync now properly verifies the name key and if no service name is found, returns an error message and exits gracefully.
- The corosync service did not correctly handle a situation when it received an exit request (the SIGINT signal) before the service initialization was complete. As a consequence, corosync became unresponsive and ignored all signals, except for SIGKILL. This update adds a semaphore to ensure that corosync exits gracefully in this situation.
- When running applications that used the Corosync inter-process communication (IPC) library, some messages in the dispatch() function were lost or duplicated. With this update, corosync properly verifies return values of the dispatch_put() function, returns the correct remaining bytes in the IPC ring buffer, and ensures that the IPC client is correctly informed about the real number of messages in the ring buffer. Messages in the dispatch() function are no longer lost or duplicated.
- Sometimes, when an attempt to shut down the corosync service using the "corosync-cfgtool -H" command failed and returned the CS_ERR_TRY_AGAIN error code, subsequent shutdown attempts always failed with the CS_ERR_EXISTS error. The corosync-cfgtool utility has been modified to automatically retry the shutdown command, and the Corosync's Cfg library now allows processing of multiple subsequent shutdown calls. The "corosync-cfgtool -H" command now works as expected even on heavily loaded cluster nodes.
- If the uidgid section of the corosync.conf file contained a non-existing user or group, corosync did not display any error. The underlying code has been modified so that corosync now properly verifies values returned by the getpwnam_r system call, and displays an appropriate error message in this situation.
- If an IPC client exited in a specific time frame of the connection handshake, the corosync main process received the SIGPIPE signal and terminated. With this update, the SIGPIPE signal is now correctly handled by the sendto() function and the corosync main process no longer terminates in this situation.
- The corosync process could become unresponsive upon exit, by sending the SIGINT signal or using the corosync-cfgtool utility, if it had open a large number of confdb IPC connections. This update modifies the corosync code to ensure that all IPC connection to the configuration and statistics database are closed upon corosync exit so that corosync exits as expected.
- The corosync daemon now detects when the corosync main process was not scheduled for a long time and sends a relevant message to the system log.
- In order to improve process of problem detection, output of the corosync-blackbox command now contains time stamps of events. This feature is backward-compatible so that output (fdata) from old versions of corosync is processed correctly.
Users of corosync are advised to upgrade to these updated packages, which fix these bugs and add these enhancements.