Unable to create a cluster using luci; logs are filling with '/usr/libexec/ricci/ricci-worker -f /var/lib/ricci/queue/XXXX' messages.


Environment

  • Red Hat Enterprise Linux Server 5 (with the High Availability Add on)
  • Red Hat Enterprise Linux Server 6 (with the High Availability Add on)
  • ricci
  • luci

Issue

  • After clicking "Submit" to create a cluster in luci, nothing appears to happen in the web UI. The ricci clients appear to be hung in the install phase, and the following commands are printed over and over in the log file:
rh5node1 ricci[2146]: Executing '/usr/bin/yum -y list all'
rh5node1 ricci[2150]: Executing '/usr/libexec/ricci/ricci-worker -f /var/lib/ricci/queue/XXXX'

Resolution

This issue has been resolved in the following errata:

  • RHEL 6: RHBA-2012:0898
  • RHEL 6.z: RHBA-2011:1510
  • RHEL 5: RHSA-2012:0151
  • RHEL 5.z: RHBA-2011:1421

Workaround

When upgrading luci is not possible due to local policies, a workaround can be used: install or update all the packages from the cluster and shared-storage channels so that every node already has the required packages installed and up to date.

Install the required packages on all the cluster nodes that will be in the new cluster before using luci to "Create a New Cluster". If the option "Enable Shared Storage Support" will not be checked when creating the new cluster, the shared-storage yum group can be omitted. In RHEL 5 that yum group is called "Cluster Storage", and in RHEL 6 it is called "Resilient Storage".

Install all the "Clustering" packages and the "Cluster Storage" packages for RHEL 5 so that ricci will not attempt to install the packages:

$ yum -y groupinstall "Clustering" "Cluster Storage"

Install all the "High Availability", "High Availability Management" , and "Resilient Storage" packages for RHEL 6 so that ricci will not attempt to install the packages. The package luci can be removed after install if that package is not needed:

$ yum -y groupinstall "High Availability" "High Availability Management" "Resilient Storage" 

Then click the "Create a New Cluster" button in luci, enter all the required information, and create the new cluster. If the creation appears to hang in the "install" phase (check /var/log/messages to confirm), kill the hung yum processes. Up to three processes may need to be killed: once the first is killed another yum process is spawned, and after the second is killed a third is started that must be killed as well. Here is an example of killing the three yum processes:

$ ps aux | grep yum | grep -v grep
root 19343  4.4 10.3 229908 52944 ? S 11:27 0:06 /usr/bin/python /usr/bin/yum -y list all
$ kill 19343
$ ps aux | grep yum | grep -v grep
root 19389 43.5 10.3 229904 52940 ? S 11:30 0:06 /usr/bin/python /usr/bin/yum -y list all
$ kill 19389
$ ps aux | grep yum | grep -v grep
root 19401 93.7 10.3 229488 52512 ? R 11:31 0:03 /usr/bin/python /usr/bin/yum -y list all
$ logger "Killing YUM"; kill 19401

The install should then finish and start the cluster services, provided all the required packages are already installed. If this procedure still fails to create the new cluster, try the same procedure with the "Enable Shared Storage Support" option unchecked.

Root Cause

Prior to this update, when a new cluster was being created with luci and the ricci agents tried to list, install, or update cluster packages, the installation process could become unresponsive and never finish. With this update, the bug has been fixed, and the creation of a new cluster now completes successfully in the described scenario.

Diagnostic Steps

Check the log file /var/log/messages for messages from ricci that look like the following. Here is an example of the logs in RHEL 5:

Aug 22 10:17:05 rh5node1 ricci[2132]: Executing '/usr/libexec/ricci/ricci-worker -f /var/lib/ricci/queue/1642016747'
Aug 22 10:17:06 rh5node1 ricci[2137]: Executing '/bin/rpm -qa'
Aug 22 10:17:06 rh5node1 ricci[2139]: Executing '/usr/libexec/ricci/ricci-worker -f /var/lib/ricci/queue/45730346'
Aug 22 10:17:09 rh5node1 ricci[2146]: Executing '/usr/bin/yum -y list all'
Aug 22 10:17:13 rh5node1 ricci[2150]: Executing '/usr/libexec/ricci/ricci-worker -f /var/lib/ricci/queue/1246022549'
Aug 22 10:17:19 rh5node1 ricci[2156]: Executing '/usr/libexec/ricci/ricci-worker -f /var/lib/ricci/queue/726253194'
Aug 22 10:17:26 rh5node1 ricci[2168]: Executing '/usr/libexec/ricci/ricci-worker -f /var/lib/ricci/queue/261825485'
....
Aug 22 10:26:55 rh5node1 ricci[3102]: Executing '/usr/libexec/ricci/ricci-worker -f /var/lib/ricci/queue/54898223'

The following messages will appear, but the ricci client will not proceed past the install phase:

Aug 22 10:17:05 rh5node3 ricci[15253]: Executing '/usr/libexec/ricci/ricci-worker -f /var/lib/ricci/queue/173980106'
Aug 22 10:17:05 rh5node3 ricci[15256]: Executing '/bin/rpm -qa'
Aug 22 10:17:08 rh5node3 ricci[15258]: Executing '/usr/bin/yum -y list all'
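A quick way to confirm the pattern on a node; a sketch assuming the default syslog configuration writes to /var/log/messages:

```shell
# Show the most recent ricci activity from syslog.
grep 'ricci\[' /var/log/messages | tail -n 20

# Check for a stuck package-listing worker; the '[y]um' bracket
# trick keeps grep from matching its own process.
ps aux | grep '[y]um -y list all'
```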

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
