Red Hat Cluster Manager: The Red Hat Cluster Manager Installation and Administration Guide
Chapter 3. Cluster Software Installation and Configuration
To ensure that the cluster software has been correctly configured, use the following tools located in the /sbin directory:
Test the quorum partitions and ensure that they are accessible.
Invoke the cludiskutil utility with the -t option to test the accessibility of the quorum partitions. See the Section called Testing the Quorum Partitions for more information.
Test the operation of the power switches.
If power switches are used in the cluster hardware configuration, run the clustonith command on each cluster system to ensure that it can remotely power-cycle the other cluster system. Do not run this command while the cluster software is running. See the Section called Testing the Power Switches for more information.
Ensure that both cluster systems are running the same software version.
Invoke the rpm -q clumanager command on each cluster system to display the revision of the installed cluster RPM.
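For example, a minimal version check simply queries the package on each member and compares the output:

# Run on each cluster member; both should report an identical
# version and release of the clumanager package.
rpm -q clumanager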
The following sections explain these cluster utilities in further detail.
The quorum partitions must refer to the same physical device on both cluster systems. Invoke the cludiskutil utility with the -t option to test the quorum partitions and verify that they are accessible.
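For example (a minimal sketch; the exact messages printed vary by release, so this relies on the usual convention that a non-zero exit status indicates failure):

# Run on each cluster member to test access to the quorum partitions.
/sbin/cludiskutil -t
echo $?    # 0 indicates the test completed successfully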
If the command succeeds, run the cludiskutil -p command on both cluster systems to display a summary of the header data structure for the quorum partitions. If the output is different on the systems, the quorum partitions do not point to the same devices on both systems. Check to make sure that the raw devices exist and are correctly specified in the /etc/sysconfig/rawdevices file. See the Section called Configuring Quorum Partitions in Chapter 2 for more information.
The following example shows that the quorum partitions refer to the same physical device on two cluster systems (devel0 and devel1):
/sbin/cludiskutil -p
----- Shared State Header ------
Magic# = 0x39119fcd
Version = 1
Updated on Thu Sep 14 05:43:18 2000
Updated by node 0
--------------------------------

/sbin/cludiskutil -p
----- Shared State Header ------
Magic# = 0x39119fcd
Version = 1
Updated on Thu Sep 14 05:43:18 2000
Updated by node 0
--------------------------------
The Magic# and Version fields will be the same for all cluster configurations. The last two lines of output indicate the date that the quorum partitions were initialized with cludiskutil -I, and the numeric identifier for the cluster system that invoked the initialization command.
If the output of the cludiskutil utility with the -p option is not the same on both cluster systems, perform the following:
Examine the /etc/sysconfig/rawdevices file on each cluster system and ensure that the raw character devices and block devices for the primary and backup quorum partitions have been accurately specified (an example appears after these steps). If they are not the same, edit the file and correct any mistakes. Then re-run the cluconfig utility. See the Section called Editing the rawdevices File for more information.
Ensure that you have created the raw devices for the quorum partitions on each cluster system. See the Section called Configuring Quorum Partitions in Chapter 2 for more information.
On each cluster system, examine the system startup messages at the point where the system probes the SCSI subsystem to determine the bus configuration. Verify that both cluster systems identify the same shared storage devices and assign them the same name.
Verify that a cluster system is not attempting to mount a file system on the quorum partition. For example, make sure that the actual device (for example, /dev/sdb1) is not included in an /etc/fstab file.
After performing these tasks, re-run the cludiskutil utility with the -p option.
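As an illustration of the above checks, the commands below use example device names (/dev/raw/raw1, /dev/raw/raw2, /dev/sdb1, and /dev/sdb2); substitute the devices that apply to your shared storage:

# /etc/sysconfig/rawdevices should bind the same raw character devices
# to the same block devices on both members, for example:
#   /dev/raw/raw1 /dev/sdb1
#   /dev/raw/raw2 /dev/sdb2

# Review the boot-time SCSI probe messages on each member to confirm
# that both see the shared storage under the same device names:
dmesg | grep -i scsi

# Confirm that neither quorum device is listed in /etc/fstab:
grep sdb /etc/fstab

# Finally, re-check the shared state header on both members:
/sbin/cludiskutil -p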
If either network- or serial-attached power switches are employed in the cluster hardware configuration, install the cluster software and invoke the clustonith command to test the power switches. Invoke the command on each cluster system to ensure that it can remotely power-cycle the other cluster system. If testing is successful, then the cluster can be started. If using watchdog timers or the switch type "None", then this test can be omitted.
The clustonith command can accurately test a power switch only if the cluster software is not running. This is because, for serial-attached switches, only one program at a time can access the serial port that connects a power switch to a cluster system. When the clustonith command is invoked, it checks the status of the cluster software. If the cluster software is running, the command exits with a message to stop the cluster software.
The format of the clustonith command is as follows:
clustonith [-sSlLvr] [-t devicetype] [-F options-file] \
           [-p stonith-parameters]

Options:
  -s             Silent mode, suppresses error and log messages
  -S             Display switch status
  -l             List the hosts a switch can access
  -L             List the set of supported switch types
  -r hostname    Power cycle the specified host
  -v             Increases verbose debugging level
When testing power switches, the first step is to ensure that each cluster member can successfully communicate with its attached power switch. The following example of the clustonith command output shows that the cluster member is able to communicate with its power switch:
clustonith -S
WTI Network Power Switch device OK.

An example output of the clustonith command when it is unable to communicate with its power switch appears below:

clustonith -S
Unable to determine power switch type.
Unable to determine default power switch type.
The above error may indicate one of the following types of problems:
For serial-attached power switches:
Verify that the device special file for the remote power switch connection serial port (for example, /dev/ttyS0) is specified correctly in the cluster database, as established via the cluconfig command. If necessary, use a terminal emulation package such as minicom to test if the cluster system can access the serial port.
Ensure that a non-cluster program (for example, a getty program) is not using the serial port for the remote power switch connection. You can use the lsof command to perform this task, as shown in the example following this list.
Check that the cable connection to the remote power switch is correct. Verify that the correct type of cable is used (for example, an RPS-10 power switch requires a null modem cable), and that all connections are securely fastened.
Verify that any physical dip switches or rotary switches on the power switch are set properly. If using an RPS-10 power switch, see the Section called Setting up RPS-10 Power Switches in Appendix A for more information.
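For instance, lsof can be pointed at the serial device special file to see whether another program is holding it open; /dev/ttyS0 below is only a placeholder for the port established via cluconfig:

# List any processes that currently have the serial port open;
# no output means the port is free for the power switch connection.
lsof /dev/ttyS0

# Optionally, confirm that no getty is configured on that port:
grep ttyS0 /etc/inittab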
For network-based power switches:
Verify that the network connection to network-based switches is operational. Most switches have a link light that indicates connectivity.
It should be possible to ping the network switch; if not, then the switch may not be properly configured for its network parameters.
Verify that the correct password and login name (depending on switch type) have been specified in the cluster configuration database (as established by running cluconfig). A useful diagnostic approach is to verify telnet access to the network switch using the same parameters as specified in the cluster configuration.
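As a concrete illustration, the following commands probe a network power switch; the address 10.0.0.50 is purely a placeholder for the switch's configured host name or IP address:

# Confirm basic network reachability of the power switch:
ping -c 3 10.0.0.50

# Verify that the login name and password recorded by cluconfig work
# by logging in to the switch interactively:
telnet 10.0.0.50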
After successfully verifying communication with the switch, attempt to power cycle the other cluster member. Prior to doing this, it is recommended to verify that the other cluster member is not actively performing any important functions (such as serving cluster services to active clients). The following command depicts a successful power cycle operation:
clustonith -r clu3
Successfully power cycled host clu3.