Chapter 5. RAC/GFS Cluster Configuration

This chapter provides information on configuring a RAC/GFS cluster. For information on configuring a cold failover HA cluster, see Chapter 6, Cold Failover Cluster Configuration.
Preparing a cluster for Oracle RAC, and deploying RAC on a certified GFS cluster, requires additional packages and configuration. This chapter demonstrates these scenarios.
Oracle RAC is a shared-disk option of Enterprise Edition that requires another Oracle product, Clusterware (CRS), to be installed as well. This complicates the Red Hat Cluster Suite installation, because two independent clustering layers now run simultaneously on the cluster. Oracle requires that CRS be installed on top of Red Hat Cluster Suite, and this is the focus of the chapter. The chapter assumes that the user can install CRS (as well as the RDBMS).
All Oracle database files can reside on GFS clustered volumes, except Oracle Clusterware product files (ORA_CRS_HOME). The Oracle RDBMS product files (ORACLE_HOME) can be installed on shared GFS volumes, although Context Dependent Pathnames (CDPN) will be required for some ORACLE_HOME directories.

5.1. Oracle Clusterware

Oracle Clusterware is a stand-alone cluster layer that Oracle provides for use with the RAC option. CRS mimics all the functionality of Red Hat Cluster Suite, but must be tuned so that it does not interfere with Red Hat Cluster Suite’s ability to manage the cluster (and the GFS clustered file systems).
CRS requires a set of dedicated LUNs (that were allocated and configured for use with Multipath). Starting with 11gR1, the helper LUNs no longer need to be raw devices, but can be standard block devices. The device nodes in the /dev/mapper directory can now be used directly for the CRS Cluster Registry (OCR) and quorum (VOTE) files.
Oracle CRS installation permits either external redundancy or internal redundancy. The external option assumes that the storage array is responsible for protecting the OCR and VOTE files. In this installation option, only one copy of OCR and one copy of VOTE are allocated. In the internal redundancy configuration, Oracle creates two OCR files, organized as a simple RAID1 mirror, and generates three quorum VOTE files. The number of VOTE files can be higher, provided it is a prime number of files. Most installations choose internal redundancy with three VOTE files. CRS is certified for use with both internal and external redundancy.
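
The file counts for the two redundancy options can be summarized in a short Python sketch. The function names are hypothetical and exist only for illustration. The majority calculation is an assumption about how quorum on VOTE files is commonly evaluated (a node needs access to more than half of them); it is not stated in this chapter.

```python
def crs_file_counts(redundancy, vote_files=3):
    """Return how many OCR and VOTE files a redundancy option allocates."""
    if redundancy == "external":
        # The storage array protects the files, so one copy of each is used.
        return {"ocr": 1, "vote": 1}
    if redundancy == "internal":
        if vote_files < 3:
            raise ValueError("internal redundancy uses at least three VOTE files")
        # Two OCR files (a simple mirror) plus the chosen number of VOTE files.
        return {"ocr": 2, "vote": vote_files}
    raise ValueError("unknown redundancy option: %r" % (redundancy,))


def vote_majority(vote_files):
    """Assumption: smallest number of VOTE files that is a strict majority."""
    return vote_files // 2 + 1


if __name__ == "__main__":
    for mode in ("external", "internal"):
        counts = crs_file_counts(mode)
        print(mode, counts, "majority:", vote_majority(counts["vote"]))
```

Under that assumption, three VOTE files tolerate the loss of one file, which is one reason an odd count is the usual choice.
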
Oracle CSS network services must be configured, and then set with sufficiently high timeouts to ensure that only Red Hat Cluster Suite is responsible for heartbeat and fencing. These values must be set, or the configuration will not be supported.
CSS Timeout should be set to between 300 and 500 seconds. CSS Disk Timeout should be set to 500 seconds.
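
As an illustration, the following Python sketch checks the recommended values and builds the corresponding crsctl command lines without running them. It assumes that "CSS Timeout" corresponds to the CSS misscount parameter and "CSS Disk Timeout" to the disktimeout parameter; that mapping, and the helper name css_timeout_commands, are assumptions rather than something this chapter states. Verify the commands against the Oracle documentation for the installed release before using them.

```python
def css_timeout_commands(css_timeout=300, disk_timeout=500):
    """Build (but do not execute) crsctl commands for the CSS timeouts."""
    # Main recommendation from the text: CSS Timeout between 300 and 500 seconds.
    if not 300 <= css_timeout <= 500:
        raise ValueError("CSS Timeout should be between 300 and 500 seconds")
    if disk_timeout <= 0:
        raise ValueError("CSS Disk Timeout must be a positive number of seconds")
    return [
        # Assumption: CSS Timeout maps to the misscount parameter.
        "crsctl set css misscount %d" % css_timeout,
        # Assumption: CSS Disk Timeout maps to the disktimeout parameter.
        "crsctl set css disktimeout %d" % disk_timeout,
    ]


if __name__ == "__main__":
    for command in css_timeout_commands():
        print(command)
```

Printing the commands rather than executing them keeps the sketch safe to run on any machine; an administrator would review the output and run it with the appropriate privileges.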

Note

Oracle cluster nodes are usually set to reboot and automatically re-enter the cluster. If the nodes should remain fenced, then the option="off" value in the fence section of the cluster.conf file can be set to ensure nodes are manually restarted. (The option value can be set to "reboot", "on", or "off"; by default, the value is "reboot".)
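
The effect of the option value can be sketched in Python against an in-memory configuration. The XML below is a hypothetical, heavily reduced stand-in for a cluster.conf file (the cluster, node, and device names are invented), and the helper set_fence_option is illustrative only; a real file contains more elements and must be changed and propagated with the cluster's own tools.

```python
import xml.etree.ElementTree as ET

# Hypothetical, minimal cluster.conf-style fragment used only for this sketch.
SAMPLE_CLUSTER_CONF = """\
<cluster name="rac_example" config_version="1">
  <clusternodes>
    <clusternode name="node1" nodeid="1">
      <fence>
        <method name="1">
          <device name="fence_dev1" option="reboot"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node2" nodeid="2">
      <fence>
        <method name="1">
          <device name="fence_dev2"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
</cluster>
"""

VALID_OPTIONS = ("reboot", "on", "off")


def set_fence_option(conf_xml, option):
    """Return conf_xml with option set on every per-node fence device."""
    if option not in VALID_OPTIONS:
        raise ValueError("option must be one of %s" % (VALID_OPTIONS,))
    root = ET.fromstring(conf_xml)
    for device in root.iterfind("./clusternodes/clusternode/fence/method/device"):
        # A device without an option attribute uses the default ("reboot").
        device.set("option", option)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(set_fence_option(SAMPLE_CLUSTER_CONF, "off"))
```

With option="off" on each device, a fenced node stays powered off until an administrator restarts it.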

Note

The time a node takes to reboot depends on several factors, including BIOS settings. Many servers scan all of memory and then scan PCI buses for boot candidates from NICs or HBAs (of which there should only be one). Disabling these scans, and any other time-consuming steps in the BIOS, will improve recovery performance. The grub.conf file often contains a built-in 5-second delay for screen hold. Sometimes, every second counts.

5.1.1. Cluster Recovery Time

In RAC/GFS, the road to transaction resumption starts with GFS file system recovery, which is nearly instantaneous once fencing is complete. Oracle RAC must wait for CRS to recover the state of the cluster, and then the RDBMS can start to recover the locks for the failed instance (LMS recovery). Once lock recovery is complete, the redo logs from the failed instance must be processed. One of the surviving nodes must acquire the redo logs of the failed node, and determine which objects need recovery. Oracle activity is partially resumed as soon as RECO (DB recovery process) determines the list of embargoed objects that need recovery. Once roll-forward is complete, all non-embargoed and recovered objects are available. Oracle (and especially RAC) recovery is a complex subject, but tuning its performance can reduce downtime, and that could mean millions of dollars in recovered revenue.

Note

It is possible to push the CSS Timeout below 300 seconds, if the nodes can boot in 60 seconds or less.