Chapter 6. Using Galera

In a high-availability deployment, Red Hat OpenStack Platform uses the MariaDB Galera Cluster to manage database replication. As mentioned in Section 4.3, “OpenStack Services Configured in Pacemaker”, Pacemaker runs the Galera service using a Master/Slave Set resource. You can use pcs status to check whether galera-master is running, and on which Controller nodes:

Master/Slave Set: galera-master [galera]
     Masters: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
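
For example, one quick way to display only this resource from the full pcs status output is to filter it with grep (a minimal sketch; adjust the pattern if the resource is named differently in your deployment):

$ sudo pcs status | grep -A1 "Master/Slave Set: galera-master"
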
Hostname resolution
When troubleshooting the MariaDB Galera Cluster, start by verifying hostname resolution. By default, the director binds the Galera resource to a hostname rather than an IP address [1]. As such, any problems preventing hostname resolution (for example, a misconfigured or failed DNS) could, in turn, prevent Pacemaker from properly managing the Galera resource.
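
For example, you can confirm that each Controller hostname resolves from any node by using getent (the hostnames below match the pcs status output above; substitute the names used in your deployment):

$ getent hosts overcloud-controller-0
$ getent hosts overcloud-controller-1
$ getent hosts overcloud-controller-2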

After you rule out hostname resolution problems, examine the integrity of the cluster itself by checking the status of write-set replication on each Controller node’s database.

Write-set replication information is stored in each node’s MariaDB database. Each relevant variable uses the prefix wsrep_. As such, you can query this information directly through the database client:

$ sudo mysql -B -e "SHOW GLOBAL STATUS LIKE 'wsrep_%';"
    +------------------------+-------+
    | Variable_name          | Value |
    +------------------------+-------+
    | wsrep_protocol_version | 5     |
    | wsrep_last_committed   | 202   |
    | ...                    | ...   |
    | wsrep_thread_count     | 2     |
    +------------------------+-------+

To verify the health and integrity of the MariaDB Galera Cluster, first check whether the cluster is reporting the right number of nodes. Then, check whether each node:

  • Is part of the correct cluster
  • Can write to the cluster
  • Can receive queries and writes from the cluster
  • Is connected to others within the cluster
  • Is replicating write-sets to tables in the local database

The following sections discuss how to investigate each status.

6.1. Investigating Database Cluster Integrity

When investigating problems with the MariaDB Galera Cluster, start with the integrity of the cluster itself. Verifying cluster integrity involves checking specific wsrep_ database variables on each Controller node. To check a database variable, run:

$ sudo mysql -B -e "SHOW GLOBAL STATUS LIKE 'VARIABLE';"

Replace VARIABLE with the wsrep_ database variable you want to check. For example, to view the node’s cluster state UUID:

$ sudo mysql -B -e "SHOW GLOBAL STATUS LIKE 'wsrep_cluster_state_uuid';"
    +--------------------------+--------------------------------------+
    | Variable_name            | Value                                |
    +--------------------------+--------------------------------------+
    | wsrep_cluster_state_uuid | e2c9a15e-5485-11e0-0800-6bbb637e7211 |
    +--------------------------+--------------------------------------+

The following table lists the different wsrep_ database variables that relate to cluster integrity.

Table 6.1. Database variables to check for cluster integrity

wsrep_cluster_state_uuid
    Summary: Cluster state UUID
    Description: The ID of the cluster to which the node belongs. All nodes must have an identical ID. A node with a different ID is not connected to the cluster.

wsrep_cluster_size
    Summary: Number of nodes in the cluster
    Description: You can check this on any single node. If the value is less than the actual number of nodes, then some nodes have either failed or lost connectivity.

wsrep_cluster_conf_id
    Summary: Total number of cluster changes
    Description: Determines whether or not the cluster has been split into several components, also known as a partition. A partition is most likely caused by a network failure. All nodes must have an identical value.
    If some nodes report a different wsrep_cluster_conf_id, check their wsrep_cluster_status value to see whether they can still write to the cluster (Primary).

wsrep_cluster_status
    Summary: Primary component status
    Description: Determines whether or not the node can still write to the cluster. If so, then wsrep_cluster_status should be Primary. Any other value indicates that the node is part of a non-operational partition.
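
Rather than querying each variable separately, you can retrieve all of the cluster integrity variables in one statement (a convenience sketch using standard MariaDB syntax):

$ sudo mysql -B -e "SHOW GLOBAL STATUS WHERE Variable_name IN ('wsrep_cluster_state_uuid', 'wsrep_cluster_size', 'wsrep_cluster_conf_id', 'wsrep_cluster_status');"

Run the query on each Controller node and compare the results: wsrep_cluster_state_uuid and wsrep_cluster_conf_id must match on all nodes, and wsrep_cluster_size should equal the actual number of nodes in the cluster.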

6.2. Investigating Database Cluster Node

If you can isolate a Galera cluster problem to a specific node, other wsrep_ database variables can provide clues about the specific problem. You can check these variables in the same way as the cluster checks (as in Section 6.1, “Investigating Database Cluster Integrity”):

$ sudo mysql -B -e "SHOW GLOBAL STATUS LIKE 'VARIABLE';"

Likewise, replace VARIABLE with any of the following values:

Table 6.2. Database variables to check for node integrity

wsrep_ready
    Summary: Node ability to accept queries
    Description: States whether the node can accept write-sets from the cluster. If so, then wsrep_ready should be ON.

wsrep_connected
    Summary: Node network connectivity
    Description: States whether the node has network connectivity to other nodes. If so, then wsrep_connected should be ON.

wsrep_local_state_comment
    Summary: Node state
    Description: Summarizes the node state. If the node can still write to the cluster (that is, if wsrep_cluster_status is Primary, see Section 6.1, “Investigating Database Cluster Integrity”), then typical values for wsrep_local_state_comment are Joining, Waiting on SST, Joined, Synced, or Donor.
    If the node is part of a non-operational component, then wsrep_local_state_comment is set to Initialized.

Note

A wsrep_connected value of ON could also mean that the node is connected to only some of the nodes in the cluster. For example, in the case of a cluster partition, the node might be part of a component that cannot write to the cluster. See Section 6.1, “Investigating Database Cluster Integrity” for details.

If wsrep_connected is OFF, then the node is not connected to ANY cluster components.
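
As with the cluster checks, you can retrieve all three node status variables in a single query (a convenience sketch; run it on the node you are investigating):

$ sudo mysql -B -e "SHOW GLOBAL STATUS WHERE Variable_name IN ('wsrep_ready', 'wsrep_connected', 'wsrep_local_state_comment');"

On a healthy node, wsrep_ready and wsrep_connected are both ON, and wsrep_local_state_comment is typically Synced.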

6.3. Investigating Database Replication Performance

If the cluster and its individual nodes are all healthy and stable, you can also check replication throughput to benchmark performance. As in Section 6.2, “Investigating Database Cluster Node” and Section 6.1, “Investigating Database Cluster Integrity”, doing so involves checking wsrep_ database variables on each node:

$ sudo mysql -B -e "SHOW STATUS LIKE 'VARIABLE';"

Likewise, replace VARIABLE with any of the following values:

Table 6.3. Database variables to check for cluster performance (replication throughput)

wsrep_local_recv_queue_avg
    Average size of the local received queue since the variable was last queried

wsrep_local_send_queue_avg
    Average send queue length since the variable was last queried

wsrep_local_recv_queue_min and wsrep_local_recv_queue_max
    The minimum and maximum sizes of the local received queue since either variable was last queried

wsrep_flow_control_paused
    Fraction of time that the node paused due to Flow Control since the variable was last queried

wsrep_cert_deps_distance
    Average distance between the lowest and highest sequence number (seqno) values that can be applied in parallel (that is, the potential degree of parallelization)

Each time any of these variables is queried, a FLUSH STATUS command resets its value. Benchmarking cluster replication involves querying these values multiple times to see the variances. These variances can help you judge how much Flow Control is affecting the cluster’s performance.
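
For example, one way to sample these values over time is to query them in a loop (a rough sketch; adjust the variable, the number of samples, and the interval to suit your benchmark). The explicit FLUSH STATUS resets the counter so that each sample covers only the preceding interval:

$ for i in $(seq 5); do
      sudo mysql -B -e "SHOW STATUS LIKE 'wsrep_flow_control_paused'; FLUSH STATUS;"
      sleep 60
  done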

Flow Control is a mechanism used by the cluster to manage replication. When the local received write-set queue exceeds a certain threshold, Flow Control pauses replication in order for the node to catch up. See Flow Control from the Galera Cluster site for more information.

Check the following values for clues and benchmarks:

wsrep_local_recv_queue_avg > 0.0
The node cannot apply write-sets as quickly as it receives them, thereby triggering replication throttling. Check wsrep_local_recv_queue_min and wsrep_local_recv_queue_max for a detailed look at this benchmark.
wsrep_local_send_queue_avg > 0.0
As the value of wsrep_local_send_queue_avg rises, so does the likelihood of replication throttling and network throughput issues. This is especially true as wsrep_local_recv_queue_avg rises.
wsrep_flow_control_paused > 0.0
Flow Control paused the node. To determine how long the node was paused, multiply the wsrep_flow_control_paused value by the number of seconds between queries. For example, if wsrep_flow_control_paused = 0.50 one minute after you last checked it, then node replication was paused for 30 seconds. If wsrep_flow_control_paused = 1.0, then the node was paused for the entire time since the last query.

Ideally, wsrep_flow_control_paused should be as close to 0.0 as possible.

In case of throttling and pausing, you can check wsrep_cert_deps_distance to see how many write-sets (on average) can be applied in parallel. Then, check wsrep_slave_threads to see how many write-sets can actually be applied simultaneously.

Configuring a higher wsrep_slave_threads value can help mitigate throttling and pausing. For example, if wsrep_cert_deps_distance reads 20, then doubling wsrep_slave_threads from 2 to 4 can also double the number of write-sets that the node can apply. However, wsrep_slave_threads should only be set as high as the node’s number of CPU cores.
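
For example, you can compare the two values and, if necessary, raise wsrep_slave_threads at runtime (a sketch; wsrep_slave_threads is a dynamic variable, but make any permanent change through your Galera configuration):

$ sudo mysql -B -e "SHOW GLOBAL STATUS LIKE 'wsrep_cert_deps_distance';"
$ sudo mysql -B -e "SHOW GLOBAL VARIABLES LIKE 'wsrep_slave_threads';"
$ sudo mysql -B -e "SET GLOBAL wsrep_slave_threads = 4;"

Note that a change made with SET GLOBAL does not persist across a service restart.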

If a problematic node already has an optimal wsrep_slave_threads setting, then consider excluding the node from the cluster as you investigate possible connectivity issues.



[1] This method was implemented to allow Galera to start successfully in overclouds that use IPv6 (specifically, to address BZ#1298671).