7.4. Testing database replication performance

If the cluster and the individual nodes are all healthy and stable, you can run performance benchmark tests on the replication throughput by querying specific database variables.

Every time you query one of these variables, a FLUSH STATUS command resets the variable value. To run benchmark tests, you must run multiple queries and analyze the variances. These variances can help you determine how much Flow Control is affecting the cluster’s performance.

Flow Control is a mechanism that the cluster uses to manage replication. When the local received queue exceeds a certain threshold, Flow Control pauses the replication until the queue size goes down. For more information about Flow Control, see Flow Control on the Galera Cluster website.

Procedure

Run the following command and replace VARIABLE with the wsrep database variable that you want to check:

$ sudo docker exec galera-bundle-docker-0 sudo mysql -B --password="[MYSQL-HIERA-PASSWORD]" -e "SHOW STATUS LIKE 'VARIABLE';"

The following table lists the wsrep database variables that you can use to test database replication performance.

表7.3 Database variables to check for database replication performance

VariableSummaryUsage

wsrep_local_recv_queue_avg

Average size of the local received write-set queue after the last query.

A value higher than 0.0 indicates that the node cannot apply write-sets as quickly as it receives write-sets, which triggers replication throttling. Check wsrep_local_recv_queue_min and wsrep_local_recv_queue_max for a detailed look at this benchmark.

wsrep_local_send_queue_avg

Average send queue length after the last query.

A value higher than 0.0 indicates a higher likelihood of replication throttling and network throughput problems.

wsrep_local_recv_queue_min and wsrep_local_recv_queue_max

Minimum and maximum size of the local receive queue after the last query.

If the value of wsrep_local_recv_queue_avg is higher than 0.0, you can check these variables to determine the scope of the queue size.

wsrep_flow_control_paused

Fraction of the time that Flow Control paused the node after the last query.

A value higher than 0.0 indicates that Flow Control paused the node. To determine the duration of the pause, multiply the wsrep_flow_control_paused value with the number of seconds between the queries. The optimal value is as close to 0.0 as possible.

For example:

  • If the value of wsrep_flow_control_paused is 0.50 one minute after the last query, then Flow Control paused the node for 30 seconds.
  • If the value of wsrep_flow_control_paused is 1.0 one minute after the last query, then Flow Control paused the node for the entire minute.

wsrep_cert_deps_distance

Average difference between the lowest and highest sequence number (seqno) value that can be applied in parallel

In case of throttling and pausing, this variable indicates how many write-sets on average can be applied in parallel. Compare the value with the wsrep_slave_threads variable to see how many write-sets can actually be applied simultaneously.

wsrep_slave_threads

Number of threads that can be applied simultaneously

You can increase the value of this variable to apply more threads simultaneously, which also increases the value of wsrep_cert_deps_distance. The value of wsrep_slave_threads must not be higher than the number of CPU cores in the node.

For example, if the wsrep_cert_deps_distance value is 20, you can increase the value of wsrep_slave_threads from 2 to 4 to increase the amount of write-sets that the node can apply.

If a problematic node already has an optimal wsrep_slave_threads value, you can exclude the node from the cluster while you investigate possible connectivity issues.