11.9. Rebalancing Volumes

If a volume has been expanded or shrunk using the add-brick or remove-brick commands, the data on the volume needs to be rebalanced among the servers.

Note

In a non-replicated volume, all bricks should be online to perform the rebalance operation using the start option. In a replicated volume, at least one of the bricks in the replica should be online.
To rebalance a volume, use the following command on any of the servers:
# gluster volume rebalance VOLNAME start
For example:
# gluster volume rebalance test-volume start
Starting rebalancing on volume test-volume has been successful
When run without the force option, the rebalance command attempts to balance the space utilized across nodes. Files whose migration would cause the target node to have less available space than the source node are skipped. This results in linkto files being retained, which may cause slower access when a large number of linkto files are present.
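The skip rule can be read as a comparison of available space after the move. The following is a minimal sketch of that reading, not the actual implementation: it assumes the check compares the free space each side would have once the file has moved, which the text does not spell out. The function name would_skip_migration and its arguments are hypothetical and are not part of the gluster CLI.

```shell
# would_skip_migration SRC_AVAIL DST_AVAIL FILE_SIZE
# All three values are in the same unit (for example, bytes).
# Succeeds (exit 0) when the file would be skipped under the ASSUMED rule:
# after the move, the target would have less available space than the source.
would_skip_migration() {
    src_avail=$1
    dst_avail=$2
    file_size=$3
    dst_after=$(( dst_avail - file_size ))   # target loses the file's size
    src_after=$(( src_avail + file_size ))   # source regains the file's size
    [ "$dst_after" -lt "$src_after" ]
}
```

For instance, would_skip_migration 500 600 100 succeeds (the file is skipped), because the target would drop to 500 while the source would rise to 600.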
Enhancements made to the file rename and rebalance operations in Red Hat Gluster Storage 2.1 update 5 require that all clients connected to a cluster run that version or a later one. If any client runs an older version when a rebalance operation is attempted, the following warning message is displayed and the rebalance operation is not executed.
volume rebalance: VOLNAME: failed: Volume VOLNAME has one or more connected clients of a version lower than Red Hat Gluster Storage-2.1 update 5. Starting rebalance in this state could lead to data loss.
Please disconnect those clients before attempting this command again.
Red Hat strongly recommends that you disconnect all older clients before executing the rebalance command to avoid potential data loss.

Warning

The rebalance command can be executed with the force option even when older clients are connected to the cluster. However, this could lead to data loss.
A rebalance operation with force balances the data based on the layout, and hence optimizes or does away with the link files, but may lead to imbalanced storage space usage across bricks. Use this option only when there are a large number of link files in the system.
To rebalance a volume forcefully, use the following command on any of the servers:
# gluster volume rebalance VOLNAME start force
For example:
# gluster volume rebalance test-volume start force
Starting rebalancing on volume test-volume has been successful

11.9.1. Rebalance Throttling

The rebalance process uses multiple threads to ensure good performance during migration of multiple files. Migrating many files at once can severely affect storage system performance, so a throttling mechanism is provided to manage the impact.
By default, rebalance throttling starts in normal mode. Set the throttling mode to adjust the rate at which files are migrated:
# gluster volume set VOLNAME rebal-throttle lazy|normal|aggressive
For example:
# gluster volume set test-volume rebal-throttle lazy
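When the throttle mode is set from a script, it can help to reject a mistyped mode before calling gluster. A minimal sketch, assuming only the three modes listed above are accepted. The helper names is_valid_throttle and set_rebal_throttle are hypothetical wrappers, not part of the gluster CLI.

```shell
# is_valid_throttle MODE
# Succeeds only for the three modes named in this section.
is_valid_throttle() {
    case "$1" in
        lazy|normal|aggressive) return 0 ;;
        *) return 1 ;;
    esac
}

# set_rebal_throttle VOLNAME MODE
# Hypothetical wrapper: validates MODE, then runs the documented command.
set_rebal_throttle() {
    if ! is_valid_throttle "$2"; then
        echo "invalid throttle mode: $2 (expected lazy, normal, or aggressive)" >&2
        return 2
    fi
    gluster volume set "$1" rebal-throttle "$2"
}
```

For example, set_rebal_throttle test-volume lazy runs the same command as the example above, while set_rebal_throttle test-volume fast returns 2 without contacting the cluster.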

11.9.2. Displaying Rebalance Progress

To display the status of a volume rebalance operation, use the following command:
# gluster volume rebalance VOLNAME status
For example:
# gluster volume rebalance test-volume status
Node          Rebalanced size   scanned failures skipped status      run time
              -files                                                 in h:m:s
------------- ---------- ------ ------- -------- ------- ----------- --------
10.70.37.01   71962      70.3GB 380852  0        0       in progress 2:02:20
10.70.37.02   70489      68.8GB 502185  0        0       in progress 2:02:20
10.70.37.03   70704      69.0GB 507728  0        0       in progress 2:02:20
10.70.37.04   71819      70.1GB 435611  0        0       in progress 2:02:20
Estimated time left for rebalance to complete :        2:50:24
This displays the estimated time left for the rebalance to complete on all nodes. The estimated time to complete is displayed only after the rebalance operation has been running for 10 minutes. In cases where the remaining time is extremely large, the estimated time to completion is displayed as >2 months and the user is advised to check again later.
The time taken to complete a rebalance operation depends on the number of files estimated to be on the bricks and the rate at which the rebalance process handles them. The estimate is recalculated every time the rebalance status command is executed. It becomes more accurate the longer the rebalance has been running, and it is more accurate for large data sets. The calculation assumes that a file system partition contains a single brick.
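The dependence on file count and processing rate can be illustrated with a simple linear extrapolation. This is a sketch under stated assumptions, not the formula gluster uses: it assumes the processing rate observed so far stays constant. The helper estimate_remaining and its three arguments are hypothetical.

```shell
# estimate_remaining TOTAL_FILES PROCESSED_FILES ELAPSED_SECONDS
# Prints a rough time-left estimate as h:mm:ss, or "unknown" when no
# rate can be derived yet. ASSUMES a constant processing rate.
estimate_remaining() {
    total=$1
    processed=$2
    elapsed=$3
    if [ "$processed" -le 0 ] || [ "$elapsed" -le 0 ]; then
        echo "unknown"
        return 1
    fi
    remaining_files=$(( total - processed ))
    if [ "$remaining_files" -lt 0 ]; then
        remaining_files=0
    fi
    # seconds left = remaining files / (processed files per second)
    secs=$(( remaining_files * elapsed / processed ))
    printf '%d:%02d:%02d\n' $(( secs / 3600 )) $(( secs % 3600 / 60 )) $(( secs % 60 ))
}
```

For example, estimate_remaining 1000000 400000 7200 prints 3:00:00: at 400000 files in two hours, the remaining 600000 files would take three more hours.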
The rebalance status is shown as completed when the rebalance is complete. For example:
# gluster volume rebalance test-volume status
Node          Rebalanced size    scanned failures skipped status      run time
              -files                                                  in h:m:s
------------- ---------- ------- ------- -------- ------- ----------- --------
10.70.37.01   118715     115.9GB 768835  0        30988   completed   3:52:44
10.70.37.02   148113     144.6GB 1242793 0        44258   completed   4:36:27
10.70.37.03   148226     144.8GB 1261041 0        44212   completed   4:36:27
10.70.37.04   119558     116.8GB 848517  0        28239   completed   3:49:35
volume rebalance: test-volume: success
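A script that waits for a rebalance to finish can check the status column of every node row. A minimal sketch, assuming the status output keeps the column layout shown in the examples above (status starting in the seventh column, run time last). The function name rebalance_all_completed is hypothetical.

```shell
# rebalance_all_completed
# Reads `gluster volume rebalance VOLNAME status` output on stdin.
# Succeeds only if at least one node row is present and every node row
# reports the status "completed". ASSUMES the column layout shown above.
rebalance_all_completed() {
    awk '
        # Node rows have a numeric file count in column 2 and at least 8 fields.
        NF >= 8 && $2 ~ /^[0-9]+$/ {
            rows++
            # Status spans columns 7..NF-1 ("in progress" is two words).
            status = $7
            for (i = 8; i < NF; i++) status = status " " $i
            if (status != "completed") pending++
        }
        END {
            if (rows > 0 && pending == 0) exit 0
            exit 1
        }
    '
}
```

It could be used as: gluster volume rebalance test-volume status | rebalance_all_completed && echo "rebalance finished"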

11.9.3. Stopping a Rebalance Operation

To stop a rebalance operation, use the following command:
# gluster volume rebalance VOLNAME stop
For example:
# gluster volume rebalance test-volume stop
Node          Rebalanced size    scanned failures skipped status      run time
              -files                                                  in h:m:s
------------- ---------- ------- ------- -------- ------- ----------- --------
10.70.37.01   106504     104.0GB 558111  0        0       stopped     3:02:24
10.70.37.02   102299      99.9GB 725239  0        0       stopped     3:02:24
10.70.37.03   102264      99.9GB 737364  0        0       stopped     3:02:24
10.70.37.04   106813     104.3GB 646581  0        0       stopped     3:02:24
Estimated time left for rebalance to complete :        2:06:38