11.11. Rebalancing Volumes

If a volume has been expanded or shrunk using the add-brick or remove-brick commands, the data on the volume needs to be rebalanced among the servers.

Note

In a non-replicated volume, all bricks should be online to perform the rebalance operation using the start option. In a replicated volume, at least one of the bricks in the replica should be online.
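One way to confirm that the required bricks are online before starting a rebalance is to check the volume status, whose output includes an Online column for each brick. For example, for the test-volume volume used in this section:
# gluster volume status test-volume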
To rebalance a volume, use the following command on any of the servers:
# gluster volume rebalance VOLNAME start
For example:
# gluster volume rebalance test-volume start
Starting rebalancing on volume test-volume has been successful
When run without the force option, the rebalance command attempts to balance the space utilized across nodes. Files whose migration would cause the target node to have less available space than the source node are skipped. This results in linkto files being retained, which may cause slower access when a large number of linkto files are present.
Red Hat strongly recommends that you disconnect all older clients before executing the rebalance command to avoid potential data loss.
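If you want to see which clients are currently connected before starting the rebalance, one way is to list them per brick. For example:
# gluster volume status test-volume clients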

Warning

The rebalance command can be executed with the force option even when older clients are connected to the cluster. However, this could lead to data loss.
A rebalance operation with the force option balances the data based on the layout, and therefore optimizes or eliminates the link files, but it may lead to imbalanced storage space usage across bricks. Use this option only when there are a large number of link files in the system.
To rebalance a volume forcefully, use the following command on any of the servers:
# gluster volume rebalance VOLNAME start force
For example:
# gluster volume rebalance test-volume start force
Starting rebalancing on volume test-volume has been successful

11.11.1. Rebalance Throttling

The rebalance process uses multiple threads to ensure good performance while migrating multiple files. Migrating many files at once can severely impact storage system performance, so a throttling mechanism is provided to manage the load.
By default, rebalance throttling runs in normal mode. Configure the throttling mode to adjust the rate at which files are migrated:
# gluster volume set VOLNAME rebal-throttle lazy|normal|aggressive
For example:
# gluster volume set test-volume rebal-throttle lazy
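To confirm the current throttling mode, or to return to the default normal mode, the volume get and volume reset commands can be used where your release supports them. For example:
# gluster volume get test-volume rebal-throttle
# gluster volume reset test-volume rebal-throttle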

11.11.2. Displaying Rebalance Progress

To display the status of a volume rebalance operation, use the following command:
# gluster volume rebalance VOLNAME status
For example:
# gluster volume rebalance test-volume status
Node        Rebalanced-files   size     scanned   failures   skipped   status        run time in h:m:s
---------   ----------------   ------   -------   --------   -------   -----------   -----------------
localhost   71962              70.3GB   380852    0          0         in progress   2:02:20
server1     70489              68.8GB   502185    0          0         in progress   2:02:20
server2     70704              69.0GB   507728    0          0         in progress   2:02:20
server3     71819              70.1GB   435611    0          0         in progress   2:02:20
Estimated time left for rebalance to complete :        2:50:24
A rebalance operation starts a rebalance process on each node of the volume. Each process is responsible for rebalancing the files on its own individual node. Each row of the rebalance status output describes the progress of the operation on a single node.
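Each node also records its migration activity in a per-volume rebalance log (the log path is shown in the skipped-files example later in this section). To watch the progress on a given node in more detail, you can follow that log. For example:
# tail -f /var/log/glusterfs/test-volume-rebalance.log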

Important

If a node reboots while a rebalance is in progress, the rebalance output might display an incorrect status and some files might not be rebalanced.
Workaround: After the reboot, once the previous rebalance has completed, trigger another rebalance so that the files that were not migrated during the reboot are rebalanced correctly and the rebalance output reports the correct status.
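For example, after the node is back up, confirm that the interrupted rebalance has completed and then start a new one:
# gluster volume rebalance test-volume status
# gluster volume rebalance test-volume start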
The following table describes the output of the rebalance status command:

Table 11.2. Rebalance Status Output Description

Property Name       Description
------------------  -------------------------------------------------------------------------
Node                The name of the node.
Rebalanced-files    The number of files that were successfully migrated.
size                The total size of the files that were migrated.
scanned             The number of files scanned on the node, including the files that were migrated.
failures            The number of files that could not be migrated because of errors.
skipped             The number of files that were skipped because of various errors or reasons.
status              The status of the rebalance operation on the node: in progress, completed, or failed.
run time in h:m:s   The amount of time for which the rebalance process has been running on the node.
The estimated time left for the rebalance to complete on all nodes is also displayed. The estimated time to complete is displayed only after the rebalance operation has been running for 10 minutes. In cases where the remaining time is extremely large, the estimated time to completion is displayed as >2 months and the user is advised to check again later.
The time taken to complete a rebalance operation depends on the number of files estimated to be on the bricks and the rate at which the rebalance process migrates them. The estimate is recalculated every time the rebalance status command is executed, and it becomes more accurate the longer the rebalance has been running and the larger the data set. The calculation assumes that a file system partition contains a single brick.
A rebalance operation is considered complete when the status of every node is completed. For example:
# gluster volume rebalance test-volume status
Node        Rebalanced-files   size      scanned   failures   skipped   status      run time in h:m:s
---------   ----------------   -------   -------   --------   -------   ---------   -----------------
node2       0                  0Bytes    0         0          0         completed   0:02:23
node3       234                737.8KB   3350      0          257       completed   0:02:25
node4       3                  14.6K     71        0          6         completed   0:00:02
localhost   317                1.1MB     3484      0          155       completed   0:02:38
volume rebalance: test-volume: success
Details about the files that are skipped during a rebalance operation are recorded in the rebalance log with message ID 109126. Search the log file for this message ID to get the list of all skipped files.
For example:
# grep -i 109126 /var/log/glusterfs/test-volume-rebalance.log
[2018-03-15 09:14:30.203393] I [MSGID: 109126] [dht-rebalance.c:2715:gf_defrag_migrate_single_file] 0-test-volume-dht: File migration skipped for /linux-4.9.27/Documentation/ABI/stable/sysfs-fs-orangefs.
[2018-03-15 09:14:31.262969] I [MSGID: 109126] [dht-rebalance.c:2715:gf_defrag_migrate_single_file] 0-test-volume-dht: File migration skipped for /linux-4.9.27/Documentation/ABI/stable/sysfs-devices.
[2018-03-15 09:14:31.842631] I [MSGID: 109126] [dht-rebalance.c:2715:gf_defrag_migrate_single_file] 0-test-volume-dht: File migration skipped for /linux-4.9.27/Documentation/ABI/stable/sysfs-devices-system-cpu.
[2018-03-15 09:14:33.733728] I [MSGID: 109126] [dht-rebalance.c:2715:gf_defrag_migrate_single_file] 0-test-volume-dht: File migration skipped for /linux-4.9.27/Documentation/ABI/testing/sysfs-bus-fcoe.
[2018-03-15 09:14:35.576404] I [MSGID: 109126] [dht-rebalance.c:2715:gf_defrag_migrate_single_file] 0-test-volume-dht: File migration skipped for /linux-4.9.27/Documentation/ABI/testing/sysfs-bus-iio-frequency-ad9523.
[2018-03-15 09:14:43.378480] I [MSGID: 109126] [dht-rebalance.c:2715:gf_defrag_migrate_single_file] 0-test-volume-dht: File migration skipped for /linux-4.9.27/Documentation/DocBook/kgdb.tmpl.
To learn more about the failed files, search for 'migrate-data failed' in the rebalance log file. However, the failure count reported by the rebalance status command will not match the number of 'migrate-data failed' entries in the log, because the failure count includes all possible failures, not just failed file migrations.
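For example, using the same rebalance log file as above:
# grep -i 'migrate-data failed' /var/log/glusterfs/test-volume-rebalance.log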

11.11.3. Stopping a Rebalance Operation

To stop a rebalance operation, use the following command:
# gluster volume rebalance VOLNAME stop
For example:
# gluster volume rebalance test-volume stop
Node        Rebalanced-files   size      scanned   failures   skipped   status    run time in h:m:s
---------   ----------------   -------   -------   --------   -------   -------   -----------------
localhost   106504             104.0GB   558111    0          0         stopped   3:02:24
server1     102299             99.9GB    725239    0          0         stopped   3:02:24
server2     102264             99.9GB    737364    0          0         stopped   3:02:24
server3     106813             104.3GB   646581    0          0         stopped   3:02:24
Estimated time left for rebalance to complete :        2:06:38