Possible reasons for failures during a "rebalance" operation on a Gluster volume
Environment
Red Hat Storage Server 2.0 Update 4
Issue
What are some of the possible reasons for failures during a "rebalance" operation on a Gluster volume?
Resolution
Here are a few common reasons why failures are encountered during a rebalance operation:
- Skipped files are marked as failures. This is fixed in the upcoming 2.1 release of Red Hat Storage.
- A disruption in the network.
- An actual migration failure.
If there is cause for concern, please open a case with Red Hat Support to verify whether the reported "rebalance" failures are genuine.
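Before opening a case, the per-node failure counts can be reviewed and the rebalance log on each node searched for the affected files. A minimal sketch (the volume name test_volume is illustrative; run the grep on each node in the cluster):

```shell
# Show per-node rebalanced/scanned/failure counts for the volume
gluster volume rebalance test_volume status

# On each node, list the files whose migration failed according to
# the rebalance log for that volume
grep "migrate-data failed" /var/log/glusterfs/test_volume-rebalance.log
```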
Diagnostic Steps
- Skipped files being marked as failures:
[root@node304 ~]# gluster volume rebalance test_volume status
Node                 Rebalanced-files         size  scanned  failures       status
---------                 -----------  -----------  -------  --------  -----------
localhost                           0            0     2410         0  in progress
192.168.0.89                      432   1313462286     1970         0  in progress
node306.example.com                 0            0     2411         0  in progress
node305.example.com               220   1401085348     1810       213  in progress
node301.example.com               351   1457860671     2039       101  in progress
node310.example.com                 0            0     2410         0  in progress
node309.example.com                 0            0     2407         0  in progress
node308.example.com                 0            0     2407         0  in progress
node307.example.com                 0            0     2409         0  in progress
node302.example.com                 0            0     2410         0  in progress
Messages from "node301":
# grep -B1 "failed for" var/log/glusterfs/test_volume-rebalance.log
[20XX-XX-XX 02:31:02.824982] W [dht-rebalance.c:367:__dht_check_free_space] 0-test_volume-dht: data movement attempted from node (test_volume-replicate-0) with higher disk space to a node (test_volume-replicate-1) with lesser disk space (/archive/20130204/example.file1.gz)
[20XX-XX-XX 02:31:02.825059] E [dht-rebalance.c:1215:gf_defrag_migrate_data] 0-test_volume-dht: migrate-data failed for /archive/20130204/example.file1.gz
--
[20XX-XX-XX 02:38:37.777530] W [dht-rebalance.c:367:__dht_check_free_space] 0-test_volume-dht: data movement attempted from node (test_volume-replicate-0) with higher disk space to a node (test_volume-replicate-1) with lesser disk space (/archive/20130204/example.file2.gz)
[20XX-XX-XX 02:38:37.777577] E [dht-rebalance.c:1215:gf_defrag_migrate_data] 0-test_volume-dht: migrate-data failed for /archive/20130204/example.file2.gz
--
[20XX-XX-XX 02:38:37.802157] W [dht-rebalance.c:367:__dht_check_free_space] 0-test_volume-dht: data movement attempted from node (test_volume-replicate-0) with higher disk space to a node (test_volume-replicate-1) with lesser disk space (/archive/20130204/example.file3.gz)
[20XX-XX-XX 02:38:37.802204] E [dht-rebalance.c:1215:gf_defrag_migrate_data] 0-test_volume-dht: migrate-data failed for /archive/20130204/example.file3.gz
Messages from "node305":
# grep -B1 "failed for" var/log/glusterfs/test_volume-rebalance.log
[20XX-XX-XX 02:53:48.483732] W [dht-rebalance.c:367:__dht_check_free_space] 0-test_volume-dht: data movement attempted from node (test_volume-replicate-2) with higher disk space to a node (qvt-replicate-0) with lesser disk space (/archive/20130204/example.file4.gz)
[20XX-XX-XX 02:53:48.483780] E [dht-rebalance.c:1215:gf_defrag_migrate_data] 0-test_volume-dht: migrate-data failed for /archive/20130204/example.file4.gz
--
[20XX-XX-XX 02:53:48.523134] W [dht-rebalance.c:367:__dht_check_free_space] 0-test_volume-dht: data movement attempted from node (test_volume-replicate-2) with higher disk space to a node (qvt-replicate-0) with lesser disk space (/archive/20130204/example.file5.gz)
[20XX-XX-XX 02:53:48.523197] E [dht-rebalance.c:1215:gf_defrag_migrate_data] 0-test_volume-dht: migrate-data failed for /archive/20130204/example.file5.gz
--
[20XX-XX-XX 02:53:48.532288] W [dht-rebalance.c:367:__dht_check_free_space] 0-test_volume-dht: data movement attempted from node (test_volume-replicate-2) with higher disk space to a node (qvt-replicate-1) with lesser disk space (/archive/20130204/example.file6.gz)
[20XX-XX-XX 02:53:48.532335] E [dht-rebalance.c:1215:gf_defrag_migrate_data] 0-test_volume-dht: migrate-data failed for /archive/20130204/example.file6.gz
This exposes a flaw in the logic: instead of being skipped due to space constraints, these files are marked as failures.
An upstream patch and the corresponding Bugzilla entry track this issue.
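The warning/error pairs above can be counted to confirm that every "failure" on these nodes corresponds to a file skipped for space reasons rather than lost data. A sketch over a sample of the log entries shown above (messages abbreviated to the relevant fields; paths and timestamps are from the example):

```shell
# Recreate a small sample of the rebalance log entries shown above
cat > /tmp/test_volume-rebalance.log <<'EOF'
[20XX-XX-XX 02:31:02.824982] W [dht-rebalance.c:367:__dht_check_free_space] 0-test_volume-dht: data movement attempted from node (test_volume-replicate-0) with higher disk space to a node (test_volume-replicate-1) with lesser disk space (/archive/20130204/example.file1.gz)
[20XX-XX-XX 02:31:02.825059] E [dht-rebalance.c:1215:gf_defrag_migrate_data] 0-test_volume-dht: migrate-data failed for /archive/20130204/example.file1.gz
[20XX-XX-XX 02:38:37.777530] W [dht-rebalance.c:367:__dht_check_free_space] 0-test_volume-dht: data movement attempted from node (test_volume-replicate-0) with higher disk space to a node (test_volume-replicate-1) with lesser disk space (/archive/20130204/example.file2.gz)
[20XX-XX-XX 02:38:37.777577] E [dht-rebalance.c:1215:gf_defrag_migrate_data] 0-test_volume-dht: migrate-data failed for /archive/20130204/example.file2.gz
EOF

# Count space-constraint warnings versus migrate-data errors; when the
# two counts match, the reported "failures" are skipped files
skipped=$(grep -c "__dht_check_free_space" /tmp/test_volume-rebalance.log)
failed=$(grep -c "migrate-data failed" /tmp/test_volume-rebalance.log)
echo "skipped=$skipped failed=$failed"
```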
- Network Disruption:
Notice the "not started" status for node308.example.com in the output below.
[root@node302 ~]# gluster volume rebalance test_volume status
Node                 Rebalanced-files         size  scanned  failures       status
---------                 -----------  -----------  -------  --------  -----------
localhost                           0            0  2058872        6      stopped
node306.example.com               420    385068769  1172045     5396      stopped
node305.example.com            149670   2348115457  1207746    37650      stopped
192.168.0.89                    51482   5688556371  1188642    35537      stopped
node301.example.com            340696  61019675449  2260553   117938      stopped
node304.example.com             32888    295721807  2063515      316      stopped
node307.example.com              2974     74680289  2017031     5826      stopped
node310.example.com                 0            0  2018027        6      stopped
node308.example.com                 0            0        0        0  not started
node309.example.com                 0            0  2015211     9283      stopped
Here's what happened.
$ grep disconnected -i var/log/glusterfs/test_volume-rebalance.log
[20XX-XX-XX 10:43:53.818079] I [client.c:2098:client_rpc_notify] 0-test_volume-client-2: disconnected
[20XX-XX-XX 14:53:32.359111] I [client.c:2098:client_rpc_notify] 0-test_volume-client-5: disconnected
"0-test_volume-client-5" and "0-test_volume-client-2" were disconnected due to ping timeout.
From the /var/lib/glusterd/vols/test_volume/test_volume-fuse.vol file:
volume test_volume-client-2
type protocol/client
option remote-host node303.example.com
option remote-subvolume /rhs/brick1/test_volume
option transport-type tcp
option ping-timeout 20
end-volume
volume test_volume-client-5
type protocol/client
option remote-host node306.example.com
option remote-subvolume /rhs/brick1/test_volume
option transport-type tcp
option ping-timeout 20
end-volume
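The configured ping-timeout can also be checked from the CLI rather than from the volfile. A hedged sketch (test_volume matches this example; `gluster volume get` exists only on newer GlusterFS releases, so older releases must rely on `gluster volume info`):

```shell
# If network.ping-timeout has been changed from its default, it appears
# under "Options Reconfigured" in the volume info output
gluster volume info test_volume | grep -i ping-timeout

# On newer GlusterFS releases, the effective value (including the
# default) can be queried directly
gluster volume get test_volume network.ping-timeout
```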
But the real reason for "rebalance" not executing is:
[20XX-XX-XX 14:53:32.359160] W [client3_1-fops.c:1556:client3_1_inodelk_cbk] 0-test_volume-client-4: remote operation failed: No such file or directory
[20XX-XX-XX 14:53:32.359179] W [client3_1-fops.c:1556:client3_1_inodelk_cbk] 0-test_volume-client-5: remote operation failed: Transport endpoint is not connected
[20XX-XX-XX 14:53:32.359192] I [afr-lk-common.c:996:afr_lock_blocking] 0-test_volume-replicate-2: unable to lock on even one child
[20XX-XX-XX 14:53:32.359206] I [afr-transaction.c:1031:afr_post_blocking_inodelk_cbk] 0-test_volume-replicate-2: Blocking inodelks failed.
[20XX-XX-XX 14:53:32.359328] I [dht-common.c:2381:dht_setxattr] 0-test_volume-dht: fixing the layout of /1002955407/classic/node04.example.com/.git/objects/pack
[20XX-XX-XX 14:53:32.359370] W [dht-selfheal.c:594:dht_fix_layout_of_directory] 0-test_volume-dht: 1 subvolume(s) are down. Skipping fix layout
The "test_volume-replicate-2" pair ("test_volume-client-4" and "test_volume-client-5") failed to acquire locks for self-heal. Replicate requires per-transaction locks during self-heal, so Distribute treated this as a failure and skipped the fix-layout.
Since the ping-timeout has been lowered from the default (42 seconds) to 20 seconds, a server that does not respond for more than 20 seconds can cause the "rebalance" operation to abort. Please check your network for disruptions.
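If the lowered timeout is causing spurious disconnects, one option is to return network.ping-timeout to its default of 42 seconds. This is an illustrative sketch, not a prescribed fix; verify that a longer timeout is acceptable for your environment first:

```shell
# Raise the ping timeout back to the GlusterFS default (42 seconds)
gluster volume set test_volume network.ping-timeout 42

# Alternatively, remove the explicit setting so the default applies again
gluster volume reset test_volume network.ping-timeout
```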
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.