Possible reasons for failures during a "rebalance" operation on a Gluster volume


Environment

Red Hat Storage Server 2.0 Update 4

Issue

What are some of the possible reasons for failures during a "rebalance" operation on a Gluster volume?

Resolution

Here are a few of the reasons why failures may be reported during a rebalance operation:

  • Skipped files are marked as failures. This is fixed in the upcoming 2.1 release of Red Hat Storage.
  • A disruption in the network.
  • An actual failure (for example, a brick or server going down).

If there is cause for concern, please open a case with Red Hat Support to verify the validity of the reported "rebalance" failures.
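
Before opening the case, it helps to capture the current state on each node. A minimal sketch, assuming the volume is named "test_volume" (substitute your own volume name); both commands are standard Gluster CLI and shell tools:

# gluster volume rebalance test_volume status
# grep -c "failed for" /var/log/glusterfs/test_volume-rebalance.log

The second command counts the "migrate-data failed" messages in the rebalance log, which roughly correspond to the failures column in the status output.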

Diagnostic Steps

- Skipped files being marked as failures:

[root@node304 ~]# gluster volume rebalance test_volume status

Node                     Rebalanced-files        size       scanned      failures         status
---------                -----------      -----------   -----------   -----------   ------------
localhost                          0                0          2410            0     in progress
192.168.0.89                     432       1313462286          1970            0     in progress
node306.example.com                0                0          2411            0     in progress
node305.example.com              220       1401085348          1810          213     in progress
node301.example.com              351       1457860671          2039          101     in progress
node310.example.com                0                0          2410            0     in progress
node309.example.com                0                0          2407            0     in progress
node308.example.com                0                0          2407            0     in progress
node307.example.com                0                0          2409            0     in progress
node302.example.com                0                0          2410            0     in progress

Messages from "node301":

# grep -B1 "failed for" var/log/glusterfs/test_volume-rebalance.log

[20XX-XX-XX 02:31:02.824982] W [dht-rebalance.c:367:__dht_check_free_space] 0-test_volume-dht: data movement attempted from node (test_volume-replicate-0) with higher disk space to a node (test_volume-replicate-1) with lesser disk space (/archive/20130204/example.file1.gz)
[20XX-XX-XX 02:31:02.825059] E [dht-rebalance.c:1215:gf_defrag_migrate_data] 0-test_volume-dht: migrate-data failed for /archive/20130204/example.file1.gz
--
[20XX-XX-XX 02:38:37.777530] W [dht-rebalance.c:367:__dht_check_free_space] 0-test_volume-dht: data movement attempted from node (test_volume-replicate-0) with higher disk space to a node (test_volume-replicate-1) with lesser disk space (/archive/20130204/example.file2.gz)
[20XX-XX-XX 02:38:37.777577] E [dht-rebalance.c:1215:gf_defrag_migrate_data] 0-test_volume-dht: migrate-data failed for /archive/20130204/example.file2.gz
--
[20XX-XX-XX 02:38:37.802157] W [dht-rebalance.c:367:__dht_check_free_space] 0-test_volume-dht: data movement attempted from node (test_volume-replicate-0) with higher disk space to a node (test_volume-replicate-1) with lesser disk space (/archive/20130204/example.file3.gz)
[20XX-XX-XX 02:38:37.802204] E [dht-rebalance.c:1215:gf_defrag_migrate_data] 0-test_volume-dht: migrate-data failed for /archive/20130204/example.file3.gz

Messages from "node305":

# grep -B1 "failed for" var/log/glusterfs/test_volume-rebalance.log

[20XX-XX-XX 02:53:48.483732] W [dht-rebalance.c:367:__dht_check_free_space] 0-test_volume-dht: data movement attempted from node (test_volume-replicate-2) with higher disk space to a node (test_volume-replicate-0) with lesser disk space (/archive/20130204/example.file4.gz)
[20XX-XX-XX 02:53:48.483780] E [dht-rebalance.c:1215:gf_defrag_migrate_data] 0-test_volume-dht: migrate-data failed for /archive/20130204/example.file4.gz
--
[20XX-XX-XX 02:53:48.523134] W [dht-rebalance.c:367:__dht_check_free_space] 0-test_volume-dht: data movement attempted from node (test_volume-replicate-2) with higher disk space to a node (test_volume-replicate-0) with lesser disk space (/archive/20130204/example.file5.gz)
[20XX-XX-XX 02:53:48.523197] E [dht-rebalance.c:1215:gf_defrag_migrate_data] 0-test_volume-dht: migrate-data failed for /archive/20130204/example.file5.gz
--
[20XX-XX-XX 02:53:48.532288] W [dht-rebalance.c:367:__dht_check_free_space] 0-test_volume-dht: data movement attempted from node (test_volume-replicate-2) with higher disk space to a node (test_volume-replicate-1) with lesser disk space (/archive/20130204/example.file6.gz)
[20XX-XX-XX 02:53:48.532335] E [dht-rebalance.c:1215:gf_defrag_migrate_data] 0-test_volume-dht: migrate-data failed for /archive/20130204/example.file6.gz

This exposes a flaw in the rebalance logic: files that should simply be skipped because of space constraints are instead counted as failures.
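
To confirm that a node's failures are of this type, compare the number of free-space warnings with the number of migration failures in that node's rebalance log, and check how full the destination bricks are. A rough sketch, assuming the brick path /rhs/brick1/test_volume shown in this example:

# grep -c "__dht_check_free_space" /var/log/glusterfs/test_volume-rebalance.log
# grep -c "migrate-data failed" /var/log/glusterfs/test_volume-rebalance.log
# df -h /rhs/brick1/test_volume

If the two counts are close and the destination bricks are nearly full, the reported "failures" are most likely skipped files rather than real errors.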

See the upstream patch and the corresponding Bugzilla (BZ) entry for details.

- Network Disruption:
Notice the "not started" status.

[root@node302 ~]# gluster volume rebalance test_volume status
Node                Rebalanced-files      size          scanned       failures      status
---------           -----------           -----------   -----------   -----------   ------------
localhost                     0                     0       2058872             6        stopped
node306.example.com         420             385068769       1172045          5396        stopped
node305.example.com      149670            2348115457       1207746         37650        stopped
192.168.0.89             51482            5688556371       1188642         35537        stopped
node301.example.com      340696           61019675449        2260553       117938        stopped
node304.example.com       32888             295721807       2063515           316        stopped
node307.example.com        2974              74680289       2017031          5826        stopped
node310.example.com           0                     0       2018027             6        stopped
node308.example.com           0                     0             0             0    not started
node309.example.com           0                     0       2015211          9283        stopped
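
For a node that remains in the "not started" state, it is worth first confirming that its glusterd is reachable from the rest of the cluster, for example with the standard CLI commands below (run from any peer). A peer shown as disconnected here points to the same network problem discussed next.

[root@node302 ~]# gluster peer status
[root@node302 ~]# gluster volume status test_volume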

Here is what happened in this case:

$ grep disconnected -i var/log/glusterfs/test_volume-rebalance.log
[20XX-XX-XX 10:43:53.818079] I [client.c:2098:client_rpc_notify] 0-test_volume-client-2: disconnected
[20XX-XX-XX 14:53:32.359111] I [client.c:2098:client_rpc_notify] 0-test_volume-client-5: disconnected

"0-test_volume-client-5" and "0-test_volume-client-2" were disconnected due to ping timeout.

From the /var/lib/glusterd/vols/test_volume/test_volume-fuse.vol file:

volume test_volume-client-2
    type protocol/client
    option remote-host node303.example.com
    option remote-subvolume /rhs/brick1/test_volume
    option transport-type tcp
    option ping-timeout 20
end-volume

volume test_volume-client-5
    type protocol/client
    option remote-host node306.example.com
    option remote-subvolume /rhs/brick1/test_volume
    option transport-type tcp
    option ping-timeout 20
end-volume

But the real reason the "rebalance" operation did not complete is:

[20XX-XX-XX 14:53:32.359160] W [client3_1-fops.c:1556:client3_1_inodelk_cbk] 0-test_volume-client-4: remote operation failed: No such file or directory
[20XX-XX-XX 14:53:32.359179] W [client3_1-fops.c:1556:client3_1_inodelk_cbk] 0-test_volume-client-5: remote operation failed: Transport endpoint is not connected
[20XX-XX-XX 14:53:32.359192] I [afr-lk-common.c:996:afr_lock_blocking] 0-test_volume-replicate-2: unable to lock on even one child
[20XX-XX-XX 14:53:32.359206] I [afr-transaction.c:1031:afr_post_blocking_inodelk_cbk] 0-test_volume-replicate-2: Blocking inodelks failed.
[20XX-XX-XX 14:53:32.359328] I [dht-common.c:2381:dht_setxattr] 0-test_volume-dht: fixing the layout of /1002955407/classic/node04.example.com/.git/objects/pack
[20XX-XX-XX 14:53:32.359370] W [dht-selfheal.c:594:dht_fix_layout_of_directory] 0-test_volume-dht: 1 subvolume(s) are down. Skipping fix layout

The "0-test_volume-replicate" pair, for example "0-test_volume-client-4/client-5" failed to acquire locks for self-heal. Since Replicate per transaction locks are necessary during self-heal. Distribute recognized this as a failure and has skipped the fix layout.

Because the ping-timeout has been lowered from the default (42 seconds) to 20 seconds, a server that does not respond within 20 seconds is disconnected, which can cause the "rebalance" operation to abort. Please check your network.
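
Once the network problem is resolved, the timeout can be raised back toward the default and the rebalance restarted. This is a sketch only; verify the option name and value for your release before applying it:

# gluster volume set test_volume network.ping-timeout 42
# gluster volume rebalance test_volume start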

