Why Are Some Gluster Files Being Constantly Healed?
Issue
- Running the command gluster v heal test-vol info shows a stuck entry that needs to be healed:

    # gluster volume heal test-vol info
    Brick node1:/brick1/brick
    file1
    Status: Connected
    Number of entries: 1

    Brick node2:/brick2/brick
    Status: Connected
    Number of entries: 0

    Brick node3:/brick3/arbiter
    file1
    Status: Connected
    Number of entries: 1
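To confirm the entry is genuinely stuck rather than just slow to heal, the pending-heal count can be polled over time. A minimal sketch, using test-vol as the example volume name from this article:

    # Re-run the heal count every 60 seconds; a count that never drops
    # to zero for the same entry suggests a stuck (looping) heal
    watch -n 60 'gluster volume heal test-vol statistics heal-count'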
- Collecting the extended attributes of this entry shows that two of the nodes blame the third one as the target of the heal:

    [root@node1 ~]# getfattr -d -m . -e hex /brick1/brick/file1
    getfattr: Removing leading '/' from absolute path names
    # file: brick1/brick/file1
    trusted.afr.dirty=0x000000000000000000000000
    trusted.afr.test-vol-client-1=0x0000001a0000000000000000
    trusted.gfid=0xd2dc2973dfff404a8e7b23e4d6e7b83b
    trusted.glusterfs.shard.block-size=0x0000000000400000
    trusted.glusterfs.shard.file-size=0x0000000000100000000000000000000000000000000008000000000000000000

    [root@node2 ~]# getfattr -d -m . -e hex /brick2/brick/file1
    getfattr: Removing leading '/' from absolute path names
    # file: brick2/brick/file1
    trusted.afr.dirty=0x000000000000000000000000
    trusted.afr.test-vol-client-0=0x000000000000000000000000
    trusted.afr.test-vol-client-2=0x000000000000000000000000
    trusted.gfid=0xd2dc2973dfff404a8e7b23e4d6e7b83b
    trusted.glusterfs.shard.block-size=0x0000000000400000
    trusted.glusterfs.shard.file-size=0x0000000000100000000000000000000000000000000008000000000000000000

    [root@node3 ~]# getfattr -d -m . -e hex /brick3/brick/file1
    getfattr: Removing leading '/' from absolute path names
    # file: brick3/brick/file1
    trusted.afr.dirty=0x000000000000000000000000
    trusted.afr.test-vol-client-1=0x0000001d0000000000000000
    trusted.gfid=0xd2dc2973dfff404a8e7b23e4d6e7b83b
    trusted.glusterfs.shard.block-size=0x0000000000400000
    trusted.glusterfs.shard.file-size=0x0000000000100000000000000000000000000000000008000000000000000000
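Each trusted.afr.&lt;volume&gt;-client-N value encodes three big-endian 32-bit counters: pending data, metadata, and entry operations against that client. A minimal shell sketch to decode the value node1 holds against client-1 (node2):

    # Decode a trusted.afr.* xattr value (hex string without the 0x prefix)
    # into its three counters: pending data, metadata and entry operations
    xattr=0000001a0000000000000000    # node1's value against client-1
    echo "pending data ops:     $((16#${xattr:0:8}))"   # prints 26
    echo "pending metadata ops: $((16#${xattr:8:8}))"   # prints 0
    echo "pending entry ops:    $((16#${xattr:16:8}))"  # prints 0

The non-zero data counters on node1 (0x1a, 26) and node3 (0x1d, 29), combined with node2's all-zero counters, are what mark client-1 (node2) as the sink of the heal.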
From the above, nodes 1 and 3 (clients 0 and 2 for the AFR xlator) agree that node 2 (client 1) is the one that needs to be healed. The log of the self-heal daemon, available at /var/log/glusterfs/glustershd.log, shows that this is exactly the case: the file is being healed from these two nodes in a loop:

    [2020-06-29 12:11:25.034839] I [MSGID: 108026] [afr-self-heal-common.c:1212:afr_log_selfheal] 0-test-vol-replicate-0: Completed data selfheal on d2dc2973-dfff-404a-8e7b-23e4d6e7b83b. sources=[0] 2 sinks=1
    [2020-06-29 12:11:27.819586] I [MSGID: 108026] [afr-self-heal-common.c:1212:afr_log_selfheal] 0-test-vol-replicate-0: Completed data selfheal on d2dc2973-dfff-404a-8e7b-23e4d6e7b83b. sources=[0] 2 sinks=1
    [2020-06-29 12:41:32.100811] I [MSGID: 108026] [afr-self-heal-common.c:1212:afr_log_selfheal] 0-test-vol-replicate-0: Completed data selfheal on d2dc2973-dfff-404a-8e7b-23e4d6e7b83b. sources=[0] 2 sinks=1
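One way to confirm the loop from the logs is to count how often a data self-heal completes for the same gfid; a quick sketch using the gfid from this example:

    # A count that keeps growing across repeated runs means the same
    # file is being healed over and over, i.e. a heal loop
    grep -c 'Completed data selfheal on d2dc2973-dfff-404a-8e7b-23e4d6e7b83b' \
        /var/log/glusterfs/glustershd.log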
- Why is this occurring, and how can the problem be fixed?
Environment
Red Hat Gluster Storage version 3.x