RHGS - glusterfsd brick process fails to start with error: Missing trusted.glusterfs.volume-id extended attribute on brick root
Issue
In a gluster volume "vol1" with multiple bricks, some of the bricks fail to start.
The affected bricks all belong to the same gluster node, "glus-1":
Brick glus-1:/rhgs/bricks/brick4/4 N/A N/A N N/A
Brick glus-1:/rhgs/bricks/brick5/5 N/A N/A N N/A
Brick glus-1:/rhgs/bricks/brick6/6 N/A N/A N N/A
These bricks are offline because their glusterfsd processes are not running (they do not appear in the
ps -ef output).
Other bricks on the same node are up and running:
Brick glus-1:/rhgs/bricks/brick1/1 49153 0 Y 14729
Brick glus-1:/rhgs/bricks/brick2/2 49154 0 Y 14736
Brick glus-1:/rhgs/bricks/brick3/3 49155 0 Y 14753
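The offline bricks can be picked out of the status output with a small filter. This is only a sketch: it assumes each brick occupies a single line in the column layout shown above (Brick, host:path, TCP port, RDMA port, Online, Pid), and the function name offline_bricks is hypothetical.

```shell
# List bricks whose "Online" column is N.
# Assumption: one brick per line, in the layout shown above:
#   Brick <host:path> <tcp-port> <rdma-port> <online> <pid>
# Long brick paths may wrap onto a second line in real output; this
# simple filter does not handle that case.
offline_bricks() {
    awk '$1 == "Brick" && $5 == "N" { print $2 }'
}

# Usage on a live node (assumes the gluster CLI is installed):
#   gluster volume status vol1 | offline_bricks
```

The function reads from standard input, so it can also be run against saved status output.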
Here is an example of a running glusterfsd brick process, for brick1:
root 14729 112 0.0 2639228 128120 ? Ssl 2020 646322:05 /usr/sbin/glusterfsd -s glus-1 --volfile-id vol1.glus-1.rhgs-bricks-brick1-1 -p /var/run/gluster/vols/vol1/glus-1-rhgs-bricks-brick1-1.pid -S /var/run/gluster/9b088991bb64ce3a.socket --brick-name /rhgs/bricks/brick1/1 -l /var/log/glusterfs/bricks/rhgs-bricks-brick1-1.log --xlator-option *-posix.glusterd-uuid=xxxxx-8d56-yyyy-bee5-xxxx --brick-port 49153 --xlator-option vol1-server.listen-port=49153
If we restart glusterd on that node, the following errors appear in /var/log/glusterfs/glusterd.log:
# date; systemctl restart glusterd
Tue Apr 13 15:00:18 CEST 2021
[2021-04-13 13:00:19.177039] I [MSGID: 100030] [glusterfsd.c:2646:main] 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 3.12.2 (args: /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO)
...
[2021-04-13 13:00:25.996425] E [glusterd-utils.c:6185:glusterd_brick_start] 0-management: Missing trusted.glusterfs.volume-id extended attribute on brick root (/rhgs/bricks/brick4/4), brick is deemed not to be a part of the volume (vol1)
[2021-04-13 13:00:25.996453] E [MSGID: 106005] [glusterd-server-quorum.c:408:glusterd_do_volume_quorum_action] 0-management: Failed to connect to glus-1:/rhgs/bricks/brick4/4
[2021-04-13 13:00:25.996473] E [glusterd-utils.c:6185:glusterd_brick_start] 0-management: Missing trusted.glusterfs.volume-id extended attribute on brick root (/rhgs/bricks/brick5/5), brick is deemed not to be a part of the volume (vol1)
[2021-04-13 13:00:25.996481] E [MSGID: 106005] [glusterd-server-quorum.c:408:glusterd_do_volume_quorum_action] 0-management: Failed to connect to glus-1:/rhgs/bricks/brick5/5
[2021-04-13 13:00:25.996502] E [glusterd-utils.c:6185:glusterd_brick_start] 0-management: Missing trusted.glusterfs.volume-id extended attribute on brick root (/rhgs/bricks/brick6/6), brick is deemed not to be a part of the volume (vol1)
[2021-04-13 13:00:25.996520] E [MSGID: 106005] [glusterd-server-quorum.c:408:glusterd_do_volume_quorum_action] 0-management: Failed to connect to glus-1:/rhgs/bricks/brick6/6
If we try to force-start the volume, we see the same kind of errors:
# date; gluster volume start vol1 force
Tue Apr 13 15:00:44 CEST 2021
[2021-04-13 13:00:44.051178] E [mem-pool.c:307:__gf_free] (-->/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0x23d6e) [0x7f3338265d6e] -->/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0x1f7be) [0x7f33382617be] -->/lib64/libglusterfs.so.0(__gf_free+0x10c) [0x7f33439658fc] ) 0-: Assertion failed: GF_MEM_TRAILER_MAGIC == *(uint32_t *)((char *)free_ptr + header->size)
[2021-04-13 13:00:44.710084] I [MSGID: 106062] [glusterd-volume-ops.c:2690:glusterd_op_start_volume] 0-management: Global dict not present.
[2021-04-13 13:00:44.710174] E [glusterd-utils.c:6185:glusterd_brick_start] 0-management: Missing trusted.glusterfs.volume-id extended attribute on brick root (/rhgs/bricks/brick4/4), brick is deemed not to be a part of the volume (vol1)
[2021-04-13 13:00:44.710195] E [glusterd-utils.c:6185:glusterd_brick_start] 0-management: Missing trusted.glusterfs.volume-id extended attribute on brick root (/rhgs/bricks/brick6/6), brick is deemed not to be a part of the volume (vol1)
[2021-04-13 13:00:44.710328] E [glusterd-utils.c:6185:glusterd_brick_start] 0-management: Missing trusted.glusterfs.volume-id extended attribute on brick root (/rhgs/bricks/brick5/5), brick is deemed not to be a part of the volume (vol1)
The glusterfsd processes still fail to start for these bricks.
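To collect the affected brick roots from the glusterd log and then inspect the attribute named in the error, something like the following can be used. It is a sketch rather than a supported procedure: the function name bricks_missing_volume_id is hypothetical, and the pattern assumes the exact message wording shown in the log lines above.

```shell
# Print the unique brick roots that glusterd reported as missing the
# trusted.glusterfs.volume-id xattr. Reads glusterd log lines on stdin.
# Assumption: the message text matches the log excerpts shown above.
bricks_missing_volume_id() {
    sed -n 's|.*Missing trusted\.glusterfs\.volume-id extended attribute on brick root (\([^)]*\)).*|\1|p' \
        | sort -u
}

# Usage on the affected node (run as root, since trusted.* xattrs are
# only readable by privileged users):
#   bricks_missing_volume_id < /var/log/glusterfs/glusterd.log |
#   while read -r brick; do
#       getfattr -n trusted.glusterfs.volume-id -e hex "$brick"
#   done
```

If getfattr reports that the attribute does not exist, or cannot read the path at all, that matches what glusterd logs when it refuses to start the brick.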
At the same time, the messages file shows I/O errors against a disk device:
Apr 13 15:00:44 glus-1 kernel: sd 3:1:0:0: [sdf] FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Apr 13 15:00:44 glus-1 kernel: sd 3:1:0:0: [sdf] CDB: Read(16) 88 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00
Apr 13 15:00:44 glus-1 kernel: blk_update_request: I/O error, dev sdf, sector 0
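The device names behind such I/O errors can be extracted from the kernel messages so they can be compared with the devices backing the failed bricks. This is a minimal sketch: the function name io_error_devices is hypothetical, the pattern assumes the "I/O error, dev <name>," wording shown above, and the /var/log/messages path in the usage comment is an assumption about where these kernel messages are stored.

```shell
# Print the unique block devices that logged "I/O error, dev <name>,".
# Reads kernel log lines on stdin.
io_error_devices() {
    sed -n 's|.*I/O error, dev \([^,]*\),.*|\1|p' | sort -u
}

# Usage (log path is an assumption; adjust to the system's syslog setup):
#   io_error_devices < /var/log/messages
# Each reported device can then be checked against the brick mounts,
# for example with: lsblk /dev/sdf
```

Whether the reported device actually backs the failed bricks has to be confirmed on the node itself; the log excerpts alone do not show that mapping.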
Environment
- RHGS 3.X