RHGS - glusterfsd brick process fails to start with error: Missing trusted.glusterfs.volume-id extended attribute on brick root

Solution Verified

Issue

Gluster volume "vol1" has multiple bricks, some of which fail to start. All of the failing bricks are on the same Gluster node, "glus-1", as shown in this excerpt of gluster volume status vol1:

Brick glus-1:/rhgs/bricks/brick4/4            N/A       N/A        N       N/A
Brick glus-1:/rhgs/bricks/brick5/5            N/A       N/A        N       N/A
Brick glus-1:/rhgs/bricks/brick6/6            N/A       N/A        N       N/A

These bricks are offline because their glusterfsd processes are not running (they do not appear in the ps -ef output).

However, other bricks on the same node are up and running:

Brick glus-1:/rhgs/bricks/brick1/1            49153     0          Y       14729
Brick glus-1:/rhgs/bricks/brick2/2            49154     0          Y       14736
Brick glus-1:/rhgs/bricks/brick3/3            49155     0          Y       14753
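
When a volume has many bricks, the offline ones can be picked out of the status output mechanically. A minimal sketch, assuming the column layout shown above (Brick, host:path, TCP port, RDMA port, Online, Pid) and that each brick entry fits on a single line; the function name offline_bricks is hypothetical:

```shell
#!/usr/bin/env bash
# Print the host:path of every brick reported offline (Online column = N)
# in `gluster volume status` output read from stdin.
# Assumed columns: Brick <host:path> <TCP port> <RDMA port> <Online> <Pid>
offline_bricks() {
    awk '$1 == "Brick" && $5 == "N" { print $2 }'
}
```

For example: gluster volume status vol1 | offline_bricks. Long brick paths can wrap onto a second line in the real output, so treat the result as a quick filter rather than a guaranteed complete list.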

For example, this is the running glusterfsd process for brick1:

root     14729  112  0.0 2639228 128120 ?      Ssl   2020 646322:05 /usr/sbin/glusterfsd -s glus-1 --volfile-id vol1.glus-1.rhgs-bricks-brick1-1 -p /var/run/gluster/vols/vol1/glus-1-rhgs-bricks-brick1-1.pid -S /var/run/gluster/9b088991bb64ce3a.socket --brick-name /rhgs/bricks/brick1/1 -l /var/log/glusterfs/bricks/rhgs-bricks-brick1-1.log --xlator-option *-posix.glusterd-uuid=xxxxx-8d56-yyyy-bee5-xxxx --brick-port 49153 --xlator-option vol1-server.listen-port=49153
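
The ps -ef check for a single brick can be scripted by matching the --brick-name argument visible in the command line above. A minimal sketch; the helper name brick_process_running is hypothetical, and the brick path is used inside an extended regular expression, so paths containing regex metacharacters would need escaping:

```shell
#!/usr/bin/env bash
# Report whether a glusterfsd process whose command line contains
# "--brick-name <path>" is running.
# pgrep -f matches the full command line; the trailing "( |$)" keeps
# /rhgs/bricks/brick1/1 from also matching /rhgs/bricks/brick1/10.
brick_process_running() {
    local brick_path="$1"
    if pgrep -f -- "glusterfsd.*--brick-name ${brick_path}( |$)" >/dev/null; then
        echo "running"
    else
        echo "not running"
    fi
}
```

For example, in the situation described here brick_process_running /rhgs/bricks/brick4/4 would be expected to print "not running" on glus-1, while /rhgs/bricks/brick1/1 would print "running".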

Restarting glusterd on that node logs the following errors in /var/log/glusterfs/glusterd.log:

# date; systemctl restart glusterd
Tue Apr 13 15:00:18 CEST 2021

[2021-04-13 13:00:19.177039] I [MSGID: 100030] [glusterfsd.c:2646:main] 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 3.12.2 (args: /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO)
...

[2021-04-13 13:00:25.996425] E [glusterd-utils.c:6185:glusterd_brick_start] 0-management: Missing trusted.glusterfs.volume-id extended attribute on brick root (/rhgs/bricks/brick4/4), brick is deemed not to be a part of the volume (vol1) 
[2021-04-13 13:00:25.996453] E [MSGID: 106005] [glusterd-server-quorum.c:408:glusterd_do_volume_quorum_action] 0-management: Failed to connect to glus-1:/rhgs/bricks/brick4/4
[2021-04-13 13:00:25.996473] E [glusterd-utils.c:6185:glusterd_brick_start] 0-management: Missing trusted.glusterfs.volume-id extended attribute on brick root (/rhgs/bricks/brick5/5), brick is deemed not to be a part of the volume (vol1) 
[2021-04-13 13:00:25.996481] E [MSGID: 106005] [glusterd-server-quorum.c:408:glusterd_do_volume_quorum_action] 0-management: Failed to connect to glus-1:/rhgs/bricks/brick5/5
[2021-04-13 13:00:25.996502] E [glusterd-utils.c:6185:glusterd_brick_start] 0-management: Missing trusted.glusterfs.volume-id extended attribute on brick root (/rhgs/bricks/brick6/6), brick is deemed not to be a part of the volume (vol1) 
[2021-04-13 13:00:25.996520] E [MSGID: 106005] [glusterd-server-quorum.c:408:glusterd_do_volume_quorum_action] 0-management: Failed to connect to glus-1:/rhgs/bricks/brick6/6
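
The error means glusterd could not read the trusted.glusterfs.volume-id extended attribute from the brick root directory. A read-only sketch for comparing that attribute with the volume ID glusterd has on record, assuming the ID is stored as a volume-id= line in /var/lib/glusterd/vols/<volume>/info and that getfattr (from the attr package) is installed; the function names are hypothetical. Run it as root, because trusted.* attributes are only visible to privileged users:

```shell
#!/usr/bin/env bash
# Convert a UUID such as 12345678-abcd-... into the form printed by
# `getfattr -e hex`: "0x" followed by the 32 hex digits, lower case.
uuid_to_xattr_hex() {
    printf '0x%s\n' "$(printf '%s' "$1" | tr -d '-' | tr 'A-F' 'a-f')"
}

# Compare the brick root's volume-id attribute with glusterd's record.
# Arguments: volume name, brick root path.
# Diagnostic only: nothing is written. getfattr prints its error (for
# example "No such attribute" or an I/O error) to stderr, which is left
# visible because a failed read and a truly missing attribute look the
# same to glusterd; check the disk and mount before changing anything.
check_brick_volume_id() {
    local vol="$1" brick="$2" info expected actual
    info="/var/lib/glusterd/vols/${vol}/info"
    if [ ! -r "$info" ]; then
        echo "cannot read $info"
        return 2
    fi
    expected=$(uuid_to_xattr_hex "$(sed -n 's/^volume-id=//p' "$info")")
    actual=$(getfattr --absolute-names -n trusted.glusterfs.volume-id -e hex "$brick" \
             | sed -n 's/^trusted\.glusterfs\.volume-id=//p')
    if [ -z "$actual" ]; then
        echo "no volume-id value read from $brick (expected $expected)"
        return 1
    elif [ "$actual" = "$expected" ]; then
        echo "OK $brick"
    else
        echo "MISMATCH on $brick: $actual != $expected"
        return 1
    fi
}
```

For example: check_brick_volume_id vol1 /rhgs/bricks/brick4/4.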

Force-starting the volume produces the same errors:

# date; gluster volume start vol1 force
Tue Apr 13 15:00:44 CEST 2021

[2021-04-13 13:00:44.051178] E [mem-pool.c:307:__gf_free] (-->/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0x23d6e) [0x7f3338265d6e] -->/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0x1f7be) [0x7f33382617be] -->/lib64/libglusterfs.so.0(__gf_free+0x10c) [0x7f33439658fc] ) 0-: Assertion failed: GF_MEM_TRAILER_MAGIC == *(uint32_t *)((char *)free_ptr + header->size)
[2021-04-13 13:00:44.710084] I [MSGID: 106062] [glusterd-volume-ops.c:2690:glusterd_op_start_volume] 0-management: Global dict not present.
[2021-04-13 13:00:44.710174] E [glusterd-utils.c:6185:glusterd_brick_start] 0-management: Missing trusted.glusterfs.volume-id extended attribute on brick root (/rhgs/bricks/brick4/4), brick is deemed not to be a part of the volume (vol1) 
[2021-04-13 13:00:44.710195] E [glusterd-utils.c:6185:glusterd_brick_start] 0-management: Missing trusted.glusterfs.volume-id extended attribute on brick root (/rhgs/bricks/brick6/6), brick is deemed not to be a part of the volume (vol1) 
[2021-04-13 13:00:44.710328] E [glusterd-utils.c:6185:glusterd_brick_start] 0-management: Missing trusted.glusterfs.volume-id extended attribute on brick root (/rhgs/bricks/brick5/5), brick is deemed not to be a part of the volume (vol1) 

The glusterfsd processes for these bricks still fail to start.

At the same time, /var/log/messages shows I/O errors on a disk device:

Apr 13 15:00:44 glus-1 kernel: sd 3:1:0:0: [sdf] FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Apr 13 15:00:44 glus-1 kernel: sd 3:1:0:0: [sdf] CDB: Read(16) 88 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00
Apr 13 15:00:44 glus-1 kernel: blk_update_request: I/O error, dev sdf, sector 0
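
The device named in these kernel messages can be extracted and then related to the brick filesystems. A minimal sketch, assuming the kernel's "I/O error, dev <name>," wording shown above; the function names are hypothetical, and brick_mount_source relies on findmnt from util-linux:

```shell
#!/usr/bin/env bash
# List the distinct block devices named in kernel lines of the form
# "... I/O error, dev <name>, sector ..." read from stdin.
io_error_devices() {
    sed -n 's|.*I/O error, dev \([^,]*\),.*|\1|p' | sort -u
}

# Print the source device of the filesystem that contains a given path.
# -T finds the mount holding the path, -n drops the header line,
# -o SOURCE prints only the device column.
brick_mount_source() {
    findmnt -n -o SOURCE -T "$1"
}
```

For example, grep 'I/O error' /var/log/messages | io_error_devices lists the affected devices, and brick_mount_source /rhgs/bricks/brick4/4 shows the device backing that brick. If the source is an LVM or device-mapper volume, lsblk -s <source> lists the disks underneath it, which shows whether the failing device (sdf here) is one of them.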

Environment

  • RHGS 3.X
