10.6. Replacing Hosts
10.6.1. Replacing a Host Machine with a Different Hostname
Important
In this example, the machine that requires replacement is sys0.example.com and the replacement machine is sys5.example.com. The brick with an unrecoverable failure is sys0.example.com:/rhs/brick1/b1 and the replacement brick is sys5.example.com:/rhs/brick1/b1.
- Probe the new peer from one of the existing peers to bring it into the cluster.
# gluster peer probe sys5.example.com
- Ensure that the new brick (sys5.example.com:/rhs/brick1/b1) that is replacing the old brick (sys0.example.com:/rhs/brick1/b1) is empty.
- Retrieve the brick paths in sys0.example.com using the following command:
# gluster volume info <VOLNAME>
Volume Name: vol
Type: Replicate
Volume ID: 0xde822e25ebd049ea83bfaa3c4be2b440
Status: Started
Snap Volume: no
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: sys0.example.com:/rhs/brick1/b1
Brick2: sys1.example.com:/rhs/brick1/b1
Options Reconfigured:
performance.readdir-ahead: on
snap-max-hard-limit: 256
snap-max-soft-limit: 90
auto-delete: disable
The brick path in sys0.example.com is /rhs/brick1/b1. This has to be replaced with the brick in the newly added host, sys5.example.com.
- Create the required brick path in sys5.example.com. For example, if /rhs/brick is the XFS mount point in sys5.example.com, create a brick directory in that path:
# mkdir /rhs/brick1/b1
- Execute the replace-brick command with the force option:
# gluster volume replace-brick vol sys0.example.com:/rhs/brick1/b1 sys5.example.com:/rhs/brick1/b1 commit force
volume replace-brick: success: replace-brick commit successful
- Verify that the new brick is online.
# gluster volume status
Status of volume: vol
Gluster process                                Port    Online  Pid
Brick sys5.example.com:/rhs/brick1/b1          49156   Y       5731
Brick sys1.example.com:/rhs/brick1/b1          49153   Y       5354
- Initiate self-heal on the volume by executing the following command:
# gluster volume heal VOLNAME
- The status of the heal process can be seen by executing the command:
# gluster volume heal VOLNAME info
- Detach the original machine from the trusted pool.
# gluster peer detach sys0.example.com
- Ensure that after the self-heal completes, the extended attributes are set to zero on the other bricks in the replica.
# getfattr -d -m. -e hex /rhs/brick1/b1
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/b1
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000
trusted.afr.vol-client-0=0x000000000000000000000000
trusted.afr.vol-client-1=0x000000000000000000000000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x0000000100000000000000007ffffffe
trusted.glusterfs.volume-id=0xde822e25ebd049ea83bfaa3c4be2b440
In this example, the extended attributes trusted.afr.vol-client-0 and trusted.afr.vol-client-1 have zero values. This means that the data on the two bricks is identical. If these attributes are not zero after self-heal is completed, the data has not been synchronised correctly. A small verification sketch follows this procedure.
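For convenience, the check in the last step can be scripted. The following is a minimal bash sketch, not part of the documented procedure, that prints the trusted.afr.* extended attributes on both replica bricks so you can confirm they have returned to zero after self-heal. The host names, the brick path, and the use of passwordless SSH are assumptions taken from this example; adjust them for your volume, and prefer gluster volume heal VOLNAME info as the primary check.
#!/bin/bash
# check_afr.sh - illustrative sketch: print AFR changelog xattrs for both replica bricks.
# Assumes passwordless SSH as root to the replica hosts used in this example.
BRICK_PATH="/rhs/brick1/b1"                      # brick path from this example (assumption)
HOSTS="sys5.example.com sys1.example.com"        # replica hosts from this example (assumption)

for host in $HOSTS; do
    echo "== ${host}:${BRICK_PATH} =="
    # Dump all extended attributes in hex and keep only the AFR changelog entries;
    # all-zero values indicate the bricks are in sync for the brick root.
    ssh "root@${host}" "getfattr -d -m. -e hex ${BRICK_PATH}" 2>/dev/null | grep trusted.afr
done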
10.6.2. Replacing a Host Machine with the Same Hostname
This procedure reuses the UUID of the failed host by writing it to the replacement host's /var/lib/glusterd/glusterd.info file.
Warning
- Stop the glusterd service on sys0.example.com.
# service glusterd stop
- Retrieve the UUID of the failed host (sys0.example.com) from another peer in the Red Hat Gluster Storage Trusted Storage Pool by executing the following command:
# gluster peer status
Number of Peers: 2

Hostname: sys1.example.com
Uuid: 1d9677dc-6159-405e-9319-ad85ec030880
State: Peer in Cluster (Connected)

Hostname: sys0.example.com
Uuid: b5ab2ec3-5411-45fa-a30f-43bd04caf96b
State: Peer Rejected (Connected)
Note that the UUID of the failed host is b5ab2ec3-5411-45fa-a30f-43bd04caf96b.
- Edit the glusterd.info file in the new host and include the UUID of the host you retrieved in the previous step.
# cat /var/lib/glusterd/glusterd.info
UUID=b5ab2ec3-5411-45fa-a30f-43bd04caf96b
operating-version=30703
- Select any host (for example, sys1.example.com) in the Red Hat Gluster Storage Trusted Storage Pool and retrieve its UUID from the glusterd.info file.
# grep -i uuid /var/lib/glusterd/glusterd.info
UUID=8cc6377d-0153-4540-b965-a4015494461c
- Gather the peer information files from the host (sys1.example.com) selected in the previous step. Execute the following command on that host (sys1.example.com):
# cp -a /var/lib/glusterd/peers /tmp/
- Remove the peer file corresponding to the failed host (sys0.example.com) from the /tmp/peers directory.
# rm /tmp/peers/b5ab2ec3-5411-45fa-a30f-43bd04caf96b
Note that the UUID corresponds to the UUID of the failed host (sys0.example.com) retrieved in Step 2.
- Archive all the files and copy them to the failed host (sys0.example.com).
# cd /tmp; tar -cvf peers.tar peers
- Copy the above created file to the new peer.
# scp /tmp/peers.tar root@sys0.example.com:/tmp
- Copy the extracted content to the /var/lib/glusterd/peers directory. Execute the following commands on the newly added host with the same hostname (sys0.example.com) and IP address.
# tar -xvf /tmp/peers.tar
# cp peers/* /var/lib/glusterd/peers/
- Select any other host in the cluster other than the node (sys1.example.com) selected in Step 4. Copy the peer file corresponding to the UUID of the host retrieved in Step 4 to the new host (sys0.example.com) by executing the following command:
# scp /var/lib/glusterd/peers/<UUID-retrieved-from-step4> root@sys0.example.com:/var/lib/glusterd/peers/
- Retrieve the brick directory information by executing the following command on any host in the cluster:
# gluster volume info
Volume Name: vol
Type: Replicate
Volume ID: 0x8f16258c88a0498fbd53368706af7496
Status: Started
Snap Volume: no
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: sys0.example.com:/rhs/brick1/b1
Brick2: sys1.example.com:/rhs/brick1/b1
Options Reconfigured:
performance.readdir-ahead: on
snap-max-hard-limit: 256
snap-max-soft-limit: 90
auto-delete: disable
In the above example, the brick path in sys0.example.com is /rhs/brick1/b1. If the brick path does not exist in sys0.example.com, perform steps a, b, and c.
- Create a brick path in the host, sys0.example.com.
# mkdir /rhs/brick1/b1
- Retrieve the volume ID from the existing brick of another host by executing the following command on any host that contains the bricks for the volume.
# getfattr -d -m. -e hex <brick-path>
Copy the volume-id. For example:
# getfattr -d -m. -e hex /rhs/brick1/b1
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/b1
trusted.afr.vol-client-0=0x000000000000000000000000
trusted.afr.vol-client-1=0x000000000000000000000000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x0000000100000000000000007ffffffe
trusted.glusterfs.volume-id=0x8f16258c88a0498fbd53368706af7496
In the above example, the volume ID is 0x8f16258c88a0498fbd53368706af7496.
- Set this volume ID on the brick created in the newly added host by executing the following command on the newly added host (sys0.example.com):
# setfattr -n trusted.glusterfs.volume-id -v <volume-id> <brick-path>
For example:
# setfattr -n trusted.glusterfs.volume-id -v 0x8f16258c88a0498fbd53368706af7496 /rhs/brick1/b1
Data recovery is possible only if the volume type is replicate or distribute-replicate. If the volume type is plain distribute, you can skip steps 12 and 13.
- Create a FUSE mount point to mount the glusterFS volume.
# mount -t glusterfs <server-name>:/VOLNAME <mount>
- Perform the following operations to change the Automatic File Replication extended attributes so that the heal process happens from the other brick (sys1.example.com:/rhs/brick1/b1) in the replica pair to the new brick (sys0.example.com:/rhs/brick1/b1). Note that /mnt/r2 is the FUSE mount path. A consolidated sketch of these commands appears after this procedure.
- Create a new directory on the mount point and ensure that a directory with such a name is not already present.
# mkdir /mnt/r2/<name-of-nonexistent-dir>
- Delete the directory and set the extended attributes.
# rmdir /mnt/r2/<name-of-nonexistent-dir>
# setfattr -n trusted.non-existent-key -v abc /mnt/r2
# setfattr -x trusted.non-existent-key /mnt/r2
- Ensure that the extended attribute on the other brick in the replica (in this example, trusted.afr.vol-client-0) is not set to zero.
# getfattr -d -m. -e hex /rhs/brick1/b1
# file: rhs/brick1/b1
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000
trusted.afr.vol-client-0=0x000000000000000300000002
trusted.afr.vol-client-1=0x000000000000000000000000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x0000000100000000000000007ffffffe
trusted.glusterfs.volume-id=0x8f16258c88a0498fbd53368706af7496
- Start the glusterd service.
# service glusterd start
- Perform the self-heal operation on the restored volume.
# gluster volume heal VOLNAME
- You can view the gluster volume self-heal status by executing the following command:
# gluster volume heal VOLNAME info
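The following is a minimal sketch that consolidates steps 12 and 13 above into a single sequence, run on the surviving peer sys1.example.com. It only restates commands already shown in this procedure; the volume name vol, the mount point /mnt/r2, and the dummy directory name are assumptions taken from this example.
# Run on sys1.example.com (the healthy replica in this example).
# Mount the volume through FUSE.
mkdir -p /mnt/r2
mount -t glusterfs sys1.example.com:/vol /mnt/r2

# Create and remove a directory that did not previously exist, then set and
# remove a dummy extended attribute, so AFR marks the new brick as needing heal.
mkdir /mnt/r2/nonexistent-dir-for-heal
rmdir /mnt/r2/nonexistent-dir-for-heal
setfattr -n trusted.non-existent-key -v abc /mnt/r2
setfattr -x trusted.non-existent-key /mnt/r2

# Verify that the pending-heal counter toward the replaced brick is now non-zero.
getfattr -d -m. -e hex /rhs/brick1/b1 | grep trusted.afr.vol-client-0

# Unmount the temporary FUSE mount when done.
umount /mnt/r2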
If there are only two hosts in the Red Hat Gluster Storage Trusted Storage Pool, and sys0.example.com is the host that must be replaced, perform the following steps:
- Stop the glusterd service on sys0.example.com.
# service glusterd stop
- Retrieve the UUID of the failed host (sys0.example.com) from another peer in the Red Hat Gluster Storage Trusted Storage Pool by executing the following command:
# gluster peer status
Number of Peers: 1

Hostname: sys0.example.com
Uuid: b5ab2ec3-5411-45fa-a30f-43bd04caf96b
State: Peer Rejected (Connected)
Note that the UUID of the failed host is b5ab2ec3-5411-45fa-a30f-43bd04caf96b.
- Edit the glusterd.info file in the new host (sys0.example.com) and include the UUID of the host you retrieved in the previous step.
# cat /var/lib/glusterd/glusterd.info
UUID=b5ab2ec3-5411-45fa-a30f-43bd04caf96b
operating-version=30703
- Create the peer file in the newly created host (sys0.example.com) at /var/lib/glusterd/peers/<uuid-of-other-peer>, named with the UUID of the other host (sys1.example.com). The UUID of the host can be obtained with the following command:
# gluster system:: uuid get
Example 10.7. Example to obtain the UUID of a host
For example:
# gluster system:: uuid get
UUID: 1d9677dc-6159-405e-9319-ad85ec030880
In this case, the UUID of the other peer is 1d9677dc-6159-405e-9319-ad85ec030880.
- Create a file /var/lib/glusterd/peers/1d9677dc-6159-405e-9319-ad85ec030880 in sys0.example.com, with the following command:
# touch /var/lib/glusterd/peers/1d9677dc-6159-405e-9319-ad85ec030880
The file you create must contain the following information (see the sketch after this procedure):
UUID=<uuid-of-other-node>
state=3
hostname=<hostname>
- Continue to perform steps 11 to 16 as documented in the previous procedure.
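As a convenience, the peer file described above can be written in a single step instead of being created with touch and edited afterwards. The following is a minimal sketch, run on sys0.example.com, using the peer UUID and hostname from this example and the field names shown above; verify the values against your own pool before continuing.
# Run on sys0.example.com: write the peer file for sys1.example.com in one step.
cat > /var/lib/glusterd/peers/1d9677dc-6159-405e-9319-ad85ec030880 <<'EOF'
UUID=1d9677dc-6159-405e-9319-ad85ec030880
state=3
hostname=sys1.example.com
EOF

# Confirm the contents before continuing with steps 11 to 16.
cat /var/lib/glusterd/peers/1d9677dc-6159-405e-9319-ad85ec030880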
