Resolving GFID mismatch problems in RHGS volumes
Issue
GFID mismatches are errors that can occur in replicate volumes. A GFID is a unique identifier for an object within glusterfs (a file, directory, or symbolic link), similar to an inode number in a traditional filesystem. The replicate module in glusterfs keeps multiple copies of every file and directory (all data); the number of copies is the replica count given when the volume was created.
Ideally, all replica copies of a file have the same GFID. Sometimes, however, the same file (that is, the same path) has different GFIDs on different replica copies. For example, the gluster file system is mounted at /mnt/glusterfs and the file being accessed is /mnt/glusterfs/dir/file; system 1 has one GFID stored for the file, while system 2 has a different GFID stored for its copy of the same file.
This mismatch in GFIDs confuses the replicate module (as the unique identifier of the object is different for each copy of the file) and will result in an I/O error being returned to applications attempting to access that object.
Clients report an I/O error when opening affected files, and 'ls -l' gives output like the following:
?????????? ? ? ? ? ? <fileName>
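To confirm that two replica copies really carry different GFIDs, the identifiers stored on each brick can be compared directly. The following is a minimal sketch, not an official tool. It assumes the GFID is stored on the brick-side copy as a 16-byte extended attribute named trusted.gfid, that the script runs as root on a Linux storage node, and that the brick paths passed in are placeholders chosen by the reader. The helper names are hypothetical.

```python
import os
import uuid


def gfid_to_str(raw: bytes) -> str:
    """Format a raw 16-byte GFID value as a canonical UUID string."""
    return str(uuid.UUID(bytes=raw))


def read_gfid(brick_path: str) -> str:
    """Read the GFID of an object from its brick-side path.

    Assumption: the value lives in the trusted.gfid extended attribute.
    Reading the trusted.* namespace normally requires root on Linux.
    """
    return gfid_to_str(os.getxattr(brick_path, "trusted.gfid"))


def has_mismatch(gfids_by_copy: dict) -> bool:
    """Return True if the copies do not all share a single GFID."""
    return len(set(gfids_by_copy.values())) > 1


# Illustration using the two GFIDs from the log example in this article,
# supplied as raw bytes instead of being read from real bricks.
copies = {
    "repl-client-0": gfid_to_str(bytes.fromhex("0be37ac4d84f4988a56c0e93fbadb21b")),
    "repl-client-1": gfid_to_str(bytes.fromhex("e9584323706e4fad824ab3f5a0b8902e")),
}
print(has_mismatch(copies))
```

On a real system, read_gfid would be called once per storage node with that node's brick-side path to the affected file, and the results passed to has_mismatch.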
This situation primarily happens in gluster volumes with a replica count of 2 (the two-way replica is the most commonly used replica type in glusterfs). For volumes containing three or more nodes, establishing a quorum helps avoid such situations. In two-way replicate volumes, a quorum is ineffective: if either node fails, the quorum is lost.
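The arithmetic behind the quorum remark can be sketched as follows. This assumes a simple strict-majority rule (more than half of the copies must be reachable), which is a simplification for illustration; the actual quorum options in glusterfs may behave differently.

```python
def has_quorum(total: int, alive: int) -> bool:
    """Strict majority: more than half of the copies must be reachable."""
    return alive * 2 > total


def tolerated_failures(total: int) -> int:
    """Largest number of failed copies that still leaves a strict majority."""
    return max(f for f in range(total + 1) if has_quorum(total, total - f))


for replica_count in (2, 3):
    print(replica_count, tolerated_failures(replica_count))
```

Under this assumption, a two-way replica tolerates no failures while keeping quorum, whereas a three-way replica tolerates one.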
When a GFID mismatch occurs, the /var/log/glusterfs/glustershd.log file will contain messages similar to the one shown below:
<gfid:09856517-c21e-42cf-9287-52427ba6cb7b>/file1 e9584323-706e-4fad-824a-b3f5a0b8902e on repl-client-1 and 0be37ac4-d84f-4988-a56c-0e93fbadb21b on repl-client-0
The breakdown of this log entry is shown below:
repl - the name of the gluster volume.
file1 - the filesystem object (in this case a file) that has a GFID mismatch.
09856517-c21e-42cf-9287-52427ba6cb7b - the GFID of the parent directory where the object "file1" is located.
e9584323-706e-4fad-824a-b3f5a0b8902e - the GFID of the second replica of "file1".
0be37ac4-d84f-4988-a56c-0e93fbadb21b - the GFID of the first replica of "file1".
NOTE: Subvolumes of the replicate translator are named [gluster volume name]-client-[index], where index ranges from 0 to [replica count - 1]. For example, if the replica count is 2, index is either 0 or 1.
Therefore, in the example above, repl-client-0 would represent the first copy and repl-client-1 would represent the second copy.
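The breakdown above can be automated when many such entries need to be reviewed. The following is a minimal sketch that assumes log entries have the shape of the example shown in this article; the regular expression and the function name are illustrative, not part of glusterfs. The leading angle bracket is treated as optional so that the pattern matches either rendering of the entry.

```python
import re

GFID = r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"

# Assumed entry shape, based only on the example in this article:
#   <gfid:PARENT>/NAME GFID on VOLUME-client-INDEX and GFID on VOLUME-client-INDEX
PATTERN = re.compile(
    rf"<?gfid:(?P<parent>{GFID})>/(?P<name>.+?) "
    rf"(?P<gfid_a>{GFID}) on (?P<vol_a>\S+)-client-(?P<idx_a>\d+) and "
    rf"(?P<gfid_b>{GFID}) on (?P<vol_b>\S+)-client-(?P<idx_b>\d+)"
)


def parse_mismatch(line: str):
    """Extract volume, parent GFID, object name, and per-copy GFIDs."""
    m = PATTERN.search(line)
    if m is None:
        return None
    copies = sorted(
        [(int(m["idx_a"]), m["gfid_a"]), (int(m["idx_b"]), m["gfid_b"])]
    )
    return {
        "volume": m["vol_a"],
        "parent_gfid": m["parent"],
        "name": m["name"],
        "copies": copies,  # (subvolume index, GFID), index 0 = first copy
    }


LINE = (
    "<gfid:09856517-c21e-42cf-9287-52427ba6cb7b>/file1 "
    "e9584323-706e-4fad-824a-b3f5a0b8902e on repl-client-1 and "
    "0be37ac4-d84f-4988-a56c-0e93fbadb21b on repl-client-0"
)
info = parse_mismatch(LINE)
for index, gfid in info["copies"]:
    print(f"copy {index + 1} ({info['volume']}-client-{index}): {gfid}")
```

Sorting by subvolume index lists the first copy (index 0) before the second (index 1), matching the naming convention described in the note.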
Environment
- Red Hat Gluster Storage