5.6. Creating Replicated Volumes

Important

Creating replicated volume with replica count greater than 3 is under technology preview. Technology Preview features are not fully supported under Red Hat service-level agreements (SLAs), may not be functionally complete, and are not intended for production use.
Tech Preview features provide early access to upcoming product innovations, enabling customers to test functionality and provide feedback during the development process.
As Red Hat considers making future iterations of Technology Preview features generally available, we will provide commercially reasonable efforts to resolve any reported issues that customers experience when using these features.
Replicated volume creates copies of files across multiple bricks in the volume. Use replicated volumes in environments where high-availability and high-reliability are critical.
Use gluster volume create to create different types of volumes, and gluster volume info to verify successful volume creation.
Prerequisites

5.6.1. Creating Two-way Replicated Volumes

Warning

As of Red Hat Gluster Storage 3.3, two-way replication is considered deprecated. Two-way deprecation remains supported for this release, but Red Hat no longer recommends its use, and plans to remove support in future versions of Red Hat Gluster Storage. This change affects both replicated and distributed-replicated volumes.
Two-way replication is being deprecated because it does not provide adequate protection from split-brain conditions. Even in distributed-replicated configurations, two-way replication cannot ensure that the correct copy of a conflicting file is selected without the use of a tie-breaking node.
While a dummy node can be used as an interim solution for this problem, Red Hat recommends that all volumes that currently use two-way replication are migrated to use either arbitrated replication or three-way replication.
Instructions for migrating a two-way replicated volume to an arbitrated replicated volume are available in the Red Hat Gluster Storage 3.3 Administration Guide: https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.3/html-single/administration_guide/#sect-Convert_Rep2_to_Arbiter.
Two-way replicated volume creates two copies of files across the bricks in the volume. The number of bricks must be multiple of two for a replicated volume. To protect against server and disk failures, it is recommended that the bricks of the volume are from different servers.
Illustration of a Two-way Replicated Volume

Figure 5.3. Illustration of a Two-way Replicated Volume

Creating two-way replicated volumes
  1. Run the gluster volume create command to create the replicated volume.
    The syntax is # gluster volume create NEW-VOLNAME [replica COUNT] [transport tcp | rdma | tcp,rdma] NEW-BRICK...
    The default value for transport is tcp. Other options can be passed such as auth.allow or auth.reject. See Section 11.1, “Configuring Volume Options” for a full list of parameters.

    Example 5.3. Replicated Volume with Two Storage Servers

    The order in which bricks are specified determines how they are replicated with each other. For example, every 2 bricks, where 2 is the replica count, forms a replica set. This is illustrated in Figure 5.3, “Illustration of a Two-way Replicated Volume” .
    # gluster volume create test-volume replica 2 transport tcp server1:/rhgs/brick1 server2:/rhgs/brick2
    Creation of test-volume has been successful
    Please start the volume to access data.
  2. Run # gluster volume start VOLNAME to start the volume.
    # gluster volume start test-volume
    Starting test-volume has been successful
  3. Run gluster volume info command to optionally display the volume information.

Important

You must set client-side quorum on replicated volumes to prevent split-brain scenarios. For more information on setting client-side quorum, see Section 11.13.1.2, “Configuring Client-Side Quorum”

5.6.2. Creating Three-way Replicated Volumes

Three-way replicated volume creates three copies of files across multiple bricks in the volume. The number of bricks must be equal to the replica count for a replicated volume. To protect against server and disk failures, it is recommended that the bricks of the volume are from different servers.
Synchronous three-way replication is now fully supported in Red Hat Gluster Storage. It is recommended that three-way replicated volumes use JBOD, but use of hardware RAID with three-way replicated volumes is also supported.
Illustration of a Three-way Replicated Volume

Figure 5.4. Illustration of a Three-way Replicated Volume

Creating three-way replicated volumes
  1. Run the gluster volume create command to create the replicated volume.
    The syntax is # gluster volume create NEW-VOLNAME [replica COUNT] [transport tcp | rdma | tcp,rdma] NEW-BRICK...
    The default value for transport is tcp. Other options can be passed such as auth.allow or auth.reject. See Section 11.1, “Configuring Volume Options” for a full list of parameters.

    Example 5.4. Replicated Volume with Three Storage Servers

    The order in which bricks are specified determines how bricks are replicated with each other. For example, every n bricks, where 3 is the replica count forms a replica set. This is illustrated in Figure 5.4, “Illustration of a Three-way Replicated Volume”.
    # gluster volume create test-volume replica 3 transport tcp server1:/rhgs/brick1 server2:/rhgs/brick2 server3:/rhgs/brick3
    Creation of test-volume has been successful
    Please start the volume to access data.
  2. Run # gluster volume start VOLNAME to start the volume.
    # gluster volume start test-volume
    Starting test-volume has been successful
  3. Run gluster volume info command to optionally display the volume information.

Important

By default, the client-side quorum is enabled on three-way replicated volumes to minimize split-brain scenarios. For more information on client-side quorum, see Section 11.13.1.2, “Configuring Client-Side Quorum”

5.6.3. Creating Sharded Replicated Volumes

Sharding breaks files into smaller pieces so that they can be distributed across the bricks that comprise a volume. This is enabled on a per-volume basis.
When sharding is enabled, files written to a volume are divided into pieces. The size of the pieces depends on the value of the volume's features.shard-block-size parameter. The first piece is written to a brick and given a GFID like a normal file. Subsequent pieces are distributed evenly between bricks in the volume (sharded bricks are distributed by default), but they are written to that brick's .shard directory, and are named with the GFID and a number indicating the order of the pieces. For example, if a file is split into four pieces, the first piece is named GFID and stored normally. The other three pieces are named GFID.1, GFID.2, and GFID.3 respectively. They are placed in the .shard directory and distributed evenly between the various bricks in the volume.
Because sharding distributes files across the bricks in a volume, it lets you store files with a larger aggregate size than any individual brick in the volume. Because the file pieces are smaller, heal operations are faster, and geo-replicated deployments can sync the small pieces of a file that have changed, rather than syncing the entire aggregate file.
Sharding also lets you increase volume capacity by adding bricks to a volume in an ad-hoc fashion.

5.6.3.1. Supported use cases

Sharding has one supported use case: in the context of providing Red Hat Gluster Storage as a storage domain for Red Hat Enterprise Virtualization, to provide storage for live virtual machine images. Note that sharding is also a requirement for this use case, as it provides significant performance improvements over previous implementations.

Important

Quotas are not compatible with sharding.

Important

Sharding is supported in new deployments only, as there is currently no upgrade path for this feature.

Example 5.5. Example: Three-way replicated sharded volume

  1. Before you start your volume, enable sharding on the volume.
    # gluster volume set test-volume features.shard enable
  2. Start the volume and ensure it is working as expected.
    # gluster volume test-volume start
    # gluster volume info test-volume

5.6.3.2. Configuration Options

Sharding is enabled and configured at the volume level. The configuration options are as follows.
features.shard
Enables or disables sharding on a specified volume. Valid values are enable and disable. The default value is disable.
# gluster volume set volname features.shard enable
Note that this only affects files created after this command is run; files created before this command is run retain their old behaviour.
features.shard-block-size
Specifies the maximum size of the file pieces when sharding is enabled. The supported value for this parameter is 512MB.
# gluster volume set volname features.shard-block-size 32MB
Note that this only affects files created after this command is run; files created before this command is run retain their old behaviour.

5.6.3.3. Finding the pieces of a sharded file

When you enable sharding, you might want to check that it is working correctly, or see how a particular file has been sharded across your volume.
To find the pieces of a file, you need to know that file's GFID. To obtain a file's GFID, run:
# getfattr -d -m. -e hex path_to_file
Once you have the GFID, you can run the following command on your bricks to see how this file has been distributed:
# ls /rhgs/*/.shard -lh | grep GFID