5.8. Creating Arbitrated Replicated Volumes
An arbitrated replicated volume, or arbiter volume, is a three-way replicated volume where every third brick is a special type of brick called an arbiter. Arbiter bricks do not store file data; they only store file names, structure, and metadata. The arbiter uses client quorum to compare this metadata with the metadata of the other nodes to ensure consistency in the volume and prevent split-brain conditions.
Advantages of arbitrated replicated volumes
- Better consistency
- When an arbiter is configured, arbitration logic uses client-side quorum in auto mode to prevent file operations that would lead to split-brain conditions.
- Less disk space required
- Because an arbiter brick only stores file names and metadata, an arbiter brick can be much smaller than the other bricks in the volume.
- Fewer nodes required
- The node that contains the arbiter brick of one volume can be configured with the data brick of another volume. This "chaining" configuration allows you to use fewer nodes to fulfill your overall storage requirements.
- Easy migration from two-way replicated volumes
- Red Hat Gluster Storage can convert a two-way replicated volume into an arbitrated replicated volume. See Section 5.8.5, “Converting to an arbitrated volume” for details.
Limitations of arbitrated replicated volumes
- Although arbitrated replicated volumes provide better data consistency than a two-way replicated volume, the arbiter brick stores only metadata, so they provide the same level of availability as a two-way replicated volume. To achieve high availability, use a three-way replicated volume instead of an arbitrated replicated volume.
- Tiering is not compatible with arbitrated replicated volumes.
- Arbiters can only be configured for three-way replicated volumes. However, Red Hat Gluster Storage can convert an existing two-way replicated volume into an arbitrated replicated volume. See Section 5.8.5, “Converting to an arbitrated volume” for details.
5.8.1. Arbitrated volume requirements
This section outlines the requirements of a supported arbitrated volume deployment.
5.8.1.1. System requirements for arbiter nodes
The minimum system requirements for a node that contains an arbiter brick differ depending on the configuration choices made by the administrator. See Section 5.8.4, “Creating multiple arbitrated replicated volumes across fewer total nodes” for details about the differences between the dedicated arbiter and chained arbiter configurations.
Table 5.1. Requirements for arbitrated configurations on physical machines
|Configuration type||Min CPU||Min RAM||NIC||Arbiter Brick Size||Max Latency|
|Dedicated arbiter||64-bit quad-core processor with 2 sockets||8 GB[a]||Match to other nodes in the storage pool||1 TB to 4 TB[b]||5 ms[c]|
|Chained arbiter||Match to other nodes in the storage pool||Match to other nodes in the storage pool||Match to other nodes in the storage pool||1 TB to 4 TB[d]||5 ms[e]|
[a] More RAM may be necessary depending on the combined capacity of the number of arbiter bricks on the node.
[b] Arbiter and data bricks can be configured on the same device provided that the data and arbiter bricks belong to different replica sets. See Section 5.8.1.2, “Arbiter capacity requirements” for further details on sizing arbiter volumes.
[c] This is the maximum round-trip latency between all nodes, whether or not a node hosts an arbiter. See KCS#413623 for how to determine latency between nodes.
[d] Multiple bricks can be created on a single RAIDed physical device. See Section 21.2, “Brick Configuration”.
[e] This is the maximum round-trip latency between all nodes, whether or not a node hosts an arbiter. See KCS#413623 for how to determine latency between nodes.
The requirements for arbitrated configurations on virtual machines are:
- minimum 4 vCPUs
- minimum 16 GB RAM
- 1 TB to 4 TB of virtual disk space
- maximum 5 ms latency
5.8.1.2. Arbiter capacity requirements
Because an arbiter brick only stores file names and metadata, an arbiter brick can be much smaller than the other bricks in the volume or replica set. The required size for an arbiter brick depends on the number of files being stored on the volume.
The recommended minimum arbiter brick size can be calculated with the following formula:
minimum arbiter brick size = 4 KB * ( size in KB of largest data brick in volume or replica set / average file size in KB)
For example, if you have two 1 TB data bricks, and the average size of the files is 2 GB, then the recommended minimum size for your arbiter brick is 2 MB, as shown in the following example:
minimum arbiter brick size = 4 KB * ( 1 TB / 2 GB ) = 4 KB * ( 1000000000 KB / 2000000 KB ) = 4 KB * 500 = 2000 KB = 2 MB
If sharding is enabled, and your shard-block-size is smaller than the average file size in KB, then you need to use the following formula instead, because each shard also has a metadata file:
minimum arbiter brick size = 4 KB * ( size in KB of largest data brick in volume or replica set / shard block size in KB )
Alternatively, if you know how many files you will store in a volume, the recommended minimum arbiter brick size is the maximum number of files multiplied by 4 KB. For example, if you expect to have 200,000 files on your volume, your arbiter brick should be at least 800,000 KB, or 0.8 GB, in size.
Red Hat also recommends overprovisioning where possible so that there is no short-term need to increase the size of the arbiter brick.
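The sizing rules in this section can be sketched as a pair of shell helpers. This is an illustrative sketch only: the function names min_arbiter_kb and min_arbiter_kb_for_files are hypothetical and are not part of Red Hat Gluster Storage. All sizes are in KB using the decimal units of the worked example, and the 64 MB shard block size in the second call is an assumed value.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Red Hat Gluster Storage).
# All sizes are in KB, using decimal units as in the worked example
# (1 TB = 1000000000 KB, 2 GB = 2000000 KB).

# min_arbiter_kb LARGEST_DATA_BRICK_KB AVG_FILE_KB [SHARD_BLOCK_KB]
# If a shard block size is given and is smaller than the average file
# size, it replaces the average file size as the divisor.
min_arbiter_kb() {
    brick_kb=$1
    divisor_kb=$2
    shard_kb=${3:-0}
    if [ "$shard_kb" -gt 0 ] && [ "$shard_kb" -lt "$divisor_kb" ]; then
        divisor_kb=$shard_kb
    fi
    # Round the estimated file count up so the result is not an underestimate.
    echo $(( 4 * ( (brick_kb + divisor_kb - 1) / divisor_kb ) ))
}

# min_arbiter_kb_for_files MAX_FILE_COUNT
# 4 KB of arbiter space per expected file.
min_arbiter_kb_for_files() {
    echo $(( 4 * $1 ))
}

min_arbiter_kb 1000000000 2000000          # prints 2000 (2 MB)
min_arbiter_kb 1000000000 2000000 64000    # assumed 64 MB shards: prints 62500
min_arbiter_kb_for_files 200000            # prints 800000 (0.8 GB)
```

The results are recommended minimums. In line with the overprovisioning advice above, treat them as a lower bound rather than a target.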
5.8.2. Arbitration logic
In an arbitrated volume, whether a file operation is permitted depends on the current state of the bricks in the volume. The following table describes arbitration behavior in all possible volume states.
Table 5.2. Allowed operations for current volume state
|Volume state||Arbitration behavior|
|All bricks available||All file operations permitted.|
|Arbiter and 1 data brick available||If the arbiter does not agree with the available data node, write operations fail with ENOTCONN (since the brick that is correct is not available). Other file operations are permitted. If the arbiter's metadata agrees with the available data node, all file operations are permitted.|
|Arbiter down, data bricks available||All file operations are permitted. The arbiter's records are healed when it becomes available.|
|Only one brick available||If the available brick is a data brick, client quorum is not met, and the volume enters an EROFS state. If the available brick is the arbiter, all file operations fail with ENOTCONN.|
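The decision logic in Table 5.2 can be written out as a small lookup function. This is a sketch for reasoning about volume states, not how Red Hat Gluster Storage implements arbitration. The function name arbitration_behavior and its argument format are hypothetical.

```shell
#!/bin/sh
# Hypothetical illustration of Table 5.2; not a Gluster command.
# arbitration_behavior DATA_BRICKS_UP ARBITER_UP ARBITER_AGREES
#   DATA_BRICKS_UP: 0, 1, or 2
#   ARBITER_UP:     yes | no
#   ARBITER_AGREES: yes | no (only consulted when one data brick and the arbiter are up)
arbitration_behavior() {
    case "$1:$2" in
        2:yes) echo "all operations permitted" ;;
        2:no)  echo "all operations permitted; arbiter healed when it returns" ;;
        1:yes)
            if [ "$3" = "yes" ]; then
                echo "all operations permitted"
            else
                echo "writes fail with ENOTCONN; other operations permitted"
            fi ;;
        1:no)  echo "read-only (EROFS)" ;;
        0:yes) echo "all operations fail with ENOTCONN" ;;
        *)     echo "state not covered by Table 5.2" ;;
    esac
}

arbitration_behavior 1 yes no   # one data brick plus an arbiter that disagrees
arbitration_behavior 1 no no    # a single data brick on its own
```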
5.8.3. Creating an arbitrated replicated volume
The command for creating an arbitrated replicated volume has the following syntax:
# gluster volume create VOLNAME replica 3 arbiter 1 HOST1:BRICK1 HOST2:BRICK2 ...
This creates a volume with one arbiter for every three replicate bricks. The arbiter is the last brick in every set of three bricks.
In the following example, the bricks on server3 and server6 are the arbiter bricks.
# gluster volume create testvol replica 3 arbiter 1 \
  server1:/bricks/brick server2:/bricks/brick server3:/bricks/brick \
  server4:/bricks/brick server5:/bricks/brick server6:/bricks/brick
# gluster volume info testvol
Volume Name: testvol
Type: Distributed-Replicate
Volume ID: ed9fa4d5-37f1-49bb-83c3-925e90fab1bc
Status: Created
Snapshot Count: 0
Number of Bricks: 2 x (2 + 1) = 6
Transport-type: tcp
Bricks:
Brick1: server1:/bricks/brick
Brick2: server2:/bricks/brick
Brick3: server3:/bricks/brick (arbiter)
Brick1: server4:/bricks/brick
Brick2: server5:/bricks/brick
Brick3: server6:/bricks/brick (arbiter)
Options Reconfigured:
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on
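Because the arbiter is always the last brick in each set of three, it can help to check a brick list before passing it to gluster volume create. The following sketch prints the bricks that would become arbiters. The function name list_arbiters is hypothetical, and the function only inspects argument order without contacting any server.

```shell
#!/bin/sh
# Hypothetical helper; it does not call gluster.
# list_arbiters BRICK...
# Prints every third argument, which is the arbiter position when a
# volume is created with "replica 3 arbiter 1".
list_arbiters() {
    i=0
    for brick in "$@"; do
        i=$(( i + 1 ))
        if [ $(( i % 3 )) -eq 0 ]; then
            echo "$brick"
        fi
    done
}

# Brick list from the testvol example: prints the server3 and server6 bricks.
list_arbiters \
    server1:/bricks/brick server2:/bricks/brick server3:/bricks/brick \
    server4:/bricks/brick server5:/bricks/brick server6:/bricks/brick
```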
5.8.4. Creating multiple arbitrated replicated volumes across fewer total nodes
If you are configuring more than one arbitrated-replicated volume, or a single volume with multiple replica sets, you can use fewer nodes in total by using either of the following techniques:
- Chain multiple arbitrated replicated volumes together, by placing the arbiter brick for one volume on the same node as a data brick for another volume. Chaining is useful for write-heavy workloads when file size is closer to metadata file size (that is, from 32–128 KiB). This avoids all metadata I/O going through a single disk. In arbitrated distributed-replicated volumes, you can also place an arbiter brick on the same node as another replica sub-volume's data brick, since these do not share the same data.
- Place the arbiter bricks from multiple volumes on a single dedicated node. A dedicated arbiter node is suited to write-heavy workloads with larger files, and read-heavy workloads.
Example 5.9. Example of a dedicated configuration
The following commands create two arbitrated replicated volumes, firstvol and secondvol. Server3 contains the arbiter bricks of both volumes.
# gluster volume create firstvol replica 3 arbiter 1 server1:/bricks/brick server2:/bricks/brick server3:/bricks/arbiter_brick
# gluster volume create secondvol replica 3 arbiter 1 server4:/bricks/data_brick server5:/bricks/brick server3:/bricks/brick
Example 5.10. Example of a chained configuration
The following command configures an arbitrated replicated volume with six sub-volumes chained across six servers in a 6 x (2 + 1) configuration.
# gluster volume create arbrepvol replica 3 arbiter 1 \
  server1:/bricks/brick1 server2:/bricks/brick1 server3:/bricks/brick1 \
  server2:/bricks/brick2 server3:/bricks/brick2 server4:/bricks/brick2 \
  server3:/bricks/brick3 server4:/bricks/brick3 server5:/bricks/brick3 \
  server4:/bricks/brick4 server5:/bricks/brick4 server6:/bricks/brick4 \
  server5:/bricks/brick5 server6:/bricks/brick5 server1:/bricks/brick5 \
  server6:/bricks/brick6 server1:/bricks/brick6 server2:/bricks/brick6
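The chained layout above follows a regular pattern: replica set i uses servers i, i+1, and i+2, wrapping around to server1, with the arbiter on the third. The following sketch generates that brick list for a given number of servers. The function name chained_bricks is hypothetical, and the serverN host names and /bricks/brickN paths simply mirror the example.

```shell
#!/bin/sh
# Hypothetical generator for the chained layout shown in Example 5.10.
# chained_bricks N
# Prints one replica set per line for N servers (N should be at least 3):
# set i uses servers i, i+1, and i+2 (wrapping around), arbiter last.
chained_bricks() {
    n=$1
    i=1
    while [ "$i" -le "$n" ]; do
        s2=$(( i % n + 1 ))
        s3=$(( (i + 1) % n + 1 ))
        echo "server${i}:/bricks/brick${i} server${s2}:/bricks/brick${i} server${s3}:/bricks/brick${i}"
        i=$(( i + 1 ))
    done
}

# Reproduces the six replica sets of the arbrepvol example, one per line.
chained_bricks 6
```

With this layout, each server holds two data bricks and one arbiter brick, so arbiter metadata I/O is spread evenly across the nodes.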
5.8.5. Converting to an arbitrated volume
Red Hat Gluster Storage lets you convert a two-way replicated volume into an arbitrated replicated volume, or a two-way distributed-replicated volume into an arbitrated distributed-replicated volume, by adding an arbiter brick to your existing volume, like so:
# gluster volume add-brick VOLNAME replica 3 arbiter 1 HOST:arbiter-brick-path
For example, if you have an existing two-way replicated volume called testvol, and a new brick for the arbiter to use, you can add a brick as an arbiter with the following command:
# gluster volume add-brick testvol replica 3 arbiter 1 server:/bricks/arbiter_brick
If you have an existing two-way distributed-replicated volume, you need a new brick for each sub-volume in order to convert it to an arbitrated distributed-replicated volume, for example:
# gluster volume add-brick testvol replica 3 arbiter 1 server1:/bricks/arbiter_brick1 server2:/bricks/arbiter_brick2
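Since a conversion needs exactly one new arbiter brick per two-way replica set, a quick pre-check can catch a miscounted brick list. The following sketch builds the add-brick command and prints it without running it. The function name add_arbiter_cmd is hypothetical, and the existing brick count is supplied by hand rather than read from the volume.

```shell
#!/bin/sh
# Hypothetical pre-check; it prints the command and does not run gluster.
# add_arbiter_cmd VOLNAME EXISTING_BRICK_COUNT ARBITER_BRICK...
add_arbiter_cmd() {
    vol=$1
    existing=$2
    shift 2
    # A two-way replicated volume has an even number of bricks.
    if [ $(( existing % 2 )) -ne 0 ]; then
        echo "existing brick count must be even for a two-way replicated volume" >&2
        return 1
    fi
    # One arbiter brick is needed for each pair of data bricks.
    needed=$(( existing / 2 ))
    if [ "$#" -ne "$needed" ]; then
        echo "expected $needed arbiter brick(s), got $#" >&2
        return 1
    fi
    echo "gluster volume add-brick $vol replica 3 arbiter 1 $*"
}

# A 2 x 2 distributed-replicated volume has four bricks and needs two arbiters.
add_arbiter_cmd testvol 4 server1:/bricks/arbiter_brick1 server2:/bricks/arbiter_brick2
```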
5.8.6. Tuning recommendations for arbitrated volumes
Red Hat recommends the following when arbitrated volumes are in use:
- For dedicated arbiter nodes, use JBOD for arbiter bricks, and RAID-6 for data bricks.
- For chained arbiter volumes, use the same RAID-6 drive for both data and arbiter bricks.
See Chapter 21, Tuning for Performance for more information on enhancing performance that is not specific to the use of arbiter volumes.