5.8. Creating Arbitrated Replicated Volumes
Advantages of arbitrated replicated volumes
- Better consistency
- When an arbiter is configured, arbitration logic uses client-side quorum in auto mode to prevent file operations that would lead to split-brain conditions.
- Less disk space required
- Because an arbiter brick only stores file names and metadata, an arbiter brick can be much smaller than the other bricks in the volume.
- Fewer nodes required
- The node that contains the arbiter brick of one volume can be configured with the data brick of another volume. This "chaining" configuration allows you to use fewer nodes to fulfill your overall storage requirements.
- Easy migration from deprecated two-way replicated volumes
- Red Hat Gluster Storage can convert a two-way replicated volume without arbiter bricks into an arbitrated replicated volume. See Section 5.8.5, “Converting to an arbitrated volume” for details.
Limitations of arbitrated replicated volumes
- Arbitrated replicated volumes provide better data consistency than a two-way replicated volume that does not have arbiter bricks. However, because arbitrated replicated volumes store only metadata, they provide the same level of availability as a two-way replicated volume that does not have arbiter bricks. To achieve high-availability, you need to use a three-way replicated volume instead of an arbitrated replicated volume.
- Tiering is not compatible with arbitrated replicated volumes.
- Arbitrated volumes can only be configured in sets of three bricks at a time. Red Hat Gluster Storage can convert an existing two-way replicated volume without arbiter bricks into an arbitrated replicated volume by adding an arbiter brick to that volume. See Section 5.8.5, “Converting to an arbitrated volume” for details.
5.8.1. Arbitrated volume requirements
18.104.22.168. System requirements for nodes hosting arbiter bricks
Table 5.1. Requirements for arbitrated configurations on physical machines
|Configuration type||Min CPU||Min RAM||NIC||Arbiter Brick Size||Max Latency|
|Dedicated arbiter||64-bit quad-core processor with 2 sockets||8 GB[a]||Match to other nodes in the storage pool||1 TB to 4 TB[b]||5 ms|
|Chained arbiter||Match to other nodes in the storage pool||1 TB to 4 TB[c]||5 ms|
[a] More RAM may be necessary depending on the combined capacity of the number of arbiter bricks on the node.
- minimum 4 vCPUs
- minimum 16 GB RAM
- 1 TB to 4 TB of virtual disk space
- maximum 5 ms latency
22.214.171.124. Arbiter capacity requirements
minimum arbiter brick size = 4 KB * ( size in KB of largest data brick in volume or replica set / average file size in KB)
minimum arbiter brick size = 4 KB * ( 1 TB / 2 GB ) = 4 KB * ( 1000000000 KB / 2000000 KB ) = 4 KB * 500 KB = 2000 KB = 2 MB
minimum arbiter brick size = 4 KB * ( size in KB of largest data brick in volume or replica set / shard block size in KB )
5.8.2. Arbitration logic
Table 5.2. Allowed operations for current volume state
|Volume state||Arbitration behavior|
|All bricks available||All file operations permitted.|
|Arbiter and 1 data brick available|| |
If the arbiter does not agree with the available data node, write operations fail with ENOTCONN (since the brick that is correct is not available). Other file operations are permitted.
If the arbiter's metadata agrees with the available data node, all file operations are permitted.
|Arbiter down, data bricks available||All file operations are permitted. The arbiter's records are healed when it becomes available.|
|Only one brick available|| |
If the available brick is a data brick, client quorum is not met, and the volume enters an EROFS state.
If the available brick is the arbiter, all file operations fail with ENOTCONN.
5.8.3. Creating an arbitrated replicated volume
# gluster volume create VOLNAME replica 3 arbiter 1 HOST1:DATA_BRICK1 HOST2:DATA_BRICK2 HOST3:ARBITER_BRICK3
# gluster volume create testvol replica 3 arbiter 1 \ server1:/bricks/brick server2:/bricks/brick server3:/bricks/arbiter_brick \ server4:/bricks/brick server5:/bricks/brick server6:/bricks/arbiter_brick
# gluster volume info testvol Volume Name: testvol Type: Distributed-Replicate Volume ID: ed9fa4d5-37f1-49bb-83c3-925e90fab1bc Status: Created Snapshot Count: 0 Number of Bricks: 2 x (2 + 1) = 6 Transport-type: tcp Bricks: Brick1: server1:/bricks/brick Brick2: server2:/bricks/brick Brick3: server3:/bricks/arbiter_brick (arbiter) Brick1: server4:/bricks/brick Brick2: server5:/bricks/brick Brick3: server6:/bricks/arbiter_brick (arbiter) Options Reconfigured: transport.address-family: inet performance.readdir-ahead: on nfs.disable: on
5.8.4. Creating multiple arbitrated replicated volumes across fewer total nodes
- Chain multiple arbitrated replicated volumes together, by placing the arbiter brick for one volume on the same node as a data brick for another volume. Chaining is useful for write-heavy workloads when file size is closer to metadata file size (that is, from 32–128 KiB). This avoids all metadata I/O going through a single disk.In arbitrated distributed-replicated volumes, you can also place an arbiter brick on the same node as another replica sub-volume's data brick, since these do not share the same data.
- Place the arbiter bricks from multiple volumes on a single dedicated node. A dedicated arbiter node is suited to write-heavy workloads with larger files, and read-heavy workloads.
Example 5.9. Example of a dedicated configuration
# gluster volume create firstvol replica 3 arbiter 1 server1:/bricks/brick server2:/bricks/brick server3:/bricks/arbiter_brick # gluster volume create secondvol replica 3 arbiter 1 server4:/bricks/data_brick server5:/bricks/brick server3:/bricks/brick
Example 5.10. Example of a chained configuration
# gluster volume create arbrepvol replica 3 arbiter 1 server1:/bricks/brick1 server2:/bricks/brick1 server3:/bricks/arbiter_brick1 server2:/bricks/brick2 server3:/bricks/brick2 server4:/bricks/arbiter_brick2 server3:/bricks/brick3 server4:/bricks/brick3 server5:/bricks/arbiter_brick3 server4:/bricks/brick4 server5:/bricks/brick4 server6:/bricks/arbiter_brick4 server5:/bricks/brick5 server6:/bricks/brick5 server1:/bricks/arbiter_brick5 server6:/bricks/brick6 server1:/bricks/brick6 server2:/bricks/arbiter_brick6
5.8.5. Converting to an arbitrated volume
- A two-way replicated volume without arbiter bricks can be converted into an arbitrated replicated volume.
- A two-way distributed-replicated volume without arbiter bricks can be converted into an arbitrated distributed-replicated volume.
Self-heal-daemonon the volumes before executing the
add-brickcommand, and turn it
ononce the conversion is done.
# gluster volume set VOLNAME cluster.data-self-heal off
# gluster volume set VOLNAME cluster.metadata-self-heal off
# gluster volume set VOLNAMEcluster.entry-self-heal off
# gluster volume set VOLNAME self-heal-daemon off
# gluster volume add-brick VOLNAME replica 3 arbiter 1 HOST:arbiter-brick-path
# gluster volume add-brick testvol replica 3 arbiter 1 server:/bricks/arbiter_brick
# gluster volume add-brick testvol replica 3 arbiter 1 server1:/bricks/arbiter_brick1 server2:/bricks/arbiter_brick2
Self-heal-daemonusing the following commands:
# gluster volume set VOLNAME cluster.*-self-heal on
# gluster volume set VOLNAME self-heal-daemon on
5.8.6. Tuning recommendations for arbitrated volumes
- For dedicated arbiter nodes, use JBOD for arbiter bricks, and RAID6 for data bricks.
- For chained arbiter volumes, use the same RAID6 drive for both data and arbiter bricks.