Chapter 4. Using Red Hat Gluster Storage in the Google Cloud Platform

Red Hat Gluster Storage supports the data needs of cloud-scale applications on Google Cloud Platform (GCP). It provides a software-defined file storage solution that runs on GCP, so that customer applications can use traditional file interfaces with scale-out flexibility and performance.
At the core of the Red Hat Gluster Storage design is a completely new method of architecting storage. The result is a system that has immense scalability, is highly resilient, and offers extraordinary performance.
Google Cloud Platform Overview

The Google Cloud Platform is Google’s public cloud offering, which provides many services to run a fully integrated cloud-based environment. The Google Compute Engine drives and manages the virtual machine environment, and this chapter is based on that virtual machine infrastructure. This virtual framework provides the networking, storage, and virtual machines needed to scale out the Red Hat Gluster Storage environment to meet the demands of the specified workload.

For more information on Google Cloud Platform, see https://cloud.google.com, and for information on the Google Compute Engine, see https://cloud.google.com/compute/docs.
The following diagram illustrates Google Cloud Platform integration with Red Hat Gluster Storage.

Figure 4.1. Integration Architecture

For more information on Red Hat Gluster Storage architecture, concepts, and implementation, see Red Hat Gluster Storage Administration Guide: https://access.redhat.com/documentation/en-US/Red_Hat_Storage/3.1/html/Administration_Guide/index.html.
This chapter describes the steps necessary to deploy a Red Hat Gluster Storage environment to Google Cloud Platform using a 10 x 2 Distribute-Replicate volume.

4.1. Planning your Deployment

This chapter models 100 TB of distributed and replicated file system space. The application server model, which is a Red Hat Gluster Storage client, includes 10 virtual machine instances running a streaming video capture and retrieval simulation. This simulation provides a mixed workload representative of I/O patterns common among use cases where a distributed storage system is most suitable.
While this scale allows us to model a high-end simulation of storage capacity and intensity of client activity, a minimum viable implementation may be achieved at a significantly smaller scale. As the model is scaled down and your individual requirements and use cases are considered, certain fundamental approaches of this architecture should be taken into account, such as instance sizing, synchronous replication across zones, careful isolation of failure domains, and asynchronous replication to a remote geographical site.
Maximum Persistent Disk Size

The original test build was limited by the maximum per-VM persistent disk size of 10 TB. Google has since increased that limit to 64 TB. Red Hat will support persistent disks per VM up to Google's current maximum size of 64 TB. (Note that 64 TB is both a per-disk and a per-VM maximum, so the actual data disk maximum will be 64 TB minus the operating system disk size.)
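For illustration, a standalone data disk at the current per-disk maximum could be created as follows. This is a sketch only; the disk name, type, and zone are placeholders, not values from the deployment described in this chapter.

```shell
# Hypothetical example: create a 64 TB standard persistent disk, the current
# per-disk maximum noted above. Disk name and zone are placeholders.
gcloud compute disks create rhgs-data-disk-1 \
    --size=64TB \
    --type=pd-standard \
    --zone=us-central1-a
```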

Other real-world use cases may involve significantly more client connections than represented in this chapter. While the particular study performed here was limited in client scale due to a focus on server and storage scale, some basic throughput tests showed the linear scale capabilities of the storage system. As always, your own design should be tuned to your particular use case and tested for performance and scale limitations.

4.1.1. Environment

The scale target is roughly 100 TB of usable storage, with 2-way synchronous replication between zones in the primary pool, and additionally remote asynchronous geo-replication to a secondary pool in another region for disaster recovery. At the time of testing, the maximum size of a Google Compute Engine persistent disk was 10 TB; the design therefore requires 20 bricks for the primary pool and 10 bricks for the secondary pool. The secondary pool holds single data copies that are not synchronously replicated.
Note that there was also a per-VM limit of 10 TB of persistent disk at the time of testing, so the actual data disk is configured at 10,220 GB in order to account for the 20 GB root volume persistent disk.
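The sizing arithmetic above can be checked with a quick calculation (GB values, using 1 TB = 1,024 GB):

```shell
#!/bin/sh
# Verify the data disk and usable capacity figures used in this section.
disk_limit_gb=$((10 * 1024))            # 10 TB per-VM persistent disk limit
boot_gb=20                              # root volume persistent disk
data_gb=$((disk_limit_gb - boot_gb))    # maximum data disk size
echo "Data disk: ${data_gb} GB"         # 10220

bricks=20                               # primary pool bricks
replicas=2                              # 2-way synchronous replication
usable_tb=$((bricks * disk_limit_gb / replicas / 1024))
echo "Usable primary capacity: ~${usable_tb} TB"   # ~100
```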
All nodes will use a Red Hat Gluster Storage 3.1 on Red Hat Enterprise Linux 7 image that is manually created and configured with a local virtualization system (KVM). Red Hat Gluster Storage replica peers in the local region are placed in separate zones within each region, so that the synchronous replica copies remain highly available in the case of a zone outage.
The Red Hat Gluster Storage server nodes are built as n1-highmem-4 machine types. This machine type is the minimally viable configuration based on the published resource requirements for Red Hat Gluster Storage. Some concession has been made for the minimum memory size based on expected cloud use cases. The n1-highmem-8 machine type may be a more appropriate match, depending on your application and specific needs.

4.1.2. Prerequisites

4.1.3. Primary Storage Pool Configuration

  • Red Hat Gluster Storage configured in a 10 x 2 Distribute-Replicate volume
  • 20 x n1-highmem-4 instances:
    Resource    Specification
    vCPU        4
    Memory      26 GB
    Boot Disk   20 GB standard persistent disk
    Data Disk   10,220 GB standard persistent disk
    Image       Custom Red Hat Gluster Storage 3.1 on Red Hat Enterprise Linux 7
    The maximum persistent disk allocation for a single instance was 10 TB at the time of testing. Therefore the maximum size of our data disk is necessarily 10 TB minus the 20 GB size of the boot disk, or 10,220 GB.
  • VM zone allocation:
    Each Gluster synchronous replica pair is placed across zones in order to limit the impact of a zone failure; a single zone failure will not result in a loss of data access. Note that the assignment of synchronous replica pairs is a function of the order in which the bricks are defined in the gluster volume create command.
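The brick ordering can be sketched as follows. Host and volume names are illustrative placeholders, and only the first two of the ten replica pairs are shown; with replica 2, each consecutive pair of bricks in the list forms one replica set.

```shell
# Each zone-a brick is immediately followed by its zone-b partner, so every
# synchronous replica pair spans two zones (hostnames are placeholders).
gluster volume create glustervol replica 2 \
    rhgs-primary-a-1:/rhgs/brick1/glustervol rhgs-primary-b-1:/rhgs/brick1/glustervol \
    rhgs-primary-a-2:/rhgs/brick1/glustervol rhgs-primary-b-2:/rhgs/brick1/glustervol
```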

4.1.4. Secondary Storage Pool Configuration

  • Gluster configured in a 10 x 1 Distribute volume
  • 10 x n1-highmem-4 instances:
    Resource    Specification
    vCPU        4
    Memory      26 GB
    Boot Disk   20 GB standard persistent disk
    Data Disk   10,220 GB standard persistent disk
    Image       Custom Red Hat Gluster Storage 3.1 on Red Hat Enterprise Linux 7
  • VM zone allocation:
    The secondary storage pool is designed as a receiver of asynchronous replication, via geo-replication, in a remote region for disaster recovery. To limit the cost of this protective layer, this storage pool is not synchronously replicated within its local region and a distribute-only Gluster volume is used. In order to limit the potential impact of an outage, all nodes in this region are placed in the same zone.
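As a sketch of how the primary volume might be attached to this secondary pool, the session could be established as follows. Volume and host names are placeholders; see the Red Hat Gluster Storage Administration Guide for the full geo-replication procedure.

```shell
# Distribute SSH keys for the geo-replication session (run on a primary node).
gluster system:: execute gsec_create

# Create and start a geo-replication session from the primary volume to the
# secondary (slave) volume in the remote region; names are placeholders.
gluster volume geo-replication glustervol rhgs-secondary-1::glustervol-dr create push-pem
gluster volume geo-replication glustervol rhgs-secondary-1::glustervol-dr start
```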

4.1.5. Client Configuration

Client VMs have been distributed as evenly as possible across the US-CENTRAL1 region, zones A and B.
  • 10 x n1-standard-2 instances:
    Resource    Specification
    vCPU        2
    Memory      7.5 GB
    Boot Disk   10 GB standard persistent disk
    Image       Custom Red Hat Gluster Storage 3.1 on Red Hat Enterprise Linux 7
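A client can mount the volume with the native FUSE client; the server and volume names below are placeholders.

```shell
# Mount the Gluster volume on a client using the native FUSE client. Any
# server in the trusted pool can be named here; it is only used to fetch the
# volume layout, after which the client talks to all bricks directly.
mkdir -p /mnt/glusterfs
mount -t glusterfs rhgs-primary-a-1:/glustervol /mnt/glusterfs

# Example /etc/fstab entry for a persistent mount:
# rhgs-primary-a-1:/glustervol  /mnt/glusterfs  glusterfs  defaults,_netdev  0 0
```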

4.1.6. Trusted Pool Topology

4.1.7. Obtaining Red Hat Gluster Storage for Google Cloud Platform

To download the Red Hat Gluster Storage Server files using a Red Hat Subscription or a Red Hat Evaluation Subscription:
  1. Visit the Red Hat Customer Service Portal at https://access.redhat.com/login and enter your user name and password to log in.
  2. Click Downloads to visit the Software & Download Center.
  3. In the Red Hat Gluster Storage Server area, click Download Software to download the latest version of the qcow2 image.