Configuring and Managing Red Hat Storage Server
Edition 1
Legal Notice
Abstract
- Preface
- I. Introduction
- II. Red Hat Storage Administration On-Premise
- 6. The glusterd Service
- 7. Trusted Storage Pools
- 8. Red Hat Storage Volumes
- 8.1. Formatting and Mounting Bricks
- 8.2. Encrypted Disk
- 8.3. Creating Distributed Volumes
- 8.4. Creating Replicated Volumes
- 8.5. Creating Distributed Replicated Volumes
- 8.6. Creating Striped Volumes
- 8.7. Creating Distributed Striped Volumes
- 8.8. Creating Striped Replicated Volumes
- 8.9. Creating Distributed Striped Replicated Volumes
- 8.10. Starting Volumes
- 9. Accessing Data - Setting Up Clients
- 10. Managing Red Hat Storage Volumes
- 11. Configuring Red Hat Storage for Enhancing Performance
- 12. Managing Geo-replication
- 12.1. About Geo-replication
- 12.2. Replicated Volumes vs Geo-replication
- 12.3. Preparing to Deploy Geo-replication
- 12.4. Starting Geo-replication
- 12.5. Starting Geo-replication on a Newly Added Brick
- 12.6. Disaster Recovery
- 12.7. Example - Setting up Cascading Geo-replication
- 12.8. Recommended Practices
- 12.9. Troubleshooting Geo-replication
- 13. Managing Directory Quotas
- 14. Monitoring Your Red Hat Storage Workload
- 14.1. Running the Volume Profile Command
- 14.2. Running the Volume Top Command
- 14.2.1. Viewing Open File Descriptor Count and Maximum File Descriptor Count
- 14.2.2. Viewing Highest File Read Calls
- 14.2.3. Viewing Highest File Write Calls
- 14.2.4. Viewing Highest Open Calls on a Directory
- 14.2.5. Viewing Highest Read Calls on a Directory
- 14.2.6. Viewing Read Performance
- 14.2.7. Viewing Write Performance
- 14.3. Listing Volumes
- 14.4. Displaying Volume Information
- 14.5. Performing Statedump on a Volume
- 14.6. Displaying Volume Status
- 15. Managing Red Hat Storage Volume Life-Cycle Extensions
- III. Red Hat Storage Administration on Public Cloud
- IV. Data Access with Other Interfaces
- 19. Managing Object Store
- 19.1. Architecture Overview
- 19.2. Components of Object Storage
- 19.3. Advantages of using Object Store
- 19.4. Limitations
- 19.5. Prerequisites
- 19.6. Configuring the Object Store
- 19.6.1. Configuring a Proxy Server
- 19.6.2. Configuring the Authentication Service
- 19.6.3. Configuring an Object Server
- 19.6.4. Configuring a Container Server
- 19.6.5. Configuring an Account Server
- 19.6.6. Configuring Swift Object and Container Constraints
- 19.6.7. Exporting the Red Hat Storage Volumes
- 19.6.8. Starting and Stopping Server
- 19.7. Starting the Services Automatically
- 19.8. Working with the Object Store
- V. Appendices
- A. Revision History
Preface
1. Audience
2. License
3. Document Conventions
3.1. Typographic Conventions
Mono-spaced Bold
To see the contents of the file my_next_bestselling_novel in your current working directory, enter the cat my_next_bestselling_novel command at the shell prompt and press Enter to execute the command.
Press Enter to execute the command. Press Ctrl+Alt+F2 to switch to a virtual terminal.
mono-spaced bold. For example:
File-related classes include filesystem for file systems, file for files, and dir for directories. Each class has its own associated set of permissions.
Choose → → from the main menu bar to launch Mouse Preferences. In the Buttons tab, select the Left-handed mouse check box and click to switch the primary mouse button from the left to the right (making the mouse suitable for use in the left hand). To insert a special character into a gedit file, choose → → from the main menu bar. Next, choose → from the Character Map menu bar, type the name of the character in the Search field and click . The character you sought will be highlighted in the Character Table. Double-click this highlighted character to place it in the Text to copy field and then click the button. Now switch back to your document and choose → from the gedit menu bar.
Mono-spaced Bold Italic or Proportional Bold Italic
To connect to a remote machine using ssh, type ssh username@domain.name at a shell prompt. If the remote machine is example.com and your username on that machine is john, type ssh john@example.com. The mount -o remount file-system command remounts the named file system. For example, to remount the /home file system, the command is mount -o remount /home. To see the version of a currently installed package, use the rpm -q package command. It will return a result as follows: package-version-release.
Publican is a DocBook publishing system.
3.2. Pull-quote Conventions
Output sent to a terminal is set in mono-spaced roman and presented thus:
books Desktop documentation drafts mss photos stuff svn books_tests Desktop1 downloads images notes scripts svgs
Source-code listings are also set in mono-spaced roman but add syntax highlighting as follows:
static int kvm_vm_ioctl_deassign_device(struct kvm *kvm,
                struct kvm_assigned_pci_dev *assigned_dev)
{
        int r = 0;
        struct kvm_assigned_dev_kernel *match;

        mutex_lock(&kvm->lock);

        match = kvm_find_assigned_dev(&kvm->arch.assigned_dev_head,
                                      assigned_dev->assigned_dev_id);
        if (!match) {
                printk(KERN_INFO "%s: device hasn't been assigned before, "
                       "so cannot be deassigned\n", __func__);
                r = -EINVAL;
                goto out;
        }

        kvm_deassign_device(kvm, match);

        kvm_free_assigned_device(kvm, match);

out:
        mutex_unlock(&kvm->lock);
        return r;
}
3.3. Notes and Warnings
Note
Important
Warning
4. Getting Help and Giving Feedback
4.1. Do You Need Help?
- Search or browse through a knowledge base of technical support articles about Red Hat products.
- Submit a support case to Red Hat Global Support Services (GSS).
- Access other product documentation.
4.2. We Need Feedback
Part I. Introduction
Table of Contents
Chapter 1. Platform Introduction
1.1. About Red Hat Storage
1.2. About glusterFS
1.3. On-premise Installation
1.4. Public Cloud Installation
Chapter 2. Red Hat Storage Architecture
2.1. Red Hat Storage Server for On-premise Architecture
2.2. Red Hat Storage Server for Public Cloud Architecture
Chapter 3. Key Features
3.1. Elasticity
3.2. No Metadata with the Elastic Hashing Algorithm
3.3. Scalability
3.4. High Availability and Flexibility
3.5. Flexibility
3.6. No Application Rewrites
3.7. Simple Management
Red Hat Storage includes the Top and Profile monitoring tools. Top provides visibility into workload patterns, while Profile provides performance statistics over a user-defined time period for metrics including latency and amount of data read or written.
3.8. Modular, Stackable Design
Chapter 4. Use Case Examples
Note
4.1. Use Case 1: Using Red Hat Storage for Data Archival
4.1.1. Key Features of Red Hat Storage Server for Nearline Storage and Archival
- Elastic Scalability: Storage volumes are abstracted from the hardware, allowing each volume to be managed independently. Volumes can be expanded or shrunk by adding or removing systems from the storage pool, or by adding or removing storage from individual systems in the pool. Data remains available during these changes, with no downtime or application interruption.
- Compatibility: Red Hat Storage Server has native POSIX compatibility, and also supports SMB, NFS, and HTTP protocols. As a result, Red Hat Storage Server is compatible with industry standard storage management and backup software.
- High Availability: In the event of hardware failure, automatic replication ensures high levels of data protection and resiliency. Red Hat Storage Server has self-healing capabilities, which restore data to a correct state following recovery.
- Unified Global Namespace: A unified global namespace aggregates disk and memory resources into a common pool, simplifying management of the storage environment and eliminating data silos. Namespaces can be expanded or shrunk dynamically, with no interruption to client access.
- Efficient Data Access: Red Hat Storage Server provides fast and efficient random access, enabling quick data recovery.
4.2. Use Case 2: Using Red Hat Storage for High Performance Computing
4.2.1. Key Features of Red Hat Storage Server for High Performance Computing
- Petabyte Scalability: Red Hat Storage Server’s fully distributed architecture and advanced file management algorithms allow it to support multi-petabyte repositories.
- High Performance with No Bottlenecks: Red Hat Storage Server enables fast file access by spreading files evenly throughout the system without a centralized metadata server. Nodes can access storage nodes directly, and as a result, hot spots, choke points, and other I/O bottlenecks are eliminated. Data contention is reduced, and there is no single point of failure.
- Elastic Scalability: Storage volumes are abstracted from the hardware, allowing each volume to be managed independently. Volumes can be expanded or shrunk by adding or removing systems from the storage pool, or by adding or removing storage from individual systems in the pool. Data can be migrated within the system to rebalance capacity, or to add and remove systems without downtime, allowing HPC environments to scale seamlessly.
- Infiniband Support: Red Hat Storage Server supports IP over Infiniband (IPoIB). Using Infiniband as a back-end interconnect for the storage pool is recommended, as it provides additional options for maximizing performance. Using RDMA as a mount protocol for its native client is a technology preview feature.
- Compatibility: Red Hat Storage Server has native POSIX compatibility, and also supports SMB, NFS, and HTTP protocols. Red Hat Storage Server supports most existing applications with no code changes required.
4.3. Use Case 3: Using Red Hat Storage for Content Clouds
4.3.1. Key Features of Red Hat Storage Server for Content Clouds
- Elasticity: Storage volumes are abstracted from the hardware, allowing each volume to be managed independently. Volumes can be expanded or shrunk by adding or removing systems from the storage pool, or by adding or removing storage from individual systems in the pool. Data can be migrated within the system to rebalance capacity, or to add and remove systems without downtime, allowing environments to scale seamlessly.
- Petabyte Scalability: Red Hat Storage Server’s fully distributed architecture and advanced file management algorithms allow it to support multi-petabyte repositories.
- High Performance: Red Hat Storage Server enables fast file access by spreading files evenly throughout the system without a centralized metadata server. Nodes can access storage nodes directly, and as a result, hot spots, choke points, and other I/O bottlenecks are eliminated. Data contention is reduced, and there is no single point of failure.
- Compatibility: Red Hat Storage Server has native POSIX compatibility, and also supports SMB, NFS, and HTTP protocols. Red Hat Storage Server supports most existing applications with no code changes required.
- Reliability: In the event of hardware failure, automatic replication ensures high levels of data protection and resiliency. Red Hat Storage Server has self-healing capabilities, which restore data to a correct state following recovery.
Chapter 5. Storage Concepts
- Brick
- The glusterFS basic unit of storage, represented by an export directory on a server in the trusted storage pool. A brick is expressed by combining a server with an export directory in the following format:
SERVER:EXPORT
For example: myhostname:/exports/myexportdir/
- Block Storage
- Block special files, or block devices, correspond to devices through which the system moves data in the form of blocks. These device nodes often represent addressable devices such as hard disks, CD-ROM drives, or memory regions. Red Hat Storage supports the XFS file system with extended attributes.
- Cluster
- A trusted pool of linked computers working together, resembling a single computing resource. In Red Hat Storage, a cluster is also referred to as a trusted storage pool.
- Client
- The machine that mounts a volume (this may also be a server).
- Distributed File System
- A file system that allows multiple clients to concurrently access data which is spread across servers/bricks in a trusted storage pool. Data sharing among multiple locations is fundamental to all distributed file systems.
- File System
- A method of storing and organizing computer files. A file system organizes files into a database for the storage, manipulation, and retrieval by the computer's operating system. Source: Wikipedia
- FUSE
- Filesystem in Userspace (FUSE) is a loadable kernel module for Unix-like operating systems that lets non-privileged users create their own file systems without editing kernel code. This is achieved by running file system code in user space while the FUSE module provides only a "bridge" to the kernel interfaces. Source: Wikipedia
- Geo-Replication
- Geo-replication provides a continuous, asynchronous, and incremental replication service from one site to another over Local Area Networks (LAN), Wide Area Networks (WAN), and the Internet.
- glusterd
- glusterd is the glusterFS management daemon which must run on all servers in the trusted storage pool.
- Metadata
- Metadata is data providing information about other pieces of data.
- N-way Replication
- Local synchronous data replication that is typically deployed across campus or Amazon Web Services Availability Zones.
- Namespace
- An abstract container or environment that is created to hold a logical grouping of unique identifiers or symbols. Each Red Hat Storage trusted storage pool exposes a single namespace as a POSIX mount point which contains every file in the trusted storage pool.
- Petabyte
- A petabyte is a unit of information equal to one quadrillion bytes, or 1000 terabytes. The unit symbol for the petabyte is PB. The prefix peta- (P) indicates a power of 1000: 1 PB = 1,000,000,000,000,000 B = 1000^5 B = 10^15 B. The term "pebibyte" (PiB), using a binary prefix, is used for the corresponding power of 1024. Source: Wikipedia
- POSIX
- Portable Operating System Interface (for Unix) (POSIX) is the name of a family of related standards specified by the IEEE to define the application programming interface (API), as well as shell and utilities interfaces, for software that is compatible with variants of the UNIX operating system. Red Hat Storage exports a fully POSIX compatible file system.
- RAID
- Redundant Array of Independent Disks (RAID) is a technology that provides increased storage reliability through redundancy. It combines multiple low-cost, less-reliable disk drive components into a logical unit where all drives in the array are interdependent.
- RRDNS
- Round Robin Domain Name Service (RRDNS) is a method to distribute load across application servers. RRDNS is implemented by creating multiple A records with the same name and different IP addresses in the zone file of a DNS server.
- Server
- The machine (virtual or bare metal) that hosts the file system in which data is stored.
- Scale-Up Storage
- Increases the capacity of the storage device in a single dimension. For example, adding additional CPUs, RAM, and disk capacity to a single computer in a trusted storage pool.
- Scale-Out Storage
- Increases the capability of a storage environment by adding more systems of the same size. For example, adding servers to a trusted storage pool increases the CPU, disk capacity, and throughput of the trusted storage pool.
- Subvolume
- A brick after being processed by at least one translator.
- Translator
- A translator connects to one or more subvolumes, performs operations on them, and offers a subvolume connection.
- Trusted Storage Pool
- A storage pool is a trusted network of storage servers. When you start the first server, the storage pool consists of only that server.
- User Space
- Applications running in user space do not directly interact with hardware, instead using the kernel to moderate access. User space applications are generally more portable than applications in kernel space. glusterFS is a user space application.
- Virtual File System (VFS)
- VFS is a kernel software layer that handles all system calls related to the standard Linux file system. It provides a common interface to several kinds of file systems.
- Volfile
- Volfile is a configuration file used by the glusterFS process. It is usually located at /var/lib/glusterd/vols/VOLNAME.
- Volume
- A volume is a logical collection of bricks. Most of the Red Hat Storage management operations happen on the volume.
Part II. Red Hat Storage Administration On-Premise
Table of Contents
- 6. The glusterd Service
- 7. Trusted Storage Pools
- 8. Red Hat Storage Volumes
- 8.1. Formatting and Mounting Bricks
- 8.2. Encrypted Disk
- 8.3. Creating Distributed Volumes
- 8.4. Creating Replicated Volumes
- 8.5. Creating Distributed Replicated Volumes
- 8.6. Creating Striped Volumes
- 8.7. Creating Distributed Striped Volumes
- 8.8. Creating Striped Replicated Volumes
- 8.9. Creating Distributed Striped Replicated Volumes
- 8.10. Starting Volumes
- 9. Accessing Data - Setting Up Clients
- 10. Managing Red Hat Storage Volumes
- 11. Configuring Red Hat Storage for Enhancing Performance
- 12. Managing Geo-replication
- 12.1. About Geo-replication
- 12.2. Replicated Volumes vs Geo-replication
- 12.3. Preparing to Deploy Geo-replication
- 12.4. Starting Geo-replication
- 12.5. Starting Geo-replication on a Newly Added Brick
- 12.6. Disaster Recovery
- 12.7. Example - Setting up Cascading Geo-replication
- 12.8. Recommended Practices
- 12.9. Troubleshooting Geo-replication
- 13. Managing Directory Quotas
- 14. Monitoring Your Red Hat Storage Workload
- 14.1. Running the Volume Profile Command
- 14.2. Running the Volume Top Command
- 14.2.1. Viewing Open File Descriptor Count and Maximum File Descriptor Count
- 14.2.2. Viewing Highest File Read Calls
- 14.2.3. Viewing Highest File Write Calls
- 14.2.4. Viewing Highest Open Calls on a Directory
- 14.2.5. Viewing Highest Read Calls on a Directory
- 14.2.6. Viewing Read Performance
- 14.2.7. Viewing Write Performance
- 14.3. Listing Volumes
- 14.4. Displaying Volume Information
- 14.5. Performing Statedump on a Volume
- 14.6. Displaying Volume Status
- 15. Managing Red Hat Storage Volume Life-Cycle Extensions
Chapter 6. The glusterd Service
The glusterd service enables dynamic configuration changes to Red Hat Storage volumes, without needing to restart servers or remount storage volumes on clients.
Using the glusterd command line, logical storage volumes can be decoupled from physical hardware. Decoupling allows storage volumes to be grown, resized, and shrunk, without application or server downtime.
6.1. Starting and Stopping the glusterd service
The glusterd service is started automatically on all servers in the trusted storage pool. The service can also be manually started and stopped as required.
- Run the following command to start glusterd manually.# service glusterd start
- Run the following command to stop glusterd manually.# service glusterd stop
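On Red Hat Enterprise Linux 6 based servers, glusterd can also be configured to start automatically at boot using the standard chkconfig and service tools. The following is a minimal sketch; run it on each server in the trusted storage pool.
# chkconfig glusterd on
# service glusterd status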
Chapter 7. Trusted Storage Pools
7.1. Adding Servers to the Trusted Storage Pool
The gluster peer probe [server] command is used to add servers to the trusted storage pool.
Adding Three Servers to a Trusted Storage Pool
Prerequisites
- The glusterd service must be running on all storage servers requiring addition to the trusted storage pool. See Chapter 6, The glusterd Service for service start and stop commands. Server1, the trusted storage server, is started.
- The host names of the target servers must be resolvable by DNS.
- Run gluster peer probe [server] from Server 1 to add additional servers to the trusted storage pool.
Note
Self-probing Server1 will result in an error because it is part of the trusted storage pool by default.# gluster peer probe server2 Probe successful # gluster peer probe server3 Probe successful # gluster peer probe server4 Probe successful
- Verify the peer status from all servers using the following command:
# gluster peer status Number of Peers: 3 Hostname: server2 Uuid: 5e987bda-16dd-43c2-835b-08b7d55e94e5 State: Peer in Cluster (Connected) Hostname: server3 Uuid: 1e0ca3aa-9ef7-4f66-8f15-cbc348f29ff7 State: Peer in Cluster (Connected) Hostname: server4 Uuid: 3e0caba-9df7-4f66-8e5d-cbc348f29ff7 State: Peer in Cluster (Connected)
7.2. Removing Servers from the Trusted Storage Pool
Run gluster peer detach [server] to remove a server from the storage pool.
Removing One Server from the Trusted Storage Pool
Prerequisites
- The glusterd service must be running on the server targeted for removal from the storage pool. See Chapter 6, The glusterd Service for service start and stop commands.
- The host names of the target servers must be resolvable by DNS.
- Run gluster peer detach [server] to remove the server from the trusted storage pool.# gluster peer detach server4 Detach successful
- Verify the peer status from all servers using the following command:
# gluster peer status Number of Peers: 2 Hostname: server2 Uuid: 5e987bda-16dd-43c2-835b-08b7d55e94e5 State: Peer in Cluster (Connected) Hostname: server3 Uuid: 1e0ca3aa-9ef7-4f66-8f15-cbc348f29ff7
Chapter 8. Red Hat Storage Volumes
- 8.1. Formatting and Mounting Bricks
- 8.2. Encrypted Disk
- 8.3. Creating Distributed Volumes
- 8.4. Creating Replicated Volumes
- 8.5. Creating Distributed Replicated Volumes
- 8.6. Creating Striped Volumes
- 8.7. Creating Distributed Striped Volumes
- 8.8. Creating Striped Replicated Volumes
- 8.9. Creating Distributed Striped Replicated Volumes
- 8.10. Starting Volumes
Warning
Note
yum groupinstall "Infiniband Support" to install Infiniband packages:
Volume Types
- Distributed
- Distributes files across bricks in the volume. Use this volume type where scaling and redundancy requirements are not important, or provided by other hardware or software layers. See Section 8.3, “Creating Distributed Volumes” for additional information about this volume type.
- Replicated
- Replicates files across bricks in the volume. Use this volume type in environments where high-availability and high-reliability are critical. See Section 8.4, “Creating Replicated Volumes” for additional information about this volume type.
- Distributed Replicated
- Distributes files across replicated bricks in the volume. Use this volume type in environments where high-reliability and scalability are critical. This volume type offers improved read performance in most environments. See Section 8.5, “Creating Distributed Replicated Volumes” for additional information about this volume type.
Important
- Striped
- Stripes data across bricks in the volume. Use this volume type only in high-concurrency environments where accessing very large files is required. See Section 8.6, “Creating Striped Volumes” for additional information about this volume type.
- Striped Replicated
- Stripes data across replicated bricks in the trusted storage pool. Use this volume type only in highly-concurrent environments, where there is parallel access to very large files, and performance is critical. This volume type is supported for Map Reduce workloads only. See Section 8.8, “Creating Striped Replicated Volumes” for additional information about this volume type and its restrictions.
- Distributed Striped
- Stripes data across two or more nodes in the trusted storage pool. Use this volume type where storage must be scalable, and in high-concurrency environments where accessing very large files is critical. See Section 8.7, “Creating Distributed Striped Volumes” for additional information about this volume type.
- Distributed Striped Replicated
- Distributes striped data across replicated bricks in the trusted storage pool. Use this volume type only in highly-concurrent environments where performance, and parallel access to very large files, is critical. This volume type is supported for Map Reduce workloads only. See Section 8.9, “Creating Distributed Striped Replicated Volumes” for additional information about this volume type.
8.1. Formatting and Mounting Bricks
Important
Formatting and Mounting Bricks
Important
- Run # mkfs.xfs -i size=512 DEVICE to format the bricks to the supported XFS file system format. The inode size is set to 512 bytes to accommodate the extended attributes used by Red Hat Storage.
- Run # blkid DEVICE to obtain the Universally Unique Identifier (UUID) of the device.
- Run # mkdir /mountpoint to create a directory to link the brick to.
- Add an entry in /etc/fstab using the UUID obtained from the blkid command:UUID=uuid /mountpoint xfs defaults 1 2
- Run # mount /mountpoint to mount the brick.
- Run the df -h command to verify the brick is successfully mounted:# df -h /dev/vg_bricks/lv_exp1 16G 1.2G 15G 7% /exp1
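The following consolidated example applies the above steps to a hypothetical device /dev/sdb1 mounted at /exp1; the device name, UUID, and mount point are placeholders, so substitute your own values.
# mkfs.xfs -i size=512 /dev/sdb1
# blkid /dev/sdb1
/dev/sdb1: UUID="6014f1d7-9d52-4c2f-a5cb-7f32f6067b81" TYPE="xfs"
# mkdir /exp1
# echo 'UUID=6014f1d7-9d52-4c2f-a5cb-7f32f6067b81 /exp1 xfs defaults 1 2' >> /etc/fstab
# mount /exp1
# df -h /exp1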
Using Subdirectory as the Brick for Volume
For example, assume the /exp directory is the mounted file system and is used as the brick for volume creation. If the mount point becomes unavailable for any reason, any write will continue to happen in the /exp directory, but it will now be under the root file system.
To avoid this, mount the file system at a location such as /bricks. After the file system is available, create a directory called /bricks/bricksrv1 and use it for volume creation. This approach has the following advantages:
- When the /bricks file system is unavailable, the /bricks/bricksrv1 directory is no longer available in the system. Hence, there will be no data loss by writing to a different location.
- This does not require any additional file system for nesting.
- Run the pvcreate command to initialize the partition. The following example initializes the partition /dev/sdb1 as an LVM (Logical Volume Manager) physical volume for later use as part of an LVM logical volume.# pvcreate /dev/sdb1
- Run the following command to create the volume group:
# vgcreate datavg /dev/sdb1
- Create the logical volume lvol1 from the volume group datavg.# lvcreate -l <number of extents> --name lvol1 datavg
- Create an XFS file system on the logical volume.
# mkfs -t xfs -i size=512 /dev/mapper/datavg-lvol1
- Create the /bricks mount point using the mkdir command.# mkdir /bricks
- Mount the XFS file system.
# mount -t xfs /dev/datavg/lvol1 /bricks
- Create the bricksrv1 subdirectory in the mounted file system.# mkdir /bricks/bricksrv1
Repeat the above steps on all nodes.
- Create the Red Hat Storage volume using the subdirectories as bricks.
# gluster volume create distdata01 ad-rhs-srv1:/bricks/bricksrv1 ad-rhs-srv2:/bricks/bricksrv2
- Start the Red Hat Storage volume.
# gluster volume start distdata01
- Verify the status of the volume.
# gluster volume status distdata01
Reusing a Brick from a Deleted Volume
- Brick with a File System Suitable for Reformatting (Optimal Method)
- Run # mkfs.xfs -f -i size=512 device to reformat the brick to supported requirements, and make it available for immediate reuse in a new volume.
Note
All data will be erased when the brick is reformatted.
- File System on a Parent of a Brick Directory
- If the file system cannot be reformatted, remove the whole brick directory and create it again.
Procedure 8.1. Cleaning An Unusable Brick
- Delete all previously existing data in the brick, including the .glusterfs subdirectory.
- Run # setfattr -x trusted.glusterfs.volume-id brick and # setfattr -x trusted.gfid brick to remove the attributes from the root of the brick.
- Run # getfattr -d -m . brick to examine the attributes set on the volume. Take note of the attributes.
- Run # setfattr -x attribute brick to remove the attributes relating to the glusterFS file system. The trusted.glusterfs.dht attribute for a distributed volume is one such example of attributes that need to be removed.
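For example, for a hypothetical brick directory /rhs/brick1 left over from a deleted volume, the cleanup might look like the following; the path is illustrative only, so substitute the actual brick directory.
# rm -rf /rhs/brick1/* /rhs/brick1/.glusterfs
# setfattr -x trusted.glusterfs.volume-id /rhs/brick1
# setfattr -x trusted.gfid /rhs/brick1
# getfattr -d -m . /rhs/brick1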
8.2. Encrypted Disk
8.3. Creating Distributed Volumes
Warning
Create a Distributed Volume
Use gluster volume create to create different types of volumes, and gluster volume info to verify successful volume creation.
Pre-requisites
- A trusted storage pool has been created, as described in Section 7.1, “Adding Servers to the Trusted Storage Pool”.
- Understand how to start and stop volumes, as described in Section 8.10, “Starting Volumes ”.
- Run the gluster volume create command to create the distributed volume. The syntax is gluster volume create NEW-VOLNAME [transport tcp | rdma | tcp,rdma] NEW-BRICK... The default value for transport is tcp. Other options can be passed such as auth.allow or auth.reject. See Section 10.1, “Configuring Volume Options” for a full list of parameters.
Example 8.1. Distributed Volume with Two Storage Servers
# gluster volume create test-volume server1:/exp1 server2:/exp2 Creation of test-volume has been successful Please start the volume to access data.
Example 8.2. Distributed Volume over InfiniBand with Four Servers
# gluster volume create test-volume transport rdma server1:/exp1 server2:/exp2 server3:/exp3 server4:/exp4 Creation of test-volume has been successful Please start the volume to access data.
- Run # gluster volume start VOLNAME to start the volume.# gluster volume start test-volume Starting test-volume has been successful
- Run the gluster volume info command to optionally display the volume information. The following output is the result of Example 8.1, “Distributed Volume with Two Storage Servers”.# gluster volume info Volume Name: test-volume Type: Distribute Status: Created Number of Bricks: 2 Transport-type: tcp Bricks: Brick1: server1:/exp1 Brick2: server2:/exp2
8.4. Creating Replicated Volumes
Important
Note
Create a Replicated Volume
Use gluster volume create to create different types of volumes, and gluster volume info to verify successful volume creation.
Pre-requisites
- A trusted storage pool has been created, as described in Section 7.1, “Adding Servers to the Trusted Storage Pool”.
- Understand how to start and stop volumes, as described in Section 8.10, “Starting Volumes ”.
- Run the gluster volume create command to create the replicated volume. The syntax is # gluster volume create NEW-VOLNAME [replica COUNT] [transport tcp | rdma | tcp,rdma] NEW-BRICK... The default value for transport is tcp. Other options can be passed such as auth.allow or auth.reject. See Section 10.1, “Configuring Volume Options” for a full list of parameters.
Example 8.3. Replicated Volume with Two Storage Servers
The order in which bricks are specified determines how bricks are mirrored with each other. For example, first n bricks, where n is the replica count. In this scenario, the first two bricks specified mirror each other. If more bricks were specified, the next two bricks in sequence would mirror each other.# gluster volume create test-volume replica 2 transport tcp server1:/exp1 server2:/exp2 Creation of test-volume has been successful Please start the volume to access data.
- Run # gluster volume start VOLNAME to start the volume.# gluster volume start test-volume Starting test-volume has been successful
- Run the gluster volume info command to optionally display the volume information. The following output is the result of Example 8.3, “Replicated Volume with Two Storage Servers”.# gluster volume info Volume Name: test-volume Type: Replicate Status: Created Number of Bricks: 2 Transport-type: tcp Bricks: Brick1: server1:/exp1 Brick2: server2:/exp2
8.5. Creating Distributed Replicated Volumes
Important
Note
Create a Distributed Replicated Volume
Use gluster volume create to create a distributed replicated volume, and gluster volume info to verify successful volume creation.
Pre-requisites
- A trusted storage pool has been created, as described in Section 7.1, “Adding Servers to the Trusted Storage Pool”.
- Understand how to start and stop volumes, as described in Section 8.10, “Starting Volumes ”.
- Run the gluster volume create command to create the distributed replicated volume. The syntax is # gluster volume create NEW-VOLNAME [replica COUNT] [transport tcp | rdma | tcp,rdma] NEW-BRICK... The default value for transport is tcp. Other options can be passed such as auth.allow or auth.reject. See Section 10.1, “Configuring Volume Options” for a full list of parameters.
Example 8.4. Four Node Distributed Replicated Volume with a Two-way Mirror
The order in which bricks are specified determines how bricks are mirrored with each other. For example, first n bricks, where n is the replica COUNT. In this scenario, the first two bricks specified mirror each other. If more bricks were specified, the next two bricks in sequence would mirror each other.# gluster volume create test-volume replica 2 transport tcp server1:/exp1 server2:/exp2 server3:/exp3 server4:/exp4 Creation of test-volume has been successful Please start the volume to access data.
Example 8.5. Six Node Distributed Replicated Volume with a Two-way Mirror
The order in which bricks are specified determines how bricks are mirrored with each other. For example, first n bricks, where n is the replica COUNT. In this scenario, the first two bricks specified mirror each other. If more bricks were specified, the next two bricks in sequence would mirror each other.# gluster volume create test-volume replica 2 transport tcp server1:/exp1 server2:/exp2 server3:/exp3 server4:/exp4 server5:/exp5 server6:/exp6 Creation of test-volume has been successful Please start the volume to access data.
- Run # gluster volume start VOLNAME to start the volume.# gluster volume start test-volume Starting test-volume has been successful
- Run the gluster volume info command to optionally display the volume information.
8.6. Creating Striped Volumes
Important
Note
Create a Striped Volume
Use gluster volume create to create a striped volume, and gluster volume info to verify successful volume creation.
Pre-requisites
- A trusted storage pool has been created, as described in Section 7.1, “Adding Servers to the Trusted Storage Pool”.
- Understand how to start and stop volumes, as described in Section 8.10, “Starting Volumes ”.
- Run the gluster volume create command to create the striped volume. The syntax is # gluster volume create NEW-VOLNAME [stripe COUNT] [transport tcp | rdma | tcp,rdma] NEW-BRICK... The default value for transport is tcp. Other options can be passed such as auth.allow or auth.reject. See Section 10.1, “Configuring Volume Options” for a full list of parameters.
Example 8.6. Striped Volume Across Two Servers
# gluster volume create test-volume stripe 2 transport tcp server1:/exp1 server2:/exp2 Creation of test-volume has been successful Please start the volume to access data.
- Run # gluster volume start VOLNAME to start the volume.# gluster volume start test-volume Starting test-volume has been successful
- Run the gluster volume info command to optionally display the volume information.
8.7. Creating Distributed Striped Volumes
Important
Note
Create a Distributed Striped Volume
Use gluster volume create to create a distributed striped volume, and gluster volume info to verify successful volume creation.
Pre-requisites
- A trusted storage pool has been created, as described in Section 7.1, “Adding Servers to the Trusted Storage Pool”.
- Understand how to start and stop volumes, as described in Section 8.10, “Starting Volumes ”.
- Run the gluster volume create command to create the distributed striped volume. The syntax is # gluster volume create NEW-VOLNAME [stripe COUNT] [transport tcp | rdma | tcp,rdma] NEW-BRICK... The default value for transport is tcp. Other options can be passed such as auth.allow or auth.reject. See Section 10.1, “Configuring Volume Options” for a full list of parameters.
Example 8.7. Distributed Striped Volume Across Two Storage Servers
# gluster volume create test-volume stripe 2 transport tcp server1:/exp1 server1:/exp2 server2:/exp3 server2:/exp4 Creation of test-volume has been successful Please start the volume to access data.
- Run # gluster volume start VOLNAME to start the volume.# gluster volume start test-volume Starting test-volume has been successful
- Run the gluster volume info command to optionally display the volume information.
8.8. Creating Striped Replicated Volumes
Important
Note
Create a Striped Replicated Volume
Use gluster volume create to create a striped replicated volume, and gluster volume info to verify successful volume creation.
Pre-requisites
- A trusted storage pool has been created, as described in Section 7.1, “Adding Servers to the Trusted Storage Pool”.
- Understand how to start and stop volumes, as described in Section 8.10, “Starting Volumes ”.
- Run the gluster volume create command to create the striped replicated volume. The syntax is # gluster volume create NEW-VOLNAME [stripe COUNT] [replica COUNT] [transport tcp | rdma | tcp,rdma] NEW-BRICK... The default value for transport is tcp. Other options can be passed such as auth.allow or auth.reject. See Section 10.1, “Configuring Volume Options” for a full list of parameters.
Example 8.8. Striped Replicated Volume Across Four Servers
The order in which bricks are specified determines how bricks are mirrored with each other. For example, first n bricks, where n is the replica COUNT. In this scenario, the first two bricks specified mirror each other. If more bricks were specified, the next two bricks in sequence would mirror each other.# gluster volume create test-volume stripe 2 replica 2 transport tcp server1:/exp1 server2:/exp3 server3:/exp2 server4:/exp4 Creation of test-volume has been successful Please start the volume to access data.
Example 8.9. Striped Replicated Volume Across Six Servers
The order in which bricks are specified determines how bricks are mirrored with each other. For example, first n bricks, where n is the replica COUNT. In this scenario, the first two bricks specified mirror each other. If more bricks were specified, the next two bricks in sequence would mirror each other.# gluster volume create test-volume stripe 3 replica 2 transport tcp server1:/exp1 server2:/exp2 server3:/exp3 server4:/exp4 server5:/exp5 server6:/exp6 Creation of test-volume has been successful Please start the volume to access data.
- Run # gluster volume start VOLNAME to start the volume.# gluster volume start test-volume Starting test-volume has been successful
- Run the gluster volume info command to optionally display the volume information.
8.9. Creating Distributed Striped Replicated Volumes
Important
Note
Create a Distributed Striped Replicated Volume
Use gluster volume create to create a distributed striped replicated volume, and gluster volume info to verify successful volume creation.
Pre-requisites
- A trusted storage pool has been created, as described in Section 7.1, “Adding Servers to the Trusted Storage Pool”.
- Understand how to start and stop volumes, as described in Section 8.10, “Starting Volumes ”.
- Run the gluster volume create command to create the distributed striped replicated volume. The syntax is # gluster volume create NEW-VOLNAME [stripe COUNT] [replica COUNT] [transport tcp | rdma | tcp,rdma] NEW-BRICK... The default value for transport is tcp. Other options can be passed such as auth.allow or auth.reject. See Section 10.1, “Configuring Volume Options” for a full list of parameters.
Example 8.10. Distributed Replicated Striped Volume Across Four Servers
The order in which bricks are specified determines how bricks are mirrored with each other. For example, first n bricks, where n is the replica COUNT. In this scenario, the first two bricks specified mirror each other. If more bricks were specified, the next two bricks in sequence would mirror each other.# gluster volume create test-volume stripe 2 replica 2 transport tcp server1:/exp1 server1:/exp2 server2:/exp3 server2:/exp4 server3:/exp5 server3:/exp6 server4:/exp7 server4:/exp8 Creation of test-volume has been successful Please start the volume to access data.
- Run # gluster volume start VOLNAME to start the volume.# gluster volume start test-volume Starting test-volume has been successful
- Run the gluster volume info command to optionally display the volume information.
8.10. Starting Volumes
# gluster volume start VOLNAME
# gluster volume start test-volume Starting test-volume has been successful
Chapter 9. Accessing Data - Setting Up Clients
- Native Client (see Section 9.2, “Native Client”)
- Network File System (NFS) v3 (see Section 9.3, “NFS”)
- Server Message Block (SMB) (see Section 9.4, “SMB”)
Table 9.1. Cross Protocol Data Access Matrix
| | SMB | NFS | Native Client | Object |
|---|---|---|---|---|
| SMB | Yes | No | No | No |
| NFS | No | Yes | Yes | Yes |
| Native Client | No | Yes | Yes | Yes |
| Object | No | Yes | Yes | Yes |
9.1. Securing Red Hat Storage Client Access
Table 9.2. TCP Port Numbers
| Port Number | Usage |
|---|---|
| 22 | For sshd used by geo-replication. |
| 111 | For rpc port mapper. |
| 139 | For netbios service. |
| 445 | For CIFS protocol. |
| 965 | For NLM. |
| 2049 | For glusterFS's NFS exports (nfsd process). |
| 24007 | For glusterd (for management). |
| 24009 - 24108 | For client communication with Red Hat Storage 2.0. |
| 38465 | For NFS mount protocol. |
| 38466 | For NFS mount protocol. |
| 38468 | For NFS's Lock Manager (NLM). |
| 38469 | For NFS's ACL support. |
| 39543 | For oVirt (Red Hat Storage-Console). |
| 49152 - 49251 | For client communication with Red Hat Storage 2.1 and for brick processes depending on the availability of the ports. The total number of ports required to be open depends on the total number of bricks exported on the machine. |
| 55863 | For oVirt (Red Hat Storage-Console). |
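If a firewall is running on the storage servers, the required ports from the table above must be opened. The following iptables commands are a sketch for a Red Hat Storage 2.1 server using the default glusterd and brick port ranges; adjust the ports to match the services you actually use.
# iptables -A INPUT -p tcp --dport 24007 -j ACCEPT
# iptables -A INPUT -p tcp --dport 49152:49251 -j ACCEPT
# service iptables save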
Table 9.3. TCP Port Numbers used for Object Storage (Swift)
| Port Number | Usage |
|---|---|
| 443 | For HTTPS request. |
| 6010 | For Object Server. |
| 6011 | For Container Server. |
| 6012 | For Account Server. |
| 8080 | For Proxy Server. |
Table 9.4. TCP Port Numbers for Nagios Monitoring
| Port Number | Usage |
|---|---|
| 80 | For HTTP protocol (required only if Nagios server is running on a Red Hat Storage node). |
| 443 | For HTTPS protocol (required only for Nagios server). |
| 5667 | For NSCA service (required only if Nagios server is running on a Red Hat Storage node). |
| 5666 | For NRPE service (required in all Red Hat Storage nodes). |
9.2. Native Client
Table 9.6. Red Hat Storage Server Support Matrix
| Red Hat Enterprise Linux version | Red Hat Storage Server version | Native client version |
|---|---|---|
| 6.5 | 3.0 | 3.0, 2.1* |
| 6.6 | 3.0.2 | 3.0, 2.1* |
Note
9.2.1. Installing Native Client
Important
Use the Command Line to Register and Subscribe a System
Prerequisites
- Know the user name and password of the Red Hat Network (RHN) account with Red Hat Storage entitlements.
- Run the rhn_register command to register the system with Red Hat Network.# rhn_register
- In the Operating System Release Version screen, select All available updates and follow the prompts to register the system to the standard base channel of the respective Red Hat Enterprise Linux Server version.
- Run the rhn-channel --add --channel command to subscribe the system to the correct Red Hat Storage Native Client channel:
- For Red Hat Enterprise Linux 7.x clients using Red Hat Satellite Server:
# rhn-channel --add --channel=rhel-x86_64-server-rh-common-7
- For Red Hat Enterprise Linux 6.x clients:
# rhn-channel --add --channel=rhel-x86_64-server-rhsclient-6
- For Red Hat Enterprise Linux 5.x clients:
# rhn-channel --add --channel=rhel-x86_64-server-rhsclient-5
- Execute the following commands for Red Hat Enterprise Linux clients using Subscription Manager.
- Run the following command and enter your Red Hat Network user name and password to register the system with the Red Hat Network.
# subscription-manager register --auto-attach
- Run the following command to enable the channels required to install Red Hat Storage Native Client:
- For Red Hat Enterprise Linux 7.x clients:
# subscription-manager repos --enable=rhel-7-server-rpms --enable=rhel-7-server-rh-common-rpms
- For Red Hat Enterprise Linux 6.1 and later clients:
# subscription-manager repos --enable=rhel-6-server-rpms --enable=rhel-6-server-rhs-client-1-rpms
- For Red Hat Enterprise Linux 5.7 and later clients:
# subscription-manager repos --enable=rhel-5-server-rpms --enable=rhel-5-server-rhs-client-1-rpms
For more information, see Section 3.2 Registering from the Command Line in the Red Hat Subscription Management guide.
- Run the following command to verify whether the system is subscribed to the required channels.
# yum repolist
Use the Web Interface to Register and Subscribe a System
Prerequisites
- Know the user name and password of the Red Hat Network (RHN) account with Red Hat Storage entitlements.
- Log on to Red Hat Network (http://rhn.redhat.com).
- Move the mouse cursor over the Subscriptions link at the top of the screen, and then click the Registered Systems link.
- Click in the Subscribed Channels section of the screen.
- Expand the node for Additional Services Channels for Red Hat Enterprise Linux 6 for x86_64 or for Red Hat Enterprise Linux 5 for x86_64 depending on the client platform.
- Click the button to finalize the changes. When the page refreshes, select the Details tab to verify the system is subscribed to the appropriate channels.
Install Native Client Packages
Prerequisites
- Run the yum install command to install the native client RPM packages.# yum install glusterfs glusterfs-fuse
- For Red Hat Enterprise Linux 5.x client systems, run the modprobe command to load FUSE modules before mounting Red Hat Storage volumes.# modprobe fuse
For more information on loading modules at boot time, see https://access.redhat.com/knowledge/solutions/47028.
9.2.2. Upgrading Native Client
Run the yum update command to upgrade the native client:
# yum update glusterfs glusterfs-fuse
9.2.3. Mounting Red Hat Storage Volumes
Note
- When a new volume is created in Red Hat Storage 3.0, it cannot be accessed by older (Red Hat Storage 2.1.x) clients, because the readdir-ahead translator is enabled by default for newly created Red Hat Storage 3.0 volumes. This makes it incompatible with older clients. To resolve this issue, disable readdir-ahead in the newly created volume using the following command:# gluster volume set volname readdir-ahead off
- Server names selected during volume creation should be resolvable on the client machine. Use appropriate /etc/hosts entries, or a DNS server, to resolve server names to IP addresses.
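For example, a hypothetical /etc/hosts entry on the client for the servers used in this chapter might look like the following; replace the addresses with your own.
192.0.2.11 server1
192.0.2.12 server2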
9.2.3.1. Mount Commands and Options
Red Hat Storage volumes are mounted with the mount -t glusterfs command. All options must be separated with commas.
# mount -t glusterfs -o backup-volfile-servers=volfile_server2:volfile_server3:.... ..:volfile_serverN,log-level=WARNING,log-file=/var/log/gluster.log server1:/test-volume /mnt/glusterfs
- backup-volfile-servers=<volfile_server2>:<volfile_server3>:...:<volfile_serverN>
- List of the backup volfile servers to mount the client. If this option is specified while mounting the fuse client, when the first volfile server fails, the servers specified in the backup-volfile-servers option are used as volfile servers to mount the client until the mount is successful.
Note
This option was earlier specified as backupvolfile-server, which is no longer valid.
- log-level
- Logs only specified level or higher severity messages in the log-file.
- log-file
- Logs the messages in the specified file.
- ro
- Mounts the file system as read only.
- acl
- Enables POSIX Access Control List on mount.
- background-qlen=n
- Sets the number of requests FUSE can queue before subsequent requests are denied. The default value of n is 64.
- enable-ino32
- Enables the file system to present 32-bit inodes instead of 64-bit inodes.
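Several of these options can be combined in a single mount. For example, the following sketch mounts the test-volume volume used earlier in this chapter read-only, with POSIX ACLs enabled and a custom log file; the mount point is assumed to exist.
# mount -t glusterfs -o acl,ro,log-level=WARNING,log-file=/var/log/gluster.log server1:/test-volume /mnt/glusterfs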
9.2.3.2. Mounting Volumes Manually
Manually Mount a Red Hat Storage Volume
Use the mount -t glusterfs HOSTNAME|IPADDRESS:/VOLNAME /MOUNTDIR command to manually mount a Red Hat Storage volume.
Note
- If a mount point has not yet been created for the volume, run the mkdir command to create a mount point.# mkdir /mnt/glusterfs
- Run the mount -t glusterfs command, using the key in the task summary as a guide.# mount -t glusterfs server1:/test-volume /mnt/glusterfs
9.2.3.3. Mounting Volumes Automatically
Mounting a Volume Automatically
- Open the /etc/fstab file in a text editor.
- Append the following configuration to the fstab file.HOSTNAME|IPADDRESS:/VOLNAME /MOUNTDIR glusterfs defaults,_netdev 0 0
Using the example server names, the entry contains the following replaced values.server1:/test-volume /mnt/glusterfs glusterfs defaults,_netdev 0 0
9.2.3.4. Testing Mounted Volumes
Testing Mounted Red Hat Storage Volumes
Prerequisites
- Run the mount command to check whether the volume was successfully mounted.# mount server1:/test-volume on /mnt/glusterfs type fuse.glusterfs (rw,allow_other,default_permissions,max_read=131072)
- Run the df command to display the aggregated storage space from all the bricks in a volume.# df -h /mnt/glusterfs Filesystem Size Used Avail Use% Mounted on server1:/test-volume 28T 22T 5.4T 82% /mnt/glusterfs
- Move to the mount directory using the cd command, and list the contents.# cd /mnt/glusterfs # ls
9.3. NFS
The glusterFS NFS server supports getfacl and setfacl operations on NFS clients. Use the nfs.acl option to configure Access Control Lists (ACL) in the glusterFS NFS server. For example:
- To set nfs.acl ON, run the following command:# gluster volume set <volname> nfs.acl on
- To set nfs.acl OFF, run the following command:# gluster volume set <volname> nfs.acl off
Note
The nfs.acl option is ON by default.
9.3.1. Using NFS to Mount Red Hat Storage Volumes
Note
To ensure that NFS version 3 is used, configure the nfsmount.conf file at /etc/nfsmount.conf by adding the following text in the file:
Defaultvers=3
Alternatively, specify vers=3 manually in all the mount commands.
# mount nfsserver:export -o vers=3 /MOUNTPOINT
9.3.1.1. Manually Mounting Volumes Using NFS
Manually Mount a Red Hat Storage Volume using NFS
Use the mount command to manually mount a Red Hat Storage volume using NFS.
- If a mount point has not yet been created for the volume, run the mkdir command to create a mount point.# mkdir /mnt/glusterfs
- Run the correct mount command for the system.
- For Linux
# mount -t nfs -o vers=3 server1:/test-volume /mnt/glusterfs
- For Solaris
# mount -o vers=3 nfs://server1:38467/test-volume /mnt/glusterfs
Manually Mount a Red Hat Storage Volume using NFS over TCP
Use the mount command to manually mount a Red Hat Storage volume using NFS over TCP.
Note
The glusterFS NFS server does not support UDP by default. If the NFS client defaults to connecting over UDP, the mount fails with the following error: requested NFS version or transport protocol is not supported
The nfs.mount-udp option is supported for mounting a volume; it is disabled by default. The following are the limitations:
- If nfs.mount-udp is enabled, the MOUNT protocol needed for NFSv3 can handle requests from NFS clients that require MOUNT over UDP. This is useful for at least some versions of Solaris, IBM AIX, and HP-UX.
- Currently, MOUNT over UDP does not have support for mounting subdirectories on a volume. Mounting server:/volume/subdir exports is only functional when MOUNT over TCP is used.
- MOUNT over UDP does not currently support the different authentication options that MOUNT over TCP honours. Enabling nfs.mount-udp may give more permissions to NFS clients than intended, via various authentication options like nfs.rpc-auth-allow, nfs.rpc-auth-reject, and nfs.export-dir.
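If MOUNT over UDP is still required despite these limitations, the option can be enabled on a volume with gluster volume set; the volume name below is a placeholder.
# gluster volume set <volname> nfs.mount-udp on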
- If a mount point has not yet been created for the volume, run the mkdir command to create a mount point.# mkdir /mnt/glusterfs
- Run the correct mount command for the system, specifying the TCP protocol option for the system.
- For Linux
# mount -t nfs -o vers=3,mountproto=tcp server1:/test-volume /mnt/glusterfs
- For Solaris
# mount -o proto=tcp nfs://server1:38467/test-volume /mnt/glusterfs
9.3.1.2. Automatically Mounting Volumes Using NFS
Note
To automatically mount a Red Hat Storage volume using NFS on demand, update the /etc/auto.master and /etc/auto.misc files, and restart the autofs service. Whenever a user or process attempts to access the directory, it will be mounted in the background on demand.
Mounting a Volume Automatically using NFS
- Open the /etc/fstab file in a text editor.
- Append the following configuration to the fstab file.HOSTNAME|IPADDRESS:/VOLNAME /MOUNTDIR nfs defaults,_netdev 0 0
Using the example server names, the entry contains the following replaced values.server1:/test-volume /mnt/glusterfs nfs defaults,_netdev 0 0
Mounting a Volume Automatically using NFS over TCP
- Open the /etc/fstab file in a text editor.
- Append the following configuration to the fstab file.HOSTNAME|IPADDRESS:/VOLNAME /MOUNTDIR nfs defaults,_netdev,mountproto=tcp 0 0
Using the example server names, the entry contains the following replaced values.server1:/test-volume /mnt/glusterfs nfs defaults,_netdev,mountproto=tcp 0 0
9.3.1.3. Authentication Support for Subdirectory Mount
You can use the nfs.export-dir option to provide client authentication during sub-directory mount. The nfs.export-dir and nfs.export-dirs options provide granular control to restrict or allow specific clients to mount a sub-directory. These clients can be authenticated with either an IP, host name or a Classless Inter-Domain Routing (CIDR) range.
- nfs.export-dirs: By default, all NFS sub-volumes are exported as individual exports. This option allows you to manage this behaviour. When this option is turned off, none of the sub-volumes are exported and hence the sub-directories cannot be mounted. This option is on by default. To set this option to off, run the following command:# gluster volume set <volname> nfs.export-dirs off
To set this option to on, run the following command:# gluster volume set <volname> nfs.export-dirs on
- nfs.export-dir: This option allows you to export specified subdirectories on the volume. You can export a particular subdirectory, for example:# gluster volume set <volname> nfs.export-dir /d1,/d2/d3/d4,/d6
where d1, d2, d3, d4, d6 are the sub-directories. You can also control access to mount these subdirectories based on the IP address, host name or a CIDR. For example:# gluster volume set <volname> nfs.export-dir "/d1(<ip address>),/d2/d3/d4(<host name>|<ip address>),/d6(<CIDR>)"
The directories /d1, /d2 and /d6 are directories inside the volume. The volume name must not be added to the path. For example, if the volume vol1 has directories d1 and d2, then to export these directories use the following command:# gluster volume set vol1 nfs.export-dir "/d1(192.0.2.2),/d2(192.0.2.34)"
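To verify which directories the glusterFS NFS server is currently exporting, the showmount command can be run from an NFS client; this assumes the standard NFS utilities are installed on the client.
# showmount -e server1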
9.3.1.4. Testing Volumes Mounted Using NFS
Testing Mounted Red Hat Storage Volumes
Prerequisites
- Run the mount command to check whether the volume was successfully mounted.# mount server1:/test-volume on /mnt/glusterfs type nfs (rw,addr=server1)
- Run the df command to display the aggregated storage space from all the bricks in a volume.# df -h /mnt/glusterfs Filesystem Size Used Avail Use% Mounted on server1:/test-volume 28T 22T 5.4T 82% /mnt/glusterfs
- Move to the mount directory using the cd command, and list the contents.# cd /mnt/glusterfs # ls
9.3.2. Troubleshooting NFS
- Q: The mount command on the NFS client fails with RPC Error: Program not registered. This error is encountered due to one of the following reasons:
- Q: The rpcbind service is not running on the NFS client. This could be due to the following reasons:
- Q: The NFS server glusterfsd starts but the initialization fails with nfsrpc-service: portmap registration of program failed error message in the log.
- Q: The NFS server start-up fails with the message Port is already in use in the log file.
- Q: The mount command fails with NFS server failed error:
- Q: The showmount command fails with clnt_create: RPC: Unable to receive error. This error is encountered due to the following reasons:
- Q: The application fails with Invalid argument or Value too large for defined data type
- Q: After the machine that is running NFS server is restarted the client fails to reclaim the locks held earlier.
- Q: The rpc actor failed to complete successfully error is displayed in the nfs.log, even after the volume is mounted successfully.
- Q: The mount command fails with No such file or directory.
RPC Error: Program not registered. This error is encountered due to one of the following reasons:
- The NFS server is not running. You can check the status using the following command:# gluster volume status
- The volume is not started. You can check the status using the following command:# gluster volume info
- rpcbind is restarted. To check if rpcbind is running, execute the following command:# ps ax | grep rpcbind
- If the NFS server is not running, then restart the NFS server using the following command:# gluster volume start <volname>
- If the volume is not started, then start the volume using the following command:# gluster volume start <volname>
- If both rpcbind and the NFS server are running, then restart the NFS server using the following commands:# gluster volume stop <volname> # gluster volume start <volname>
The rpcbind service is not running on the NFS client. This could be due to the following reasons:
- The portmap is not running.
- Another instance of kernel NFS server or glusterNFS server is running.
Start the rpcbind service by running the following command:
# service rpcbind start
[2010-05-26 23:33:47] E [rpcsvc.c:2598:rpcsvc_program_register_portmap] rpc-service: Could not register with portmap [2010-05-26 23:33:47] E [rpcsvc.c:2682:rpcsvc_program_register] rpc-service: portmap registration of program failed [2010-05-26 23:33:47] E [rpcsvc.c:2695:rpcsvc_program_register] rpc-service: Program registration failed: MOUNT3, Num: 100005, Ver: 3, Port: 38465 [2010-05-26 23:33:47] E [nfs.c:125:nfs_init_versions] nfs: Program init failed [2010-05-26 23:33:47] C [nfs.c:531:notify] nfs: Failed to initialize protocols [2010-05-26 23:33:49] E [rpcsvc.c:2614:rpcsvc_program_unregister_portmap] rpc-service: Could not unregister with portmap [2010-05-26 23:33:49] E [rpcsvc.c:2731:rpcsvc_program_unregister] rpc-service: portmap unregistration of program failed [2010-05-26 23:33:49] E [rpcsvc.c:2744:rpcsvc_program_unregister] rpc-service: Program unregistration failed: MOUNT3, Num: 100005, Ver: 3, Port: 38465
- Start the rpcbind service on the NFS server by running the following command:
# service rpcbind start
After starting the rpcbind service, the glusterFS NFS server needs to be restarted. - Stop another NFS server running on the same machine. Such an error is also seen when there is another NFS server running on the same machine, but it is not the glusterFS NFS server. On Linux systems, this could be the kernel NFS server. Resolution involves stopping the other NFS server or not running the glusterFS NFS server on the machine. Before stopping the kernel NFS server, ensure that no critical service depends on access to that NFS server's exports. On Linux, kernel NFS servers can be stopped by using either of the following commands, depending on the distribution in use:
# service nfs-kernel-server stop # service nfs stop
- Restart glusterFS NFS server.
[2010-05-26 23:40:49] E [rpc-socket.c:126:rpcsvc_socket_listen] rpc-socket: binding socket failed:Address already in use [2010-05-26 23:40:49] E [rpc-socket.c:129:rpcsvc_socket_listen] rpc-socket: Port is already in use [2010-05-26 23:40:49] E [rpcsvc.c:2636:rpcsvc_stage_program_register] rpc-service: could not create listening connection [2010-05-26 23:40:49] E [rpcsvc.c:2675:rpcsvc_program_register] rpc-service: stage registration of program failed [2010-05-26 23:40:49] E [rpcsvc.c:2695:rpcsvc_program_register] rpc-service: Program registration failed: MOUNT3, Num: 100005, Ver: 3, Port: 38465 [2010-05-26 23:40:49] E [nfs.c:125:nfs_init_versions] nfs: Program init failed [2010-05-26 23:40:49] C [nfs.c:531:notify] nfs: Failed to initialize protocols
mount: mount to NFS server '10.1.10.11' failed: timed out (retrying).
- Disable name lookup requests from NFS server to a DNS server.The NFS server attempts to authenticate NFS clients by performing a reverse DNS lookup to match host names in the volume file with the client IP addresses. There can be a situation where the NFS server either is not able to connect to the DNS server or the DNS server is taking too long to respond to DNS request. These delays can result in delayed replies from the NFS server to the NFS client resulting in the timeout error.NFS server provides a work-around that disables DNS requests, instead relying only on the client IP addresses for authentication. The following option can be added for successful mounting in such situations:
option nfs.addr.namelookup off
Note
Remember that disabling name lookup forces the NFS server to authenticate clients using only IP addresses. If the authentication rules in the volume file use host names, those authentication rules will fail and client mounting will fail. - The NFS version used by the NFS client is other than version 3. The glusterFS NFS server supports version 3 of the NFS protocol by default. In recent Linux kernels, the default NFS version has been changed from 3 to 4. It is possible that the client machine is unable to connect to the glusterFS NFS server because it is using version 4 messages, which are not understood by the glusterFS NFS server. The timeout can be resolved by forcing the NFS client to use version 3. The vers option to the mount command is used for this purpose:
# mount nfsserver:export -o vers=3 /MOUNTPOINT
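For example, assuming the glusterFS NFS server is server1 and the volume is test-volume (placeholder names), the client would run:
# mount -t nfs -o vers=3 server1:/test-volume /mnt/glusterfs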
- The firewall might have blocked the port.
- rpcbind might not be running.
nfs.enable-ino32 <on | off>
This option is off by default, which permits NFS to return 64-bit inode numbers.
- built and run on 32-bit machines, which do not support large files by default,
- built to 32-bit standards on 64-bit systems.
-D_FILE_OFFSET_BITS=64
Run chkconfig --list nfslock to check if NSM is configured during OS boot.
If it is configured to on, run chkconfig nfslock off to disable NSM clients during boot, which resolves the issue.
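As a sketch, assuming a Red Hat Enterprise Linux client where NSM is managed by the nfslock service, the check and the fix would look like this:
# chkconfig --list nfslock
# chkconfig nfslock off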
rpc actor failed to complete successfully error is displayed in the nfs.log, even after the volume is mounted successfully.
The following messages may be seen in the nfs.log file.
[2013-06-25 00:03:38.160547] W [rpcsvc.c:180:rpcsvc_program_actor] 0-rpc-service: RPC program version not available (req 100003 4) [2013-06-25 00:03:38.160669] E [rpcsvc.c:448:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully
To resolve the issue, use the noacl option in the mount command as follows:
mount -t nfs -o vers=3,noacl server1:/test-volume /mnt/glusterfs
9.3.3. NFS Ganesha
Important
- nfs-ganesha is a technology preview feature. Technology preview features are not fully supported under Red Hat subscription level agreements (SLAs), may not be functionally complete, and are not intended for production use. However, these features provide early access to upcoming product innovations, enabling customers to test functionality and provide feedback during the development process. As Red Hat considers making future iterations of technology preview features generally available, we will provide commercially reasonable support to resolve any reported issues that customers experience when using these features.
- Red Hat Storage currently does not support NFSv4 delegations, Multi-head NFS and High Availability. These will be added in the upcoming releases of Red Hat Storage nfs-ganesha. It is not a feature recommended for production deployment in its current form. However, Red Hat Storage volumes can be exported via nfs-ganesha for consumption by both NFSv3 and NFSv4 clients.
9.3.3.1. Installing nfs-ganesha
- Installing nfs-ganesha using yum
- Installing nfs-ganesha during an ISO Installation
- Installing nfs-ganesha using RHN / Red Hat Satellite
9.3.3.1.1. Installing using yum
# yum install nfs-ganesha
# rpm -qlp nfs-ganesha-2.1.0.2-4.el6rhs.x86_64.rpm /etc/glusterfs-ganesha/README /etc/glusterfs-ganesha/nfs-ganesha.conf /etc/glusterfs-ganesha/org.ganesha.nfsd.conf /usr/bin/ganesha.nfsd /usr/lib64/ganesha /usr/lib64/ganesha/libfsalgluster.so /usr/lib64/ganesha/libfsalgluster.so.4 /usr/lib64/ganesha/libfsalgluster.so.4.2.0 /usr/lib64/ganesha/libfsalgpfs.so /usr/lib64/ganesha/libfsalgpfs.so.4 /usr/lib64/ganesha/libfsalgpfs.so.4.2.0 /usr/lib64/ganesha/libfsalnull.so /usr/lib64/ganesha/libfsalnull.so.4 /usr/lib64/ganesha/libfsalnull.so.4.2.0 /usr/lib64/ganesha/libfsalproxy.so /usr/lib64/ganesha/libfsalproxy.so.4 /usr/lib64/ganesha/libfsalproxy.so.4.2.0 /usr/lib64/ganesha/libfsalvfs.so /usr/lib64/ganesha/libfsalvfs.so.4 /usr/lib64/ganesha/libfsalvfs.so.4.2.0 /usr/share/doc/nfs-ganesha /usr/share/doc/nfs-ganesha/ChangeLog /usr/share/doc/nfs-ganesha/LICENSE.txt
/usr/bin/ganesha.nfsd is the nfs-ganesha daemon.
9.3.3.1.2. Installing nfs-ganesha During an ISO Installation
- When installing Red Hat Storage using the ISO, in the Customizing the Software Selection screen, select Red Hat Storage Tools Group and click Optional Packages.
- From the list of packages, select
nfs-ganesha and click Close. - Proceed with the remaining installation steps for Red Hat Storage. For more information refer to Installing from an ISO Image in the Red Hat Storage 3.0 Installation Guide.
9.3.3.1.3. Installing from Red Hat Satellite Server or Red Hat Network
- Install nfs-ganesha by executing the following command:
# yum install nfs-ganesha - Verify the installation by running the following command:
# yum list nfs-ganesha Installed Packages nfs-ganesha.x86_64 2.1.0.2-4.el6rhs rhs-3-for-rhel-6-server-rpms
9.3.3.2. Pre-requisites to run nfs-ganesha
Note
- Red Hat does not recommend running nfs-ganesha in mixed-mode and/or hybrid environments. This includes multi-protocol environments where NFS and CIFS shares are used simultaneously, or running nfs-ganesha together with gluster-nfs, kernel-nfs or gluster-fuse clients.
- Only one of nfs-ganesha, gluster-nfs server or kernel-nfs can be enabled on a given machine/host as all NFS implementations use the port 2049 and only one can be active at a given time. Hence you must disable gluster-nfs (it is enabled by default on a volume) and kernel-nfs before nfs-ganesha is started.
- A Red Hat Storage volume should be available for export and nfs-ganesha rpms have to be installed.
- IPv6 should be enabled on the host interface that is used by the nfs-ganesha daemon. To enable IPv6 support, perform the following steps:
- Comment or remove the line
options ipv6 disable=1 in the /etc/modprobe.d/ipv6.conf file. - Reboot the system.
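As a minimal sketch, assuming the line appears in /etc/modprobe.d/ipv6.conf exactly as shown above, it can be commented out with sed before rebooting:
# sed -i 's/^options ipv6 disable=1/#options ipv6 disable=1/' /etc/modprobe.d/ipv6.conf
# reboot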
9.3.3.3. Exporting and Unexporting Volumes through nfs-ganesha
- Copy the
org.ganesha.nfsd.conf file into the /etc/dbus-1/system.d/ directory. The org.ganesha.nfsd.conf file can be found in /etc/glusterfs-ganesha/ on installation of the nfs-ganesha rpms. - Execute the following command:
service messagebus restart
Note
- Disable gluster-nfs on all Red Hat Storage volumes.
# gluster volume set volname nfs.disable on
gluster-nfs and nfs-ganesha cannot run simultaneously. Hence, gluster-nfs must be disabled on all Red Hat Storage volumes before exporting them via nfs-ganesha. - To set the host IP, execute the following command:
# gluster vol set volname nfs-ganesha.host IP
This command sets the host IP used to start nfs-ganesha. In a multi-node volume environment, it is recommended that all nfs-ganesha related commands and operations be run on only one of the nodes; hence, the IP address provided must be the IP of that node. If a Red Hat Storage volume is already exported, setting a different host IP will take immediate effect. - To start nfs-ganesha, execute the following command:
# gluster volume set volname nfs-ganesha.enable on
# gluster vol set volname nfs-ganesha.enable off
# gluster vol set volname nfs-ganesha.enable off
- To set the host IP, execute the following command:
# gluster vol set volname nfs-ganesha.host IP
- To restart nfs-ganesha, execute the following command:
# gluster volume set volname nfs-ganesha.enable on
- Check if nfs-ganesha is started by executing the following command:
ps aux | grep ganesha
- Check if the volume is exported.
showmount -e localhost
- The logs of the ganesha.nfsd daemon are written to
/tmp/ganesha.log. Check the log file if you notice any unexpected behaviour. This file will be lost in case of a system reboot.
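Putting the export steps together, a minimal sketch for a hypothetical volume testvol exported from a node with IP 10.0.0.1 (both placeholders) would be:
# gluster volume set testvol nfs.disable on
# gluster vol set testvol nfs-ganesha.host 10.0.0.1
# gluster volume set testvol nfs-ganesha.enable on
# showmount -e localhost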
9.3.3.4. Supported Features of nfs-ganesha
9.3.3.5. Manually Configuring nfs-ganesha Exports
# /usr/bin/ganesha.nfsd -f <location of nfs-ganesha.conf file> -L <location of log file> -N <log level> -d
/usr/bin/ganesha.nfsd -f nfs-ganesha.conf -L nfs-ganesha.log -N NIV_DEBUG -d
- nfs-ganesha.conf is the configuration file that is available by default on installation of the nfs-ganesha rpms. This file is located at /etc/glusterfs-ganesha.
- nfs-ganesha.log is the log file for the ganesha.nfsd process.
- NIV_DEBUG is the log level.
To export an entry, copy the EXPORT block into a .conf file, for example export.conf. Edit the parameters appropriately and include the export.conf file in nfs-ganesha.conf. This can be done by adding the line below at the end of nfs-ganesha.conf.
%include "export.conf"
# cat export.conf
EXPORT{
Export_Id = 1 ; # Export ID unique to each export
Path = "volume_path"; # Path of the volume to be exported. Eg: "/test_volume"
FSAL {
name = GLUSTER;
hostname = "10.xx.xx.xx"; # IP of one of the nodes in the trusted pool
volume = "volume_name"; # Volume name. Eg: "test_volume"
}
Access_type = RW; # Access permissions
Squash = No_root_squash; # To enable/disable root squashing
Disable_ACL = TRUE; # To enable/disable ACL
Pseudo = "pseudo_path"; # NFSv4 pseudo path for this export. Eg: "/test_volume_pseudo"
Protocols = "3,4" ; # NFS protocols supported
Transports = "UDP,TCP" ; # Transport protocols supported
SecType = "sys"; # Security flavors supported
}
Restart nfs-ganesha after editing the export.conf file to see the expected behaviour.
To export subdirectories within a volume, edit the following parameters in the export.conf file.
Path = "path_to_subdirectory"; # Path of the volume to be exported. Eg: "/test_volume/test_subdir"
FSAL {
name = GLUSTER;
hostname = "10.xx.xx.xx"; # IP of one of the nodes in the trusted pool
volume = "volume_name"; # Volume name. Eg: "test_volume"
volpath = "path_to_subdirectory_with_respect_to_volume"; #Subdirectory path from the root of the volume. Eg: "/test_subdir"
}
To export multiple entries, add an EXPORT block for each entry, each with a unique Export_Id, as in the following example:
# cat export.conf
EXPORT{
Export_Id = 1 ; # Export ID unique to each export
Path = "test_volume"; # Path of the volume to be exported. Eg: "/test_volume"
FSAL {
name = GLUSTER;
hostname = "10.xx.xx.xx"; # IP of one of the nodes in the trusted pool
volume = "test_volume"; # Volume name. Eg: "test_volume"
}
Access_type = RW; # Access permissions
Squash = No_root_squash; # To enable/disable root squashing
Disable_ACL = TRUE; # To enable/disable ACL
Pseudo = "/test_volume"; # NFSv4 pseudo path for this export. Eg: "/test_volume_pseudo"
Protocols = "3,4" ; # NFS protocols supported
Transports = "UDP,TCP" ; # Transport protocols supported
SecType = "sys"; # Security flavors supported
}
EXPORT{
Export_Id = 2 ; # Export ID unique to each export
Path = "test_volume/test_subdir"; # Path of the volume to be exported. Eg: "/test_volume"
FSAL {
name = GLUSTER;
hostname = "10.xx.xx.xx"; # IP of one of the nodes in the trusted pool
volume = "test_volume"; # Volume name. Eg: "test_volume"
volpath = "/test_subdir"
}
Access_type = RW; # Access permissions
Squash = No_root_squash; # To enable/disable root squashing
Disable_ACL = FALSE; # To enable/disable ACL
Pseudo = "/test_subdir"; # NFSv4 pseudo path for this export. Eg: "/test_volume_pseudo"
Protocols = "3,4" ; # NFS protocols supported
Transports = "UDP,TCP" ; # Transport protocols supported
SecType = "sys"; # Security flavors supported
}
#showmount -e localhost
Export list for localhost:
/test_volume (everyone)
/test_volume/test_subdir (everyone)
/ (everyone)
The EXPORT block applies to any client that mounts the exported volume. To provide specific permissions to specific clients, introduce a client block inside the EXPORT block.
Add a client block similar to the following inside the EXPORT block.
client {
clients = "10.xx.xx.xx"; # IP of the client.
allow_root_access = true;
access_type = "RO"; # Read-only permissions
Protocols = "3"; # Allow only NFSv3 protocol.
anonymous_uid = 1440;
anonymous_gid = 72;
}
To enable NFSv4 ACLs and to set the NFSv4 pseudo path for the export, edit the following parameters:
Disable_ACL = FALSE;
Pseudo = "pseudo_path"; # NFSv4 pseudo path for this export. Eg: "/test_volume_pseudo"
org.ganesha.nfsd.conf is installed in /etc/glusterfs-ganesha/ as part of the nfs-ganesha rpms. To export entries dynamically without restarting nfs-ganesha, execute the following steps:
- Copy the file
org.ganesha.nfsd.conf into the directory /etc/dbus-1/system.d/. - Execute the following command:
service messagebus restart
- Adding an export dynamicallyTo add an export dynamically, add an export block as explained in section Exporting Multiple Entries, and execute the following command:
dbus-send --print-reply --system --dest=org.ganesha.nfsd /org/ganesha/nfsd/ExportMgr org.ganesha.nfsd.exportmgr.AddExport string:/path-to-export.conf string:'EXPORT(Path=/path-in-export-block)'
For example, to add testvol1 dynamically:
dbus-send --print-reply --system --dest=org.ganesha.nfsd /org/ganesha/nfsd/ExportMgr org.ganesha.nfsd.exportmgr.AddExport string:/home/nfs-ganesha/export.conf string:'EXPORT(Path=/testvol1)'
method return sender=:1.35 -> dest=:1.37 reply_serial=2
- Removing an export dynamicallyTo remove an export dynamically, execute the following command:
dbus-send --print-reply --system --dest=org.ganesha.nfsd /org/ganesha/nfsd/ExportMgr org.ganesha.nfsd.exportmgr.RemoveExport int32:export-id-in-the-export-block
For example:
dbus-send --print-reply --system --dest=org.ganesha.nfsd /org/ganesha/nfsd/ExportMgr org.ganesha.nfsd.exportmgr.RemoveExport int32:79
method return sender=:1.35 -> dest=:1.37 reply_serial=2
9.3.3.6. Accessing nfs-ganesha Exports
mount -t nfs -o vers=3 ip:/volname /mountpoint
mount -t nfs -o vers=3 10.70.0.0:/testvol /mnt
mount -t nfs -o vers=4 ip:/volname /mountpoint
mount -t nfs -o vers=4 10.70.0.0:/testvol /mnt
9.3.3.7. Troubleshooting
- Situation: nfs-ganesha fails to start. Solution: Follow the listed steps to fix the issue:
- Review the
/tmp/ganesha.log to understand the cause of failure. - Ensure the kernel and gluster NFS services are inactive.
- Ensure you execute both the
nfs-ganesha.host and nfs-ganesha.enable volume set options.
For more information, see Section 9.3.3.5, Manually Configuring nfs-ganesha Exports. - Situation: nfs-ganesha has started and fails to export a volume. Solution: Follow the listed steps to fix the issue:
- Ensure the file
org.ganesha.nfsd.conf is copied into /etc/dbus-1/system.d/ before starting nfs-ganesha. - In case you had not copied the file, copy it and restart nfs-ganesha. For more information, see Section 9.3.3.3, Exporting and Unexporting Volumes through nfs-ganesha.
- Situation: nfs-ganesha fails to stop. Solution: Execute the following steps:
- Check for the status of the nfs-ganesha process.
- If it is still running, issue a kill -9 signal on its PID.
- Run the following command to check if the nfs, mountd, nlockmgr, and rquotad services are unregistered cleanly.
rpcinfo -p
- If the services are not unregistered, then delete these entries using the following command:
rpcinfo -d
Note
You can also restart the rpcbind service instead of using rpcinfo -d on individual entries.
- Force start the volume by using the following command:
# gluster volume start <volname> force
- Situation: Permission issues. Solution: By default, the
root squash option is disabled when you start nfs-ganesha using the CLI. If you encounter any permission issues, check the unix permissions of the exported entry.
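As a sketch of that check, assuming the export is a volume named testvol served from 10.0.0.1 (placeholders), mount it over FUSE and inspect or adjust the permissions of the export root:
# mount -t glusterfs 10.0.0.1:testvol /mnt/fuse
# ls -ld /mnt/fuse
# chmod 755 /mnt/fuse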
9.4. SMB
Note
9.4.1. Automatic and Manual Exporting
SMB Prerequisites
- Run
gluster volume set VOLNAME stat-prefetch off to disable stat-prefetch for the volume. - Run
gluster volume set VOLNAME server.allow-insecure on to permit insecure ports. Note
This allows Samba to communicate with brick processes even with untrusted ports. - Edit the
/etc/glusterfs/glusterd.vol file on each Red Hat Storage node, and add the following setting:
option rpc-auth-allow-insecure on
Note
This allows Samba to communicate with glusterd even with untrusted ports. - Restart
glusterd service on each Red Hat Storage node. - Run the following command to ensure proper lock and I/O coherency:
gluster volume set <VOLNAME> storage.batch-fsync-delay-usec 0
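As a consolidated sketch, assuming a volume named testvol (a placeholder), the volume-level prerequisites above would be applied as follows; the glusterd.vol edit and glusterd restart still have to be done on each node:
# gluster volume set testvol stat-prefetch off
# gluster volume set testvol server.allow-insecure on
# gluster volume set testvol storage.batch-fsync-delay-usec 0
# service glusterd restart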
Automatically Exporting Red Hat Storage Volumes Through Samba
When you start a volume using the gluster volume start VOLNAME command, the volume is automatically exported through Samba on all Red Hat Storage servers running Samba. Disabling volume mounting through Samba requires changes to the S30samba-start.sh script.
- With elevated privileges, navigate to
/var/lib/glusterd/hooks/1/start/post - Rename the
S30samba-start.sh to K30samba-start.sh. For more information about these scripts, see Section 15.2, “Prepackaged Scripts”. - Run
# smbstatus -S on the server to display the status of the volume:
Service            pid     machine               Connected at
-------------------------------------------------------------------
gluster-<VOLNAME>  11967   __ffff_192.168.1.60   Mon Aug 6 02:23:25 2012
Note
#chkconfig smb on
Manually Exporting Red Hat Storage Volumes Through Samba
Note
- Open the
/etc/samba/smb.conf file in a text editor and add the following lines for a simple configuration:
[gluster-<VOLNAME>]
comment = For samba share of volume VOLNAME
vfs objects = glusterfs
glusterfs:volume = VOLNAME
glusterfs:logfile = /var/log/samba/VOLNAME.log
glusterfs:loglevel = 7
path = /
read only = no
guest ok = yes
The configuration options are described in the following table:
Table 9.7. Configuration Options
| Configuration Options | Required? | Default Value | Description |
|---|---|---|---|
| path | Yes | n/a | It represents the path that is relative to the root of the gluster volume that is being shared. Hence / represents the root of the gluster volume. Exporting a subdirectory of a volume is supported and /subdir in path exports only that subdirectory of the volume. |
| glusterfs:volume | Yes | n/a | The volume name that is shared. |
| glusterfs:logfile | No | NULL | Path to the log file that will be used by the gluster modules that are loaded by the vfs plugin. Standard Samba variable substitutions as mentioned in smb.conf are supported. |
| glusterfs:loglevel | No | 7 | This option is equivalent to the client-log-level option of gluster. 7 is the default value and corresponds to the INFO level. |
| glusterfs:volfile_server | No | localhost | The gluster server to be contacted to fetch the volfile for the volume. |
service smb [re]start to start or restart the smb service. - Run
smbpasswd to set the SMB password.
# smbpasswd -a username
Specify the SMB password. This password is used during the SMB mount.
Note
#chkconfig smb on
9.4.2. Mounting Volumes using SMB
- Add the user on all the Samba servers based on your configuration:
# adduser <username> - Add the user to the list of Samba users on all Samba servers and assign password by executing the following command:
# smbpasswd -a <username> - Perform a FUSE mount of the gluster volume on any one of the Samba servers and provide required permissions to the user by executing the following commands:
# mount -t glusterfs -o acl <ip address>:<volname> <mountpoint>
# setfacl -m user:<username>:rwx <mountpoint>
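For example, assuming a Samba server at 10.0.0.1, a volume named testvol, and a Samba user named smbuser (all placeholder names):
# mount -t glusterfs -o acl 10.0.0.1:testvol /mnt/testvol
# setfacl -m user:smbuser:rwx /mnt/testvol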
9.4.2.1. Manually Mounting Volumes Using SMB on Red Hat Enterprise Linux and Windows
Mounting a Volume Manually using SMB on Red Hat Enterprise Linux
- Install the
cifs-utils package on the client.
# yum install cifs-utils
- Run
mount -t cifs to mount the exported SMB share, using the syntax example as guidance. Example 9.1. mount -t cifs Command Syntax
# mount -t cifs \\\\Samba_Server_IP_Address\\Share_Name Mount_Point -o user=<username>,pass=<password>
Run # mount -t cifs \\\\SAMBA_SERVER_IP\\gluster-VOLNAME /mnt/smb -o user=<username>,pass=<password> for a Red Hat Storage volume exported through SMB, which uses the /etc/samba/smb.conf file with the following configuration.
[gluster-<VOLNAME>]
comment = For samba share of volume VOLNAME
vfs objects = glusterfs
glusterfs:volume = VOLNAME
glusterfs:logfile = /var/log/samba/VOLNAME.log
glusterfs:loglevel = 7
path = /
read only = no
guest ok = yes
- Run
# smbstatus -S on the server to display the status of the volume:
Service            pid     machine               Connected at
-------------------------------------------------------------------
gluster-<VOLNAME>  11967   __ffff_192.168.1.60   Mon Aug 6 02:23:25 2012
Mounting a Volume Manually using SMB through Microsoft Windows Explorer
- In Windows Explorer, click Tools → Map Network Drive to open the Map Network Drive screen.
- Choose the drive letter using the drop-down list.
- In the Folder text box, specify the path of the server and the shared resource in the following format: \\SERVER_NAME\VOLNAME.
- Click Finish to complete the process, and display the network drive in Windows Explorer.
- If the Windows Security screen pops up, enter the username and password and click OK.
- Navigate to the network drive to verify it has mounted correctly.
Mounting a Volume Manually using SMB on Microsoft Windows Command-line.
- Click Start → Run, and then type
cmd. - Enter
net use z: \\SERVER_NAME\VOLNAME <password> /USER:<username> where z: is the drive letter to assign to the shared volume. For example, net use y: \\server1\test-volume test-password /USER:test-user - Navigate to the network drive to verify it has mounted correctly.
9.4.2.2. Automatically Mounting Volumes Using SMB on Red Hat Enterprise Linux and Windows
Mounting a Volume Automatically using SMB on Red Hat Enterprise Linux
- Open the
/etc/fstab file in a text editor. - Append the following configuration to the
fstab file. You must specify the filename and its path that contains the user name and/or password in the credentials option in the /etc/fstab file. See the mount.cifs man page for more information; a sketch of a credentials file is shown after this procedure.
\\HOSTNAME|IPADDRESS\SHARE_NAME MOUNTDIR cifs OPTIONS 0 0
Using the example server names, the entry contains the following replaced values.
\\server1\test-volume /mnt/glusterfs cifs credentials=/etc/samba/passwd,_netdev 0 0
- Run
# smbstatus -S on the server to display the status of the volume:
Service            pid     machine               Connected at
-------------------------------------------------------------------
gluster-<VOLNAME>  11967   __ffff_192.168.1.60   Mon Aug 6 02:23:25 2012
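The credentials file referenced in the fstab entry above uses the key=value format described in the mount.cifs man page. A minimal sketch of /etc/samba/passwd (a placeholder path), which should be readable only by root, would be:
username=test-user
password=test-password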
Mounting a Volume Automatically on Server Start using SMB through Microsoft Windows Explorer
- In Windows Explorer, click Tools → Map Network Drive to open the Map Network Drive screen.
- Choose the drive letter using the drop-down list.
- In the Folder text box, specify the path of the server and the shared resource in the following format: \\SERVER_NAME\VOLNAME.
- Click the Reconnect at logon check box.
- Click Finish to complete the process, and display the network drive in Windows Explorer.
- Navigate to the network drive to verify it has mounted correctly.
9.5. Configuring Automated IP Failover for NFS and SMB
Note
- Amazon Elastic Compute Cloud (EC2) does not support VIPs and is therefore not compatible with this solution.
9.5.1. Setting Up CTDB
Configuring CTDB on Red Hat Storage Server
Note
- If you already have an older version of CTDB, then remove CTDB by executing the following command:
# yum remove ctdb
After removing the older version, proceed with installing the latest CTDB. - Install the latest version of CTDB on all the nodes that are used as Samba servers using the following command:
# yum install ctdb2.5
- In a CTDB based high availability environment of NFS and SMB, the locks will not be migrated on failover.
- You must ensure that port 4379 is open between the Red Hat Storage servers.
- Create a replicate volume. The bricks must be on different machines. This volume will host the lock file, hence choose the brick size accordingly. To create a replicate volume run the following command:
# gluster volume create <volname> replica <n> <ipaddress>:/<brick path>.......N times
where,N: The number of nodes that are used as Samba servers.For example:# gluster volume create ctdb replica 4 10.16.157.75:/rhs/brick1/ctdb/b1 10.16.157.78:/rhs/brick1/ctdb/b2 10.16.157.81:/rhs/brick1/ctdb/b3 10.16.157.84:/rhs/brick1/ctdb/b4
- In the following files, replace the value all in the statement META=all with the newly created volume name:
/var/lib/glusterd/hooks/1/start/post/S29CTDBsetup.sh
/var/lib/glusterd/hooks/1/stop/pre/S29CTDB-teardown.sh
For example, change META=all to META=ctdb.
- Start the volume. The
S29CTDBsetup.sh script runs on all Red Hat Storage servers and adds the following lines to the [global] section of your Samba configuration file at /etc/samba/smb.conf.
clustering = yes
idmap backend = tdb2
The script stops the Samba server, modifies the Samba configuration, adds an entry in /etc/fstab for the mount, and mounts the volume at /gluster/lock on all the nodes with a Samba server. It also enables automatic start of the CTDB service on reboot. Note
When you stop a volume, the S29CTDB-teardown.sh script runs on all Red Hat Storage servers and removes the following lines from the [global] section of your Samba configuration file at /etc/samba/smb.conf.
clustering = yes
idmap backend = tdb2
It also removes the entry in /etc/fstab for the mount and unmounts the volume at /gluster/lock. - Verify if the file
/etc/sysconfig/ctdb exists on all the nodes that are used as Samba servers. This file contains the Red Hat Storage recommended CTDB configurations. - Create the
/etc/ctdb/nodes file on all the nodes that are used as Samba servers and add the IPs of these nodes to the file.
10.16.157.0
10.16.157.3
10.16.157.6
10.16.157.9
The IPs listed here are the private IPs of the Samba servers. - On all the nodes that are used as Samba servers and require IP failover, create the
/etc/ctdb/public_addresses file and add the virtual IPs that CTDB should create to this file. Add these IP addresses in the following format:
<Virtual IP>/<routing prefix> <node interface>
For example:
192.168.1.20/24 eth0
192.168.1.21/24 eth0
9.5.2. Starting and Verifying your Configuration
Start CTDB and Verify the Configuration
- Run
# service ctdb start to start the CTDB service. - Run
# chkconfig smb off so that the smb service does not start automatically when the server is restarted; CTDB manages starting and stopping Samba. - Verify that CTDB is running using the following commands:
# ctdb status
# ctdb ip
# ctdb ping -n all
- Mount a Red Hat Storage volume using any one of the VIPs.
- Run
# ctdb ipto locate the physical server serving the VIP. - Shut down the CTDB VIP server to verify successful configuration.When the Red Hat Storage Server serving the VIP is shut down there will be a pause for a few seconds, then I/O will resume.
9.6. POSIX Access Control Lists
John creates a file. He does not allow anyone in the group to access the file, except for another user, Antony (even if there are other users who belong to the group john).
9.6.1. Setting POSIX ACLs
- Per user
- Per group
- Through the effective rights mask
- For users not in the user group for the file
9.6.1.1. Setting Access ACLs
The # setfacl -m entry_type file_name command sets and modifies access ACLs.
setfacl entry_type Options
Permissions must be a combination of the characters r (read), w (write), and x (execute). Specify the ACL entry_type as described below, separating multiple entry types with commas.
- u:user_name:permissions
- Sets the access ACLs for a user. Specify the user name, or the UID.
- g:group_name:permissions
- Sets the access ACLs for a group. Specify the group name, or the GID.
- m:permission
- Sets the effective rights mask. The mask is the combination of all access permissions of the owning group, and all user and group entries.
- o:permissions
- Sets the access ACLs for users other than the ones in the group for the file.
If a file or directory already has POSIX ACLs and the setfacl command is used, the additional permissions are added to the existing POSIX ACLs or the existing rule is modified.
For example, to give read and write permissions to the user antony:
# setfacl -m u:antony:rw /mnt/gluster/data/testfile
9.6.1.2. Setting Default ACLs
# setfacl -d --set entry_type directory command sets default ACLs for files and directories.
setfacl entry_type Options
Permissions must be a combination of the characters r (read), w (write), and x (execute). Specify the ACL entry_type as described below, separating multiple entry types with commas.
- u:user_name:permissions
- Sets the access ACLs for a user. Specify the user name, or the UID.
- g:group_name:permissions
- Sets the access ACLs for a group. Specify the group name, or the GID.
- m:permission
- Sets the effective rights mask. The mask is the combination of all access permissions of the owning group, and all user and group entries.
- o:permissions
- Sets the access ACLs for users other than the ones in the group for the file.
For example, run # setfacl -d --set o::r /mnt/gluster/data to set the default ACLs for the /data directory to read-only for users not in the user group.
Note
- A subdirectory inherits the default ACLs of the parent directory both as its default ACLs and as its access ACLs.
- A file inherits the default ACLs as its access ACLs.
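As a quick sketch of this inheritance, assuming /mnt/gluster/data already carries the default ACL set above (paths are placeholders), a newly created subdirectory and file pick it up automatically:
# mkdir /mnt/gluster/data/new_dir
# touch /mnt/gluster/data/new_dir/new_file
# getfacl /mnt/gluster/data/new_dir
# getfacl /mnt/gluster/data/new_dir/new_file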
9.6.2. Retrieving POSIX ACLs
Use the # getfacl command to view the existing POSIX ACLs for a file or directory.
-
# getfacl path/filename - View the existing access ACLs of the
sample.jpg file using the following command.
# getfacl /mnt/gluster/data/test/sample.jpg
# owner: antony
# group: antony
user::rw-
group::rw-
other::r--
-
# getfacl directory name - View the default ACLs of the
/doc directory using the following command.
# getfacl /mnt/gluster/data/doc
# owner: antony
# group: antony
user::rw-
user:john:r--
group::r--
mask::r--
other::r--
default:user::rwx
default:user:antony:rwx
default:group::r-x
default:mask::rwx
default:other::r-x
9.6.3. Removing POSIX ACLs
Use the # setfacl -x entry_type file command to remove all permissions for a user, group, or others.
setfacl entry_type Options
Permissions must be a combination of the characters r (read), w (write), and x (execute). Specify the ACL entry_type as described below, separating multiple entry types with commas.
- u:user_name
- Removes the access ACLs for a user. Specify the user name, or the UID.
- g:group_name
- Removes the access ACLs for a group. Specify the group name, or the GID.
- m:permission
- Removes the effective rights mask. The mask is the combination of all access permissions of the owning group, and all user and group entries.
- o:permissions
- Removes the access ACLs for users other than the ones in the group for the file.
For example, to remove all permissions from the user antony:
# setfacl -x u:antony /mnt/gluster/data/test-file
9.6.4. Samba and ACLs
Samba is compiled with the --with-acl-support option, so no special flags are required when accessing or mounting a Samba share.
Chapter 10. Managing Red Hat Storage Volumes
10.1. Configuring Volume Options
Note
# gluster volume info VOLNAME
# gluster volume set VOLNAME OPTION PARAMETER
For example, to specify the performance cache size for test-volume:
# gluster volume set test-volume performance.cache-size 256MB Set volume successful
Note
| Option | Value Description | Allowed Values | Default Value |
|---|---|---|---|
| auth.allow | IP addresses or hostnames of the clients which are allowed to access the volume. | Valid hostnames or IP addresses, which includes wild card patterns including *. For example, 192.168.1.*. A list of comma separated addresses is acceptable, but a single hostname must not exceed 256 characters. | * (allow all) |
| auth.reject | IP addresses or hostnames of the clients which are denied access to the volume. | Valid hostnames or IP addresses, which includes wild card patterns including *. For example, 192.168.1.*. A list of comma separated addresses is acceptable, but a single hostname must not exceed 256 characters. | none (reject none) |
| Note
Using auth.allow and auth.reject options, you can control access of only glusterFS FUSE-based clients. Use nfs.rpc-auth-* options for NFS access control.
| |||
| cluster.min-free-disk | Specifies the percentage of disk space that must be kept free. This may be useful for non-uniform bricks. | Percentage of required minimum free disk space. | 10% |
| cluster.op-version | Allows you to set the operating version of the cluster. The op-version number cannot be downgraded and is set for all the volumes. Also the op-version does not appear when you execute the gluster volume info command. | 2 | 30000 | Default value is 2 after an upgrade from RHS 2.1. Value is set to 30000 for a new cluster deployment. |
| cluster.self-heal-daemon | Specifies whether proactive self-healing on replicated volumes is activated. | on | off | on |
| cluster.server-quorum-type | If set to server, this option enables the specified volume to participate in the server-side quorum. For more information on configuring the server-side quorum, see Section 10.9.1.1, “Configuring Server-Side Quorum” | none | server | none |
| cluster.server-quorum-ratio | Sets the quorum percentage for the trusted storage pool. | 0 - 100 | >50% |
| cluster.quorum-type | If set to fixed, this option allows writes to a file only if the number of active bricks in that replica set (to which the file belongs) is greater than or equal to the count specified in the cluster.quorum-count option. If set to auto, this option allows writes to the file only if the percentage of active replicate bricks is more than 50% of the total number of bricks that constitute that replica. If there are only two bricks in the replica group, the first brick must be up and running to allow modifications. | fixed | auto | none |
| cluster.quorum-count | The minimum number of bricks that must be active in a replica-set to allow writes. This option is used in conjunction with cluster.quorum-type =fixed option to specify the number of bricks to be active to participate in quorum. The cluster.quorum-type = auto option will override this value. | 1 - replica-count | 0 |
| diagnostics.brick-log-level | Changes the log-level of the bricks. | INFO | DEBUG | WARNING | ERROR | CRITICAL | NONE | TRACE | info |
| diagnostics.client-log-level | Changes the log-level of the clients. | INFO | DEBUG | WARNING | ERROR | CRITICAL | NONE | TRACE | info |
| diagnostics.brick-sys-log-level | Depending on the value defined for this option, log messages at and above the defined level are generated in the syslog and the brick log files. | INFO | WARNING | ERROR | CRITICAL | CRITICAL |
| diagnostics.client-sys-log-level | Depending on the value defined for this option, log messages at and above the defined level are generated in the syslog and the client log files. | INFO | WARNING | ERROR | CRITICAL | CRITICAL |
| diagnostics.client-log-format | Allows you to configure the log format to log either with a message id or without one on the client. | no-msg-id | with-msg-id | with-msg-id |
| diagnostics.brick-log-format | Allows you to configure the log format to log either with a message id or without one on the brick. | no-msg-id | with-msg-id | with-msg-id |
| diagnostics.brick-log-flush-timeout | The length of time for which the log messages are buffered, before being flushed to the logging infrastructure (gluster or syslog files) on the bricks. | 30 - 300 seconds (30 and 300 included) | 120 seconds |
| diagnostics.brick-log-buf-size | The maximum number of unique log messages that can be suppressed until the timeout or buffer overflow, whichever occurs first on the bricks. | 0 and 20 (0 and 20 included) | 5 |
| diagnostics.client-log-flush-timeout | The length of time for which the log messages are buffered, before being flushed to the logging infrastructure (gluster or syslog files) on the clients. | 30 - 300 seconds (30 and 300 included) | 120 seconds |
| diagnostics.client-log-buf-size | The maximum number of unique log messages that can be suppressed until the timeout or buffer overflow, whichever occurs first on the clients. | 0 and 20 (0 and 20 included) | 5 |
| features.quota-deem-statfs | When this option is set to on, it takes the quota limits into consideration while estimating the filesystem size. The limit will be treated as the total size instead of the actual size of filesystem. | on | off | off |
| features.read-only | Specifies whether to mount the entire volume as read-only for all the clients accessing it. | on | off | off |
| group small-file-perf | This option enables the open-behind and quick-read translators on the volume, and can be done only if all the clients of the volume are using Red Hat Storage 2.1. | NA | - |
| network.ping-timeout | The time the client waits for a response from the server. If a timeout occurs, all resources held by the server on behalf of the client are cleaned up. When the connection is reestablished, all resources need to be reacquired before the client can resume operations on the server. Additionally, locks are acquired and the lock tables are updated. A reconnect is a very expensive operation and must be avoided. | 42 seconds | 42 seconds |
| nfs.acl | Disabling nfs.acl will remove support for the NFSACL sideband protocol. This is enabled by default. | enable | disable | enable |
| nfs.enable-ino32 | For NFS clients or applications that do not support 64-bit inode numbers, use this option to make NFS return 32-bit inode numbers instead. Disabled by default, so NFS returns 64-bit inode numbers. | enable | disable | disable |
| nfs.export-dir | By default, all NFS volumes are exported as individual exports. This option allows you to export specified subdirectories on the volume. | The path must be an absolute path. Along with the path allowed, list of IP address or hostname can be associated with each subdirectory. | None |
| nfs.export-dirs | By default, all NFS sub-volumes are exported as individual exports. This option allows any directory on a volume to be exported separately. | on | off | on |
| Note
The value set for nfs.export-dirs and nfs.export-volumes options are global and applies to all the volumes in the Red Hat Storage trusted storage pool.
| |||
| nfs.export-volumes | Enables or disables exporting entire volumes. If disabled and used in conjunction with nfs.export-dir, you can set subdirectories as the only exports. | on | off | on |
| nfs.mount-rmtab | Path to the cache file that contains a list of NFS-clients and the volumes they have mounted. Change the location of this file to a mounted (with glusterfs-fuse, on all storage servers) volume to gain a trusted pool wide view of all NFS-clients that use the volumes. The contents of this file provide the information that can get obtained with the showmount command. | Path to a directory | /var/lib/glusterd/nfs/rmtab |
| nfs.mount-udp | Enable UDP transport for the MOUNT sideband protocol. By default, UDP is not enabled, and MOUNT can only be used over TCP. Some NFS-clients (certain Solaris, HP-UX and others) do not support MOUNT over TCP and enabling nfs.mount-udp makes it possible to use NFS exports provided by Red Hat Storage. | disable | enable | disable |
| nfs.nlm | By default, the Network Lock Manager (NLMv4) is enabled. Use this option to disable NLM. Red Hat does not recommend disabling this option. | on | off | on |
| nfs.rpc-auth-allow IP_ADRESSES | A comma separated list of IP addresses allowed to connect to the server. By default, all clients are allowed. | Comma separated list of IP addresses | accept all |
| nfs.rpc-auth-reject IP_ADRESSES | A comma separated list of addresses not allowed to connect to the server. By default, all connections are allowed. | Comma separated list of IP addresses | reject none |
| nfs.ports-insecure | Allows client connections from unprivileged ports. By default only privileged ports are allowed. This is a global setting for allowing insecure ports for all exports using a single option. | on | off | off |
| nfs.addr-namelookup | Specifies whether to lookup names for incoming client connections. In some configurations, the name server can take too long to reply to DNS queries, resulting in timeouts of mount requests. This option can be used to disable name lookups during address authentication. Note that disabling name lookups will prevent you from using hostnames in nfs.rpc-auth-* options. | on | off | on |
| nfs.port | Associates glusterFS NFS with a non-default port. | 1025-65535 | 38465- 38467 |
| nfs.disable | Specifies whether to disable NFS exports of individual volumes. | on | off | off |
| nfs.server-aux-gids | When enabled, the NFS server will resolve the groups of the user accessing the volume. NFSv3 is restricted by the RPC protocol (AUTH_UNIX/AUTH_SYS header) to 16 groups. By resolving the groups on the NFS server, this limit can be bypassed. | on|off | off |
| open-behind | It improves the application's ability to read data from a file by sending success notifications to the application whenever it receives an open call. | on | off | off |
| performance.io-thread-count | The number of threads in the IO threads translator. | 0 - 65 | 16 |
| performance.cache-max-file-size | Sets the maximum file size cached by the io-cache translator. Can be specified using the normal size descriptors of KB, MB, GB, TB, or PB (for example, 6GB). | Size in bytes, or specified using size descriptors. | 2 ^ 64-1 bytes |
| performance.cache-min-file-size | Sets the minimum file size cached by the io-cache translator. Can be specified using the normal size descriptors of KB, MB, GB, TB, or PB (for example, 6GB). | Size in bytes, or specified using size descriptors. | 0 |
| performance.cache-refresh-timeout | The number of seconds cached data for a file will be retained. After this timeout, data re-validation will be performed. | 0 - 61 seconds | 1 second |
| performance.cache-size | Size of the read cache. | Size in bytes, or specified using size descriptors. | 32 MB |
| performance.md-cache-timeout | The time period in seconds which controls when metadata cache has to be refreshed. If the age of cache is greater than this time-period, it is refreshed. Every time cache is refreshed, its age is reset to 0. | 0-60 seconds | 1 second |
| performance.quick-read.priority | Sets the priority of the order in which files get flushed from the cache when it is full. | min/max | - |
| performance.quick-read.max-file-size | Maximum size of the file that can be fetched using the get interface. Files larger than this size are not cached by quick-read. | 0-1MB | 64KB |
| performance.quick-read.cache-timeout | Timeout for the validation of a cached file. After this timeout, the properties of the file is compared with that of the cached copy. If the file has changed after it has been cached, the cache is flushed. | 1-60 seconds | 1 second |
| performance.quick-read.cache-size | The size of the quick-read cache that is used to cache all files. Can be specified using the normal size descriptors of KB, MB, GB. | 0-32GB | 128MB |
| performance.use-anonymous-fd | This option requires open-behind to be on. For read operations, use anonymous FD when the original FD is open-behind and not yet opened in the backend. | Yes | No | Yes |
| performance.lazy-open | This option requires open-behind to be on. Perform an open in the backend only when a necessary FOP arrives (for example, write on the FD, unlink of the file). When this option is disabled, perform backend open immediately after an unwinding open. | Yes/No | Yes |
| server.allow-insecure | Allows client connections from unprivileged ports. By default, only privileged ports are allowed. This is a global setting for allowing insecure ports to be enabled for all exports using a single option. | on | off | off |
| Important
Turning server.allow-insecure to on allows ports to accept/reject messages from insecure ports. Enable this option only if your deployment requires it, for example if there are too many bricks in each volume, or if there are too many services which have already utilized all the privileged ports in the system. You can control access of only glusterFS FUSE-based clients. Use nfs.rpc-auth-* options for NFS access control.
| |||
| server.root-squash | Prevents root users from having root privileges, and instead assigns them the privileges of nfsnobody. This squashes the power of the root users, preventing unauthorized modification of files on the Red Hat Storage Servers. | on | off | off |
| server.anonuid | Value of the UID used for the anonymous user when root-squash is enabled. When root-squash is enabled, all the requests received from the root UID (that is 0) are changed to have the UID of the anonymous user. | 0 - 4294967295 | 65534 (this UID is also known as nfsnobody) |
| server.anongid | Value of the GID used for the anonymous user when root-squash is enabled. When root-squash is enabled, all the requests received from the root GID (that is 0) are changed to have the GID of the anonymous user. | 0 - 4294967295 | 65534 (this GID is also known as nfsnobody) |
| server.gid-timeout | The time period in seconds which controls when cached groups has to expire. This is the cache that contains the groups (GIDs) where a specified user (UID) belongs to. This option is used only when server.manage-gids is enabled. | 0-4294967295 seconds | 2 seconds |
| server.manage-gids | Resolve groups on the server-side. By enabling this option, the groups (GIDs) a user (UID) belongs to gets resolved on the server, instead of using the groups that were send in the RPC Call by the client. This option makes it possible to apply permission checks for users that belong to bigger group lists than the protocol supports (approximately 93). | on|off | off |
| server.statedump-path | Specifies the directory in which the statedump files must be stored. | Path to a directory | /var/run/gluster (for a default installation) |
| storage.health-check-interval | Sets the time interval in seconds for a filesystem health check. You can set it to 0 to disable. The POSIX translator on the bricks performs a periodic health check. If this check fails, the filesystem exported by the brick is not usable anymore and the brick process (glusterfsd) logs a warning and exits. | 0-4294967295 seconds | 30 seconds |
| storage.owner-uid | Sets the UID for the bricks of the volume. This option may be required when some of the applications need the brick to have a specific UID to function correctly. Example: For QEMU integration the UID/GID must be qemu:qemu, that is, 107:107 (107 is the UID and GID of qemu). | Any integer greater than or equal to -1. | The UID of the bricks are not changed. This is denoted by -1. |
| storage.owner-gid | Sets the GID for the bricks of the volume. This option may be required when some of the applications need the brick to have a specific GID to function correctly. Example: For QEMU integration the UID/GID must be qemu:qemu, that is, 107:107 (107 is the UID and GID of qemu). | Any integer greater than or equal to -1. | The GID of the bricks are not changed. This is denoted by -1. |
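For example, assuming a volume named test-volume (a placeholder), a couple of the options above could be set and then verified as follows:
# gluster volume set test-volume auth.allow 192.168.1.*
# gluster volume set test-volume nfs.addr-namelookup off
# gluster volume info test-volume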
10.2. Expanding Volumes
Note
Procedure 10.1. Expanding a Volume
- From any server in the trusted storage pool, use the following command to probe the server on which you want to add a new brick :
# gluster peer probe HOSTNAME
For example:
# gluster peer probe server4
Probe successful
- Add the brick using the following command:
# gluster volume add-brick VOLNAME NEW_BRICK
For example:
# gluster volume add-brick test-volume server4:/exp4
Add Brick successful
If you want to change the replica/stripe count, you must add the replica/stripe count to the add-brick command. For example:
# gluster volume add-brick test-volume replica 3 server4:/exp4
When increasing the replica/stripe count of a distribute replicate/stripe volume, the number of replica/stripe bricks to be added must be equal to the number of distribute subvolumes. - Check the volume information using the following command:
# gluster volume info
The command output displays information similar to the following:
Volume Name: test-volume
Type: Distribute
Status: Started
Number of Bricks: 4
Bricks:
Brick1: server1:/exp1
Brick2: server2:/exp2
Brick3: server3:/exp3
Brick4: server4:/exp4
- Rebalance the volume to ensure that files will be distributed to the new brick. Use the rebalance command as described in Section 10.6, “Rebalancing Volumes”.
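For example, assuming the expanded volume is test-volume, the rebalance would be started and monitored as follows (a brief sketch; see Section 10.6, “Rebalancing Volumes”, for the full procedure):
# gluster volume rebalance test-volume start
# gluster volume rebalance test-volume status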
10.3. Shrinking Volumes
Note
Procedure 10.2. Shrinking a Volume
- Remove a brick using the following command:
# gluster volume remove-brick VOLNAME BRICK start
For example:
# gluster volume remove-brick test-volume server2:/exp2 start
Remove Brick start successful
Note
If the remove-brick command is run with force or without any option, the data on the brick that you are removing will no longer be accessible at the glusterFS mount point. When using the start option, the data is migrated to other bricks, and on a successful commit the removed brick's information is deleted from the volume configuration. Data can still be accessed directly on the brick. - You can view the status of the remove brick operation using the following command:
# gluster volume remove-brick VOLNAME BRICK status
For example:
# gluster volume remove-brick test-volume server2:/exp2 status
Node        Rebalanced-files  size      scanned  failures  status
---------   -----------       -------   -------  --------  -----------
localhost   16                16777216  52       0         in progress
192.168.1.1 13                16723211  47       0         in progress
- When the data migration shown in the previous
status command is complete, run the following command to commit the brick removal:
# gluster volume remove-brick VOLNAME BRICK commit
For example:
# gluster volume remove-brick test-volume server2:/exp2 commit
- After the brick removal, you can check the volume information using the following command:
# gluster volume info
The command displays information similar to the following:
# gluster volume info
Volume Name: test-volume
Type: Distribute
Status: Started
Number of Bricks: 3
Bricks:
Brick1: server1:/exp1
Brick3: server3:/exp3
Brick4: server4:/exp4
10.3.1. Stopping a remove-brick Operation
Important
remove-brick operation is a technology preview feature. Technology Preview features are not fully supported under Red Hat subscription level agreements (SLAs), may not be functionally complete, and are not intended for production use. However, these features provide early access to upcoming product innovations, enabling customers to test functionality and provide feedback during the development process. As Red Hat considers making future iterations of Technology Preview features generally available, we will provide commercially reasonable efforts to resolve any reported issues that customers experience when using these features.
remove-brick operation that is in progress can be stopped by using the stop command.
Note
remove-brick operation will not be migrated back to the same brick when the operation is stopped.
# gluster volume remove-brick VOLNAME BRICK stop
For example:
# gluster volume remove-brick di rhs1:/brick1/di21 rhs1:/brick1/di21 stop
Node      Rebalanced-files  size      scanned  failures  skipped  status       run-time in secs
----      -------           ----      ----     ------    -----    -----        ------
localhost 23                376Bytes  34       0         0        stopped      2.00
rhs1      0                 0Bytes    88       0         0        stopped      2.00
rhs2      0                 0Bytes    0        0         0        not started  0.00
'remove-brick' process may be in the middle of a file migration. The process will be fully stopped once the migration of the file is complete. Please check remove-brick process for completion before doing any further brick related tasks on the volume.
10.4. Migrating Volumes
Note
Before you perform a replace-brick operation, review the known issues related to the replace-brick operation in the Red Hat Storage 3.0 Release Notes.
10.4.1. Replacing a Subvolume on a Distribute or Distribute-replicate Volume
- Add the new bricks to the volume.
#
gluster volume add-brick <VOLNAME> [<stripe|replica> <COUNT>] NEW-BRICK
Example 10.1. Adding a Brick to a Distribute Volume
#
gluster volume add-brick test-volume server5:/exp5
Add Brick successful
- Verify the volume information using the command:
#
gluster volume info
Volume Name: test-volume
Type: Distribute
Status: Started
Number of Bricks: 5
Bricks:
Brick1: server1:/exp1
Brick2: server2:/exp2
Brick3: server3:/exp3
Brick4: server4:/exp4
Brick5: server5:/exp5
Note
In case of a Distribute-replicate or stripe volume, you must specify the replica or stripe count in the add-brick command and provide the same number of bricks as the replica or stripe count to the add-brick command. - Remove the bricks to be replaced from the subvolume.
- Start the
remove-brick operation using the command:
# gluster volume remove-brick <VOLNAME> [replica <COUNT>] <BRICK> start
Example 10.2. Start a remove-brick operation on a distribute volume
# gluster volume remove-brick test-volume server2:/exp2 start
Remove Brick start successful
- View the status of the
remove-brick operation using the command:
# gluster volume remove-brick <VOLNAME> [replica <COUNT>] BRICK status
Example 10.3. View the Status of remove-brick Operation
# gluster volume remove-brick test-volume server2:/exp2 status
Node     Rebalanced-files  size      scanned  failures  status
------------------------------------------------------------------
server2  16                16777216  52       0         in progress
Keep monitoring the remove-brick operation status by executing the above command. When the value of the status field is set to complete in the output of the remove-brick status command, proceed further. - Commit the
remove-brick operation using the command:
# gluster volume remove-brick <VOLNAME> [replica <COUNT>] <BRICK> commit
Example 10.4. Commit the remove-brick Operation on a Distribute Volume
# gluster volume remove-brick test-volume server2:/exp2 commit
- Verify the volume information using the command:
# gluster volume info
Volume Name: test-volume
Type: Distribute
Status: Started
Number of Bricks: 4
Bricks:
Brick1: server1:/exp1
Brick3: server3:/exp3
Brick4: server4:/exp4
Brick5: server5:/exp5
- Verify the content on the brick after committing the
remove-brick operation on the volume. If there are any files left over, copy them through a FUSE or NFS mount.
- Verify if there are any pending files on the bricks of the subvolume. Along with files, all the application-specific extended attributes must be copied. glusterFS also uses extended attributes to store its internal data. The extended attributes used by glusterFS are of the form
trusted.glusterfs.*, trusted.afr.*, and trusted.gfid. Any extended attributes other than the ones listed above must also be copied. To copy the application-specific extended attributes and to achieve an effect similar to the one that is described above, use the following shell script:
Syntax:
# copy.sh <glusterfs-mount-point> <brick>
Example 10.5. Code Snippet Usage
If the mount point is /mnt/glusterfs and the brick path is /export/brick1, then the script must be run as:
# copy.sh /mnt/glusterfs /export/brick1
#!/bin/bash
MOUNT=$1
BRICK=$2
for file in `find $BRICK ! -type d`; do
    rpath=`echo $file | sed -e "s#$BRICK\(.*\)#\1#g"`
    rdir=`dirname $rpath`
    cp -fv $file $MOUNT/$rdir;
    for xattr in `getfattr -e hex -m. -d $file 2>/dev/null | sed -e '/^#/d' | grep -v -E "trusted.glusterfs.*" | grep -v -E "trusted.afr.*" | grep -v "trusted.gfid"`; do
        key=`echo $xattr | cut -d"=" -f 1`
        value=`echo $xattr | cut -d"=" -f 2`
        setfattr $MOUNT/$rpath -n $key -v $value
    done
done
#gluster volume heal test-volume info - If there are any files listed in the output of the above command, delete those files from the mount point and manually retain the correct copy of the file after comparing the files across the bricks in a replica set. Selecting the correct copy of the file needs manual intervention by the System Administrator.
10.4.2. Replacing an Old Brick with a New Brick on a Replicate or Distribute-replicate Volume
- Ensure that the new brick (
sys5:/home/gfs/r2_5) that replaces the old brick (sys0:/home/gfs/r2_0) is empty. Ensure that all the bricks are online. The brick that must be replaced can be in an offline state. - Bring the brick that must be replaced to an offline state, if it is not already offline.
- Identify the PID of the brick to be replaced, by executing the command:
#
gluster volume statusStatus of volume: r2 Gluster process Port Online Pid ------------------------------------------------------- Brick sys0:/home/gfs/r2_0 49152 Y 5342 Brick sys1:/home/gfs/r2_1 49153 Y 5354 Brick sys2:/home/gfs/r2_2 49154 Y 5365 Brick sys3:/home/gfs/r2_3 49155 Y 5376 - Log in to the host on which the brick to be replaced has its process running and kill the brick.
#kill -9 <PID> - Ensure that the brick to be replaced is offline and the other bricks are online by executing the command:
# gluster volume statusStatus of volume: r2 Gluster process Port Online Pid ------------------------------------------------------ Brick sys0:/home/gfs/r2_0 N/A N 5342 Brick sys1:/home/gfs/r2_1 49153 Y 5354 Brick sys2:/home/gfs/r2_2 49154 Y 5365 Brick sys3:/home/gfs/r2_3 49155 Y 5376
- Create a FUSE mount point from any server to edit the extended attributes. Using the NFS and CIFS mount points, you will not be able to edit the extended attributes.
- Perform the following operations to change the Automatic File Replication extended attributes so that the heal process happens from the other brick (
sys1:/home/gfs/r2_1) in the replica pair to the new brick (sys5:/home/gfs/r2_5).Note that/mnt/r2is the FUSE mount path.- Create a new directory on the mount point and ensure that a directory with such a name is not already present.
#
mkdir /mnt/r2/<name-of-nonexistent-dir> - Delete the directory and set the extended attributes.
#
rmdir /mnt/r2/<name-of-nonexistent-dir>#
setfattr -n trusted.non-existent-key -v abc /mnt/r2#setfattr -x trusted.non-existent-key /mnt/r2 - Ensure that the extended attributes on the other bricks in the replica (in this example,
trusted.afr.r2-client-0) is not set to zero.#
getfattr -d -m. -e hex /home/gfs/r2_1 # file: home/gfs/r2_1security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000 trusted.afr.r2-client-0=0x000000000000000300000002 trusted.afr.r2-client-1=0x000000000000000000000000 trusted.gfid=0x00000000000000000000000000000001 trusted.glusterfs.dht=0x0000000100000000000000007ffffffe trusted.glusterfs.volume-id=0xde822e25ebd049ea83bfaa3c4be2b440
- Execute the
replace-brickcommand with theforceoption:#
gluster volume replace-brick r2 sys0:/home/gfs/r2_0 sys5:/home/gfs/r2_5 commit forcevolume replace-brick: success: replace-brick commit successful - Check if the new brick is online.
#
gluster volume statusStatus of volume: r2 Gluster process Port Online Pid --------------------------------------------------------- Brick sys5:/home/gfs/r2_5 49156 Y 5731 Brick sys1:/home/gfs/r2_1 49153 Y 5354 Brick sys2:/home/gfs/r2_2 49154 Y 5365 Brick sys3:/home/gfs/r2_3 49155 Y 5376 - Ensure that after the self-heal completes, the extended attributes are set to zero on the other bricks in the replica.
#
getfattr -d -m. -e hex /home/gfs/r2_1getfattr: Removing leading '/' from absolute path names # file: home/gfs/r2_1 security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000 trusted.afr.r2-client-0=0x000000000000000000000000 trusted.afr.r2-client-1=0x000000000000000000000000 trusted.gfid=0x00000000000000000000000000000001 trusted.glusterfs.dht=0x0000000100000000000000007ffffffe trusted.glusterfs.volume-id=0xde822e25ebd049ea83bfaa3c4be2b440Note that in this example, the extended attributestrusted.afr.r2-client-0andtrusted.afr.r2-client-1are set to zero.
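While waiting for the self-heal to complete, you can monitor progress on the example volume r2 with the heal info command described later in this chapter; a minimal sketch (the 60-second interval is arbitrary):
# gluster volume heal r2 info
# watch -n 60 gluster volume heal r2 info
When the number of entries reported for each brick drops to zero, re-check the extended attributes as shown above.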
10.4.3. Replacing an Old Brick with a New Brick on a Distribute Volume
Important
- Replace a brick with a commit
forceoption:# gluster volume replace-brick <VOLNAME> <BRICK> <NEW-BRICK> commit forceExample 10.6. Replace a brick on a Distribute Volume
# gluster volume replace-brick r2 sys0:/home/gfs/r2_0 sys5:/home/gfs/r2_5 commit forcevolume replace-brick: success: replace-brick commit successful - Verify if the new brick is online.
# gluster volume statusStatus of volume: r2 Gluster process Port Online Pid --------------------------------------------------------- Brick sys5:/home/gfs/r2_5 49156 Y 5731 Brick sys1:/home/gfs/r2_1 49153 Y 5354 Brick sys2:/home/gfs/r2_2 49154 Y 5365 Brick sys3:/home/gfs/r2_3 49155 Y 5376
Note
The replace-brick command options, except the commit force option, are deprecated.
10.5. Replacing Hosts
10.5.1. Replacing a Host Machine with a Different IP Address
Important
In this example, the original machine that has had an irrecoverable failure is sys0 and the replacement machine is sys5. The brick with an unrecoverable failure is sys0:/home/gfs/r2_0 and the replacement brick is sys5:/home/gfs/r2_5.
- First probe the new peer from one of the existing peers to bring it into the cluster.
# gluster peer probe sys5 - Ensure that the new brick
(sys5:/home/gfs/r2_5)that replaces the old brick(sys0:/home/gfs/r2_0)is empty. Ensure that all the bricks are online. The brick that must be replaced can be in an offline state. - Bring the brick that must be replaced to an offline state, if it is not already offline.
- Identify the PID of the brick to be replaced, by executing the command:
# gluster volume statusStatus of volume: r2 Gluster process Port Online Pid Brick sys0:/home/gfs/r2_0 49152 Y 5342 Brick sys1:/home/gfs/r2_1 49153 Y 5354 Brick sys2:/home/gfs/r2_2 49154 Y 5365 Brick sys3:/home/gfs/r2_3 49155 Y 5376 - Log in to the host on which the brick to be replaced has its process running and kill the brick.
#kill -9 <PID> - Ensure that the brick to be replaced is offline and the other bricks are online by executing the command:
# gluster volume statusStatus of volume: r2 Gluster process Port Online Pid Brick sys0:/home/gfs/r2_0 N/A N 5342 Brick sys1:/home/gfs/r2_1 49153 Y 5354 Brick sys2:/home/gfs/r2_2 49154 Y 5365 Brick sys3:/home/gfs/r2_3 49155 Y 5376
- Create a FUSE mount point from any server to edit the extended attributes. Using the NFS and CIFS mount points, you will not be able to edit the extended attributes.
#mount -t glusterfs server-ip:/volname mount-point - Perform the following operations to change the Automatic File Replication extended attributes so that the heal process happens from the other brick (sys1:/home/gfs/r2_1) in the replica pair to the new brick (sys5:/home/gfs/r2_5). Note that /mnt/r2 is the FUSE mount path.
- Create a new directory on the mount point and ensure that a directory with such a name is not already present.
#mkdir /mnt/r2/<name-of-nonexistent-dir> - Delete the directory and set the extended attributes.
#rmdir /mnt/r2/<name-of-nonexistent-dir>#setfattr -n trusted.non-existent-key -v abc /mnt/r2#setfattr -x trusted.non-existent-key /mnt/r2 - Ensure that the extended attributes on the other bricks in the replica (in this example,
trusted.afr.r2-client-0) is not set to zero.#getfattr -d -m. -e hex /home/gfs/r2_1 # file: home/gfs/r2_1security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000 trusted.afr.r2-client-0=0x000000000000000300000002 trusted.afr.r2-client-1=0x000000000000000000000000 trusted.gfid=0x00000000000000000000000000000001 trusted.glusterfs.dht=0x0000000100000000000000007ffffffe trusted.glusterfs.volume-id=0xde822e25ebd049ea83bfaa3c4be2b440
- Execute the
replace-brickcommand with the force option:# gluster volume replace-brick r2 sys0:/home/gfs/r2_0 sys5:/home/gfs/r2_5 commit forcevolume replace-brick: success: replace-brick commit successful - Verify if the new brick is online.
# gluster volume statusStatus of volume: r2 Gluster process Port Online Pid Brick sys5:/home/gfs/r2_5 49156 Y 5731 Brick sys1:/home/gfs/r2_1 49153 Y 5354 Brick sys2:/home/gfs/r2_2 49154 Y 5365 Brick sys3:/home/gfs/r2_3 49155 Y 5376 At this point, self-heal is triggered automatically by the self-heal daemon. The status of the heal process can be seen by executing the command: # gluster volume heal volname info - Detach the original machine from the trusted pool.
#gluster peer detach sys0
- Ensure that after the self-heal completes, the extended attributes are set to zero on the other bricks in the replica.
#getfattr -d -m. -e hex /home/gfs/r2_1getfattr: Removing leading '/' from absolute path names #file: home/gfs/r2_1 security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000 trusted.afr.r2-client-0=0x000000000000000000000000 trusted.afr.r2-client-1=0x000000000000000000000000 trusted.gfid=0x00000000000000000000000000000001 trusted.glusterfs.dht=0x0000000100000000000000007ffffffe trusted.glusterfs.volume-id=0xde822e25ebd049ea83bfaa3c4be2b440Note
Note that in this example, the extended attributestrusted.afr.r2-client-0andtrusted.afr.r2-client-1are set to zero.
10.5.2. Replacing a Host Machine with the Same IP Address
- Stop the
glusterdservice on the Example1 node.# service glusterd stop
- Retrieve the UUID of the failed node (Example1) from another node of the cluster by executing the command:
# gluster peer status Number of Peers: 2 Hostname: 192.168.1.44 Uuid: 1d9677dc-6159-405e-9319-ad85ec030880 State: Peer in Cluster (Connected) Hostname: 192.168.1.45 Uuid: b5ab2ec3-5411-45fa-a30f-43bd04caf96b State: Peer Rejected (Connected)
Note that the UUID of the failed host is b5ab2ec3-5411-45fa-a30f-43bd04caf96b - Edit the glusterd.info file in the new host and include the UUID of the host you retrieved in the previous step.
# cat /var/lib/glusterd/glusterd.info UUID=b5ab2ec3-5411-45fa-a30f-43bd04caf96b operating-version=30000
- Select any node (say Example2) in the cluster and retrieve its UUID from the glusterd.info file.
# grep -i uuid /var/lib/glusterd/glusterd.info UUID=8cc6377d-0153-4540-b965-a4015494461c
- Gather the peer information files from the node (Example2) in the previous step. Execute the following command in that node (Example2) of the cluster.
# cp -a /var/lib/glusterd/peers /tmp/
- Remove the peer file corresponding to the failed peer from the /tmp/peers directory.
rm -r /tmp/peers/b5ab2ec3-5411-45fa-a30f-43bd04caf96b
Note that the UUID corresponds to the UUID of the failed node retrieved in Step 2. - Archive all the files and copy them to the failed node (Example1).
# cd /tmp; tar -cvf peers.tar peers
- Copy the above created file to the new peer
# scp /tmp/peers.tar root@NEWNODE:/tmp
- Copy the extracted content to the
/var/lib/glusterd/peersdirectory. Execute the following command in the newly added node (Example1)# tar -xvf /tmp/peers.tar # cp peers/* /var/lib/glusterd/peers/
- Select any node in the cluster other than the node (Example2) selected in step 4. Copy the peer file corresponding to the UUID of the node retrieved in step 4 to the new node (Example1) by executing the following command:
# scp /var/lib/glusterd/peers/<UUID-retrieved-from-step4> root@NEWNODE:/var/lib/glusterd/peers/
- Retrieve the brick directory information, by executing the following command in any node in the cluster.
# gluster volume info
- Create a brick directory. If the brick directory is already available delete it and create a new one.
# mkdir brick1
- Use a FUSE mount point to mount the GlusterFS volume
# mount -t glusterfs <server-name>:/<vol-name> <mount>
- Create a new directory on the mount point, delete it, and then set and clear the extended attributes. This is done to ensure that the self-heal process is triggered in the right direction.
# mkdir <mount>/dir1 # rmdir <mount>/dir1 # setfattr -n trusted.non-existent-key -v abc <mount> # setfattr -x trusted.non-existent-key <mount>
- Retrieve the volume-id from the existing brick from another node by executing the following command on any node that contains the bricks for the volume.
# getfattr -d -m. -ehex <brick-path>
Copy the volume ID.# getfattr -d -m. -ehex /rhs/brick1/drv1 getfattr: Removing leading '/' from absolute path names # file: rhs/brick1/drv1 trusted.afr.drv-client-0=0x000000000000000000000000 trusted.afr.drv-client-1=0x000000000000000000000000 trusted.gfid=0x00000000000000000000000000000001 trusted.glusterfs.dht=0x0000000100000000000000007ffffffe trusted.glusterfs.volume-id=0x8f16258c88a0498fbd53368706af7496
In the above example, the volume ID is 0x8f16258c88a0498fbd53368706af7496. - Set this volume ID on the brick created on the newly added node by executing the following command on the newly added host (Example1):
# setfattr -n trusted.glusterfs.volume-id -v <volume-id> <brick-path>
For Example:#setfattr -n trusted.glusterfs.volume-id -v 0x8f16258c88a0498fbd53368706af7496 /rhs/brick2/drv2
- Restart the
glusterd service.
Note
If there are only 2 nodes in the cluster, perform the first 3 steps and continue with the following steps:- Regenerate the peer file on the newly created node.
- Edit
/var/lib/glusterd/peers/<uuid-of-other-peer> to contain the following:
UUID=<uuid-of-other-node>
state=3
hostname=<hostname>
Continue from step 11 to 17 as documented above. - Perform a self heal on the restored volume.
# gluster volume heal <VOLNAME>
- You can view the gluster volume self heal status by executing the following command:
# gluster volume heal <VOLNAME> info
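After restarting glusterd, a quick way to confirm that the restored node has rejoined the trusted storage pool and that its brick is back online is to re-run the status commands used earlier; a minimal sketch, assuming the volume is named r2 (hypothetical):
# gluster peer status
# gluster volume status r2
The restored peer should show as connected, and the brick on it should report Online as Y while the self-heal proceeds.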
10.6. Rebalancing Volumes
After expanding or shrinking a volume using the add-brick or remove-brick commands, the data on the volume needs to be rebalanced among the servers.
Note
Run the rebalance operation using the start option. In a replicated volume, at least one of the bricks in the replica should be online.
# gluster volume rebalance VOLNAME start# gluster volume rebalance test-volume start Starting rebalancing on volume test-volume has been successful
A rebalance operation, without the force option, attempts to balance the space utilized across nodes. It skips a file if migrating it would leave the target node with less available space than the source node. This leaves link files behind in the system, which may cause performance issues during access when a large number of such link files are present.
If clients older than RHS 2.1 update 5 are connected to the volume, the rebalance operation fails with the following error: volume rebalance: <volname>: failed: Volume <volname> has one or more connected clients of a version lower than RHS-2.1 update 5. Starting rebalance in this state could lead to data loss. Please disconnect those clients before attempting this command again.
Warning
Rebalance command can be executed with the force option even when the older clients are connected to the cluster. However, this could lead to a data loss situation.
A rebalance operation with the force option balances the data based on the layout, and hence optimizes or does away with the link files, but it may lead to imbalanced storage space usage across bricks. Use this option only when there are a large number of link files in the system (a quick way to check for such link files is sketched below the command).
# gluster volume rebalance VOLNAME start force# gluster volume rebalance test-volume start force Starting rebalancing on volume test-volume has been successful
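DHT link files are zero-byte pointer files with the sticky bit set and a trusted.glusterfs.dht.linkto extended attribute. As a rough, hypothetical check of how many such files are present on a brick (here /export/brick1), before deciding whether a force rebalance is warranted:
# find /export/brick1 ! -path '*/.glusterfs/*' -type f -perm -1000 -size 0
Each matching path is a candidate link file; a large count suggests that the layout-based (force) rebalance described above may be useful.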
10.6.1. Displaying Status of a Rebalance Operation
# gluster volume rebalance VOLNAME status# gluster volume rebalance test-volume status
Node Rebalanced-files size scanned failures status
--------- ----------- ----------- ----------- ----------- ------------
localhost 112 14567 150 0 in progress
10.16.156.72 140 2134 201 2 in progress# gluster volume rebalance test-volume status
Node Rebalanced-files size scanned failures status
--------- ----------- ----------- ----------- ----------- ------------
localhost 112 14567 150 0 in progress
10.16.156.72 140 2134 201 2 in progress
The status field displays completed, as in the following output, when the rebalance is complete:
# gluster volume rebalance test-volume status
Node Rebalanced-files size scanned failures status
--------- ----------- ----------- ----------- ----------- ------------
localhost 112 15674 170 0 completed
10.16.156.72 140 3423 321 2 completed
10.6.2. Stopping a Rebalance Operation
# gluster volume rebalance VOLNAME stop# gluster volume rebalance test-volume stop
Node Rebalanced-files size scanned failures status
--------- ----------- ----------- ----------- ----------- ------------
localhost 102 12134 130 0 stopped
10.16.156.72 110 2123 121 2 stopped
Stopped rebalance process on volume test-volume
10.7. Stopping Volumes
# gluster volume stop VOLNAME# gluster volume stop test-volume
Stopping volume will make its data inaccessible. Do you want to continue? (y/n) y
Stopping volume test-volume has been successful
10.8. Deleting Volumes
# gluster volume delete VOLNAME# gluster volume delete test-volume
Deleting volume will erase all information about the volume. Do you want to continue? (y/n) y
Deleting volume test-volume has been successful
10.9. Managing Split-brain
- Data split-brain: Contents of the file under split-brain are different in different replica pairs and automatic healing is not possible.
- Metadata split-brain: The metadata of the file (for example, user-defined extended attributes) differs across the replica pair and automatic healing is not possible.
- Entry split-brain: This happens when a file has a different gfid on each of the bricks in the replica pair.
10.9.1. Preventing Split-brain
10.9.1.1. Configuring Server-Side Quorum
cluster.server-quorum-type volume option as server. For more information on this volume option, see Section 10.1, “Configuring Volume Options”.
glusterd service. Whenever the glusterd service on a machine observes that the quorum is not met, it brings down the bricks to prevent data split-brain. When the network connections are brought back up and the quorum is restored, the bricks in the volume are brought back up. When the quorum is not met for a volume, any commands that update the volume configuration, or add or detach peers, are not allowed. Note that the glusterd service not running and the network connection between two machines being down are treated equally.
# gluster volume set all cluster.server-quorum-ratio PERCENTAGE# gluster volume set all cluster.server-quorum-ratio 51%
# gluster volume set VOLNAME cluster.server-quorum-type server
Important
10.9.1.2. Configuring Client-Side Quorum
If client-side quorum is not met in m of the n replica groups, only those m replica groups become read-only, and the rest of the replica groups continue to allow data modifications.
Example 10.7. Client-Side Quorum

In the above scenario, when the client-side quorum is not met for replica group A, only replica group A becomes read-only. Replica groups B and C continue to allow data modifications.
Important
cluster.quorum-type and cluster.quorum-count options. For more information on these options, see Section 10.1, “Configuring Volume Options”.
Important
gluster volume set VOLNAME group virt command. On a two-replica setup, if the first brick in the replica pair is offline, virtual machines will be paused because quorum is not met and writes are disallowed.
# gluster volume reset VOLNAME quorum-type
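As a minimal sketch of enabling client-side quorum on a hypothetical volume test-volume using the cluster.quorum-type and cluster.quorum-count options referenced above (the values shown are only illustrative):
# gluster volume set test-volume cluster.quorum-type fixed
# gluster volume set test-volume cluster.quorum-count 2
Alternatively, setting cluster.quorum-type to auto lets glusterFS derive the required count from the replica configuration:
# gluster volume set test-volume cluster.quorum-type auto
See Section 10.1, “Configuring Volume Options” for the authoritative description of these options.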
10.9.2. Recovering from File Split-brain
Procedure 10.3. Steps to recover from a file split-brain
- Run the following command to obtain the path of the file that is in split-brain:
# gluster volume heal VOLNAME info split-brain
From the command output, identify the files for which file operations performed from the client keep failing with Input/Output error. - Close the applications that opened split-brain file from the mount point. If you are using a virtual machine, you must power off the machine.
- Obtain and verify the AFR changelog extended attributes of the file using the
getfattrcommand. Then identify the type of split-brain to determine which of the bricks contains the 'good copy' of the file.getfattr -d -m . -e hex <file-path-on-brick>
For example,# getfattr -d -e hex -m. brick-a/file.txt \#file: brick-a/file.txt security.selinux=0x726f6f743a6f626a6563745f723a66696c655f743a733000 trusted.afr.vol-client-2=0x000000000000000000000000 trusted.afr.vol-client-3=0x000000000200000000000000 trusted.gfid=0x307a5c9efddd4e7c96e94fd4bcdcbd1b
The extended attributes withtrusted.afr.<volname>-client-<subvolume-index>are used by AFR to maintain changelog of the file. The values of thetrusted.afr.<volname>-client-<subvolume-index>are calculated by the glusterFS client (FUSE or NFS-server) processes. When the glusterFS client modifies a file or directory, the client contacts each brick and updates the changelog extended attribute according to the response of the brick.subvolume-indexis thebrick number - 1ofgluster volume info VOLNAMEoutput.For example,# gluster volume info vol Volume Name: vol Type: Distributed-Replicate Volume ID: 4f2d7849-fbd6-40a2-b346-d13420978a01 Status: Created Number of Bricks: 4 x 2 = 8 Transport-type: tcp Bricks: brick-a: server1:/gfs/brick-a brick-b: server1:/gfs/brick-b brick-c: server1:/gfs/brick-c brick-d: server1:/gfs/brick-d brick-e: server1:/gfs/brick-e brick-f: server1:/gfs/brick-f brick-g: server1:/gfs/brick-g brick-h: server1:/gfs/brick-h
In the example above:
Brick          | Replica set | Brick subvolume index
----------------------------------------------------
-/gfs/brick-a  | 0           | 0
-/gfs/brick-b  | 0           | 1
-/gfs/brick-c  | 1           | 2
-/gfs/brick-d  | 1           | 3
-/gfs/brick-e  | 2           | 4
-/gfs/brick-f  | 2           | 5
-/gfs/brick-g  | 3           | 6
-/gfs/brick-h  | 3           | 7
Each file in a brick maintains the changelog of itself and of the files present in all the other bricks of its replica set, as seen by that brick. In the example volume given above, all files in brick-a will have 2 entries, one for itself and the other for the file present in its replica pair, brick-b. The changelog entries are as follows:- trusted.afr.vol-client-0=0x000000000000000000000000 - the changelog for itself (brick-a)
- trusted.afr.vol-client-1=0x000000000000000000000000 - changelog for brick-b as seen by brick-a
Likewise, all files in brick-b will have the following:- trusted.afr.vol-client-0=0x000000000000000000000000 - changelog for brick-a as seen by brick-b
- trusted.afr.vol-client-1=0x000000000000000000000000 - changelog for itself (brick-b)
The same can be extended for other replica pairs.Interpreting changelog (approximate pending operation count) value
Each extended attribute has a value which is 24 hexadecimal digits. The first 8 digits represent the changelog of data, the second 8 digits represent the changelog of metadata, and the last 8 digits represent the changelog of directory entries. Pictorially:
0x 000003d7 00000001 00000000
        |        |        |
        |        |         \_ changelog of directory entries
        |         \_ changelog of metadata
         \_ changelog of data
For directories, the metadata and entry changelogs are valid. For regular files, the data and metadata changelogs are valid. For special files such as device files, the metadata changelog is valid. When a file split-brain happens, it could be either a data split-brain, a metadata split-brain, or both (a small sketch that decodes these values appears after this procedure).The following is an example of both data and metadata split-brain on the same file:# getfattr -d -m . -e hex /gfs/brick-?/a getfattr: Removing leading '/' from absolute path names \#file: gfs/brick-a/a trusted.afr.vol-client-0=0x000000000000000000000000 trusted.afr.vol-client-1=0x000003d70000000100000000 trusted.gfid=0x80acdbd886524f6fbefa21fc356fed57 \#file: gfs/brick-b/a trusted.afr.vol-client-0=0x000003b00000000100000000 trusted.afr.vol-client-1=0x000000000000000000000000 trusted.gfid=0x80acdbd886524f6fbefa21fc356fed57
Scrutinize the changelogsThe changelog extended attributes on file/gfs/brick-a/aare as follows:- The first 8 digits of
trusted.afr.vol-client-0 are all zeros (0x00000000................),The first 8 digits oftrusted.afr.vol-client-1are not all zeros (0x000003d7................).So the changelog on/gfs/brick-a/aimplies that some data operations succeeded on itself but failed on/gfs/brick-b/a. - The second 8 digits of
trusted.afr.vol-client-0 are all zeros (0x........00000000........), and the second 8 digits oftrusted.afr.vol-client-1are not all zeros (0x........00000001........).So the changelog on/gfs/brick-a/aimplies that some metadata operations succeeded on itself but failed on/gfs/brick-b/a.
The changelog extended attributes on file/gfs/brick-b/aare as follows:- The first 8 digits of
trusted.afr.vol-client-0are not all zeros (0x000003b0................).The first 8 digits oftrusted.afr.vol-client-1are all zeros (0x00000000................).So the changelog on/gfs/brick-b/aimplies that some data operations succeeded on itself but failed on/gfs/brick-a/a. - The second 8 digits of
trusted.afr.vol-client-0are not all zeros (0x........00000001........)The second 8 digits oftrusted.afr.vol-client-1are all zeros (0x........00000000........).So the changelog on/gfs/brick-b/aimplies that some metadata operations succeeded on itself but failed on/gfs/brick-a/a.
Here, both the copies have data and metadata changes that are not on the other file. Hence, it is both a data and metadata split-brain.Deciding on the correct copy
You must inspect the stat and getfattr output of the files to decide which metadata to retain, and the contents of the file to decide which data to retain. To continue with the example above, here we are retaining the data of /gfs/brick-a/a and the metadata of /gfs/brick-b/a.Resetting the relevant changelogs to resolve the split-brain
Resolving data split-brain
You must change the changelog extended attributes on the files as if some data operations succeeded on /gfs/brick-a/a but failed on /gfs/brick-b/a. But /gfs/brick-b/a should not have any changelog showing data operations succeeded on /gfs/brick-b/a but failed on /gfs/brick-a/a. You must reset the data part of the changelog on trusted.afr.vol-client-0 of /gfs/brick-b/a.Resolving metadata split-brain
You must change the changelog extended attributes on the files as if some metadata operations succeeded on /gfs/brick-b/a but failed on /gfs/brick-a/a. But /gfs/brick-a/a should not have any changelog which says some metadata operations succeeded on /gfs/brick-a/a but failed on /gfs/brick-b/a. You must reset the metadata part of the changelog on trusted.afr.vol-client-1 of /gfs/brick-a/a.Run the following commands to reset the extended attributes.- On
/gfs/brick-b/a, fortrusted.afr.vol-client-0 0x000003b00000000100000000to0x000000000000000100000000, execute the following command:# setfattr -n trusted.afr.vol-client-0 -v 0x000000000000000100000000 /gfs/brick-b/a
- On
/gfs/brick-a/a, for trusted.afr.vol-client-1 0x000003d70000000100000000 to 0x000003d70000000000000000, execute the following command:# setfattr -n trusted.afr.vol-client-1 -v 0x000003d70000000000000000 /gfs/brick-a/a
After you reset the extended attributes, the changelogs would look similar to the following:# getfattr -d -m . -e hex /gfs/brick-?/a getfattr: Removing leading '/' from absolute path names \#file: gfs/brick-a/a trusted.afr.vol-client-0=0x000000000000000000000000 trusted.afr.vol-client-1=0x000003d70000000000000000 trusted.gfid=0x80acdbd886524f6fbefa21fc356fed57 \#file: gfs/brick-b/a trusted.afr.vol-client-0=0x000000000000000100000000 trusted.afr.vol-client-1=0x000000000000000000000000 trusted.gfid=0x80acdbd886524f6fbefa21fc356fed57
Resolving Directory entry split-brain
AFR has the ability to conservatively merge different entries in the directories when there is a split-brain on a directory. If on one brick the directory storage has entries 1, 2 and has entries 3, 4 on the other brick, then AFR will merge all of the entries in the directory to have 1, 2, 3, 4 entries in the same directory. But this may result in deleted files re-appearing if the split-brain happened because of deletion of files from the directory. Split-brain resolution needs human intervention when there is at least one entry which has the same file name but a different gfid in that directory.For example: On brick-a the directory has 2 entries, file1 with gfid_x and file2. On brick-b the directory has 2 entries, file1 with gfid_y and file3. Here the gfids of file1 on the two bricks are different. These kinds of directory split-brain need human intervention to resolve the issue. You must remove either file1 on brick-a or file1 on brick-b to resolve the split-brain.In addition, the corresponding gfid-link file must be removed. The gfid-link files are present in the .glusterfs directory in the top-level directory of the brick. If the gfid of the file is 0x307a5c9efddd4e7c96e94fd4bcdcbd1b (the trusted.gfid extended attribute received from the getfattr command earlier), the gfid-link file can be found at /gfs/brick-a/.glusterfs/30/7a/307a5c9efddd4e7c96e94fd4bcdcbd1b.Warning
Before deleting thegfid-link, you must ensure that there are no hard links to the file present on that brick. If hard-links exist, you must delete them. - Trigger self-heal by running the following command:
# ls -l <file-path-on-gluster-mount>
or
# gluster volume heal VOLNAME
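The following is a small sketch, referenced earlier, that decodes an AFR changelog value into its three fields; the value used is trusted.afr.vol-client-1 from the example getfattr output above, and the field boundaries are simply every 8 hexadecimal digits:
# val=000003d70000000100000000
# echo "data: 0x${val:0:8} metadata: 0x${val:8:8} entry: 0x${val:16:8}"
data: 0x000003d7 metadata: 0x00000001 entry: 0x00000000
A non-zero data or metadata field is what the scrutiny steps above look for when deciding which brick holds the good copy.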
10.9.3. Triggering Self-Healing on Replicated Volumes
- To view the list of files that need healing:
# gluster volume heal VOLNAME infoFor example, to view the list of files on test-volume that need healing:# gluster volume heal test-volume info Brick server1:/gfs/test-volume_0 Number of entries: 0 Brick server2:/gfs/test-volume_1 /95.txt /32.txt /66.txt /35.txt /18.txt /26.txt - Possibly undergoing heal /47.txt /55.txt /85.txt - Possibly undergoing heal ... Number of entries: 101
- To trigger self-healing only on the files which require healing:
# gluster volume heal VOLNAMEFor example, to trigger self-healing on files which require healing on test-volume:# gluster volume heal test-volume Heal operation on volume test-volume has been successful
- To trigger self-healing on all the files on a volume:
# gluster volume heal VOLNAME fullFor example, to trigger self-heal on all the files on test-volume:# gluster volume heal test-volume full Heal operation on volume test-volume has been successful
- To view the list of files on a volume that are in a split-brain state:
# gluster volume heal VOLNAME info split-brainFor example, to view the list of files on test-volume that are in a split-brain state:# gluster volume heal test-volume info split-brain Brick server1:/gfs/test-volume_2 Number of entries: 12 at path on brick ---------------------------------- 2012-06-13 04:02:05 /dir/file.83 2012-06-13 04:02:05 /dir/file.28 2012-06-13 04:02:05 /dir/file.69 Brick server2:/gfs/test-volume_2 Number of entries: 12 at path on brick ---------------------------------- 2012-06-13 04:02:05 /dir/file.83 2012-06-13 04:02:05 /dir/file.28 2012-06-13 04:02:05 /dir/file.69 ...
10.10. Non Uniform File Allocation (NUFA)
Important
gluster volume set VOLNAME cluster.nufa enable on.
Important
- Volumes with only one brick per server.
- For use with a FUSE client. NUFA is not supported with NFS or SMB.
- A client that is mounting a NUFA-enabled volume must be present within the trusted storage pool.
Chapter 11. Configuring Red Hat Storage for Enhancing Performance
11.1. Brick Configuration
Procedure 11.1. Brick Configuration
LVM layer
Align the I/O at the Logical Volume Manager (LVM) layer using the --dataalignment option while creating the physical volume. The alignment value is calculated by multiplying the stripe element size of the hardware RAID by the number of data disks in that RAID level. This configuration at the LVM layer has a significant impact on the overall performance and must be set regardless of the RAID level. If 12 disks are used to configure RAID 10 and RAID 6, then the data disks would be 6 and 10 respectively (as worked out in the sketch after this list).- Run the following command for the RAID 10 volume configured using 12 disks (6 data disks) and 256 K stripe element size:
# pvcreate --dataalignment 1536K <disk> Physical volume <disk> successfully created
- Run the following command for the RAID 6 volume, which is configured using 12 disks and 256 K stripe element size:
# pvcreate --dataalignment 2560K <disk> Physical volume <disk> successfully created
- To view the previously configured physical volume settings for --
dataalignment, run the following command :# pvs -o +pe_start <disk> PV VG Fmt Attr PSize PFree 1st PE /dev/sdb lvm2 a-- 9.09t 9.09t 2.50m
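The --dataalignment values used in the two pvcreate commands above follow directly from the multiplication described earlier; a quick check, assuming the 256 K stripe element size used in the examples:
# echo "RAID 10: $((256 * 6))K RAID 6: $((256 * 10))K"
RAID 10: 1536K RAID 6: 2560K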
XFS Inode Size
Due to the extensive use of extended attributes, Red Hat Storage recommends increasing the XFS inode size from the default 256 bytes to 512 bytes when formatting the Red Hat Storage bricks. To set the inode size, run the following command:# mkfs.xfs -i size=512 <disk>
XFS RAID Alignment
To align the I/O at the file system layer it is important to set the correct stripe unit (stripe element size) and stripe width (number of data disks) while formatting the file system. These options are sometimes auto-detected but manual configuration is required for many of the hardware RAID volumes.- Run the following command for the RAID 10 volume, which is configured using 12 disks and 256K stripe element size:
# mkfs.xfs -d su=256K,sw=6 <disk>
where,suis the RAID controller stripe unit size andswis the number of data disks. - Run the following command for the RAID 6 volume, which is configured using 12 disks and 256K stripe element size:
# mkfs.xfs -d su=256K,sw=10 <disk>
Logical Block Size for the Directory
In an XFS file system, you can select a logical block size for the file system directory that is greater than the logical block size of the file system. Increasing the logical block size for the directories from the default 4 K, decreases the directory I/O, which in turn improves the performance of directory operations. For example:# mkfs.xfs -n size=8192 <disk>
The RAID 6 configuration output is as follows:# mkfs.xfs -f -i size=512 -n size=8192 -d su=256k,sw=10 <logical volume> meta-data=/dev/mapper/gluster-brick1 isize=512 agcount=32, agsize=37748736 blks = sectsz=512 attr=2, projid32bit=0 data = bsize=4096 blocks=1207959552, imaxpct=5 = sunit=64 swidth=640 blks naming =version 2 bsize=8192 ascii-ci=0 log =internal log bsize=4096 blocks=521728, version=2 = sectsz=512 sunit=64 blks, lazy-count=1 realtime =none extsz=4096 blocks=0, rtextents=0
Allocation Strategy
inode32 and inode64 are the two most common allocation strategies for XFS. With the inode32 allocation strategy, XFS places all the inodes in the first 1 TB of the disk. With a larger disk, all the inodes would be stuck in the first 1 TB. With the inode64 mount option, inodes are placed near the data, which minimizes disk seeks. With the current release, the inode32 allocation strategy is used by default. To use the inode64 allocation strategy with the current release, the file system needs to be mounted with the inode64 mount option. For example:# mount -t xfs -o inode64 <logical volume> <mount point>
Access Time
If the application does not require to update the access time on files, then the file system should always be mounted with thenoatimemount option. For example:# mount -t xfs -o inode64,noatime <logical volume> <mount point>
This optimization improves the performance of small-file reads by avoiding updates to the XFS inodes when files are read. The corresponding /etc/fstab entry for the inode64 and noatime mount options is:
<logical volume> <mount point> xfs inode64,noatime 0 0
Performance tuning option in Red Hat Storage
Run the following command after creating the volume:# tuned-adm profile default ; tuned-adm profile rhs-high-throughput Switching to profile 'default' Applying ktune sysctl settings: /etc/ktune.d/tunedadm.conf: [ OK ] Applying sysctl settings from /etc/sysctl.conf Starting tuned: [ OK ] Stopping tuned: [ OK ] Switching to profile 'rhs-high-throughput'
This profile performs the following:- Increases read ahead to 64 MB
- Changes I/O scheduler to
deadline - Disables power-saving mode
Writeback caching
For small-file and random write performance, we strongly recommend a writeback cache, that is, non-volatile random-access memory (NVRAM) in your storage controller. For example, normal Dell and HP storage controllers have it. Make sure that NVRAM is enabled, that is, the battery is working. Refer to your hardware documentation for details about enabling NVRAM.Do not enable writeback caching in the disk drives themselves. This is a policy where the disk drive considers the write complete before the write actually makes it to the magnetic media (platter). As a result, the disk write cache might lose its data during a power failure or even lose its metadata, leading to file system corruption.
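One way to inspect and disable the volatile on-drive write cache for a directly attached SATA disk is sketched below, assuming a hypothetical device /dev/sdb; drives sitting behind a hardware RAID controller are normally managed through the controller's own utility instead, so treat this only as an illustration:
# hdparm -W /dev/sdb
# hdparm -W 0 /dev/sdb
The first command reports the current write-caching flag, and the second turns the drive cache off.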
Allocation groups
Each XFS file system is partitioned into regions called allocation groups. Allocation groups are similar to the block groups in ext3, but allocation groups are much larger than block groups and are used for scalability and parallelism rather than disk locality. The default allocation for an allocation group is 1 TB. The allocation group count must be large enough to sustain the concurrent allocation workload. In most cases, the allocation group count chosen by the mkfs.xfs command gives the optimal performance. Do not change the allocation group count chosen by mkfs.xfs while formatting the file system.
Percentage of space allocation to inodes
If the workload consists of very small files (average file size less than 10 KB), it is recommended to set the maxpct value to 10 while formatting the file system, as shown in the example below.
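A minimal sketch combining the recommended 512-byte inode size with a 10% inode space allocation while formatting a brick (the disk placeholder is the same as in the earlier examples):
# mkfs.xfs -i size=512,maxpct=10 <disk>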
11.2. Network
11.3. RAM on the Nodes
11.4. Number of Clients
For a large number of clients, use the rhs-virtualization tuned profile, which increases the ARP (Address Resolution Protocol) table size but has a less aggressive read-ahead setting of 4 MB. This is 32 times the Linux default, but small enough to avoid fairness issues when a large number of files are read concurrently.
# tuned-adm profile rhs-virtualization
11.5. Replication
11.6. Hardware RAID
Chapter 12. Managing Geo-replication
- 12.1. About Geo-replication
- 12.2. Replicated Volumes vs Geo-replication
- 12.3. Preparing to Deploy Geo-replication
- 12.4. Starting Geo-replication
- 12.5. Starting Geo-replication on a Newly Added Brick
- 12.6. Disaster Recovery
- 12.7. Example - Setting up Cascading Geo-replication
- 12.8. Recommended Practices
- 12.9. Troubleshooting Geo-replication
12.1. About Geo-replication
- Master – a Red Hat Storage volume.
- Slave – a Red Hat Storage volume. A slave volume can be either a local volume, such as
localhost::volname, or a volume on a remote host, such asremote-host::volname.
12.2. Replicated Volumes vs Geo-replication
| Replicated Volumes | Geo-replication |
|---|---|
| Mirrors data across bricks within one trusted storage pool. | Mirrors data across geographically distributed trusted storage pools. |
| Provides high-availability. | Provides back-ups of data for disaster recovery. |
| Synchronous replication: each and every file operation is applied to all the bricks. | Asynchronous replication: checks for changes in files periodically, and syncs them on detecting differences. |
12.3. Preparing to Deploy Geo-replication
12.3.1. Exploring Geo-replication Deployment Scenarios
- Geo-replication over LAN
- Geo-replication over WAN
- Geo-replication over the Internet
- Multi-site cascading geo-replication




12.3.2. Geo-replication Deployment Overview
- Verify that your environment matches the minimum system requirements. See Section 12.3.3, “Prerequisites”.
- Determine the appropriate deployment scenario. See Section 12.3.1, “Exploring Geo-replication Deployment Scenarios”.
- Start geo-replication on the master and slave systems. See Section 12.4, “Starting Geo-replication”.
12.3.3. Prerequisites
- The master and slave volumes must be Red Hat Storage instances.
- Password-less SSH access is required between one node of the master volume (the node from which the
geo-replication createcommand will be executed), and one node of the slave volume (the node whose IP/hostname will be mentioned in the slave name when running thegeo-replication createcommand).Create the public and private keys usingssh-keygen(without passphrase) on the master node:# ssh-keygenCopy the public key to the slave node:# ssh-copy-id root@slave_node_IPaddress/HostnameNote
Password-less SSH access is required from the master node to the slave node, whereas password-less SSH access is not required from the slave node to the master node.A password-less SSH connection is also required for gsyncd between every node in the master and every node in the slave. The gluster system:: execute gsec_create command creates secret-pem files on all the nodes in the master, and is used to implement the password-less SSH connection. The push-pem option in the geo-replication create command pushes these keys to all the nodes in the slave.For more information on the gluster system:: execute gsec_create and push-pem commands, see Section 12.3.4, “Configuring the Environment and Creating a Geo-replication Session”.
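Before creating the session, it is worth confirming that the password-less connection works in the required direction (master to slave); a minimal check from the master node, reusing the placeholder from the ssh-copy-id step above:
# ssh root@slave_node_IPaddress/Hostname gluster --version
If this prompts for a password, revisit the key setup before running the geo-replication create command.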
12.3.4. Configuring the Environment and Creating a Geo-replication Session
- The time on all the servers hosting bricks of a geo-replicated master volume must be uniform. It is recommended to set up an NTP (Network Time Protocol) service to keep the bricks' time synchronized and avoid out-of-sync effects.For example: in a replicated volume where brick1 of the master has the time 12:20 and brick2 of the master has the time 12:10, with a 10 minute time lag, all the changes on brick2 in this period may go unnoticed during synchronization of files with the slave.For more information on configuring NTP, see https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Migration_Planning_Guide/sect-Migration_Guide-Networking-NTP.html.
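For example, on Red Hat Enterprise Linux 6 based storage nodes, the NTP service can be started and enabled on every brick server as follows (assuming the ntp package is already installed; see the linked NTP documentation for full configuration):
# service ntpd start
# chkconfig ntpd on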
Procedure 12.1. Creating Geo-replication Sessions
- To create a common
pem pubfile, run the following command on the master node where the password-less SSH connection is configured:# gluster system:: execute gsec_create - Create the geo-replication session using the following command. The
push-pemoption is needed to perform the necessarypem-filesetup on the slave nodes.# gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL create push-pem [force]For example:# gluster volume geo-replication master-vol example.com::slave-vol create push-pem
Note
There must be password-less SSH access between the node from which this command is run, and the slave host specified in the above command. This command performs the slave verification, which includes checking for a valid slave URL, valid slave volume, and available space on the slave. If the verification fails, you can use theforceoption which will ignore the failed verification and create a geo-replication session. - Verify the status of the created session by running the following command:
# gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL status
12.4. Starting Geo-replication
12.4.1. Starting a Geo-replication Session
Important
- To start the geo-replication session between the hosts:
# gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL startFor example:# gluster volume geo-replication master-vol example.com::slave-vol start Starting geo-replication session between master-vol & example.com::slave-vol has been successful
This command will start distributed geo-replication on all the nodes that are part of the master volume. If a node that is part of the master volume is down, the command will still be successful. In a replica pair, the geo-replication session will be active on any of the replica nodes, but remain passive on the others.After executing the command, it may take a few minutes for the session to initialize and become stable.Note
If you attempt to create a geo-replication session and the slave already has data, the following error message will be displayed:slave-node::slave is not empty. Please delete existing files in slave-node::slave and retry, or use force to continue without deleting the existing files. geo-replication command failed
- To start the geo-replication session forcefully between the hosts:
# gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL start forceFor example:# gluster volume geo-replication master-vol example.com::slave-vol start force Starting geo-replication session between master-vol & example.com::slave-vol has been successful
This command will force start geo-replication sessions on the nodes that are part of the master volume. If it is unable to successfully start the geo-replication session on any node which is online and part of the master volume, the command will still start the geo-replication sessions on as many nodes as it can. This command can also be used to re-start geo-replication sessions on the nodes where the session has died, or has not started.
12.4.2. Verifying a Successful Geo-replication Deployment
status command to verify the status of geo-replication in your environment:
# gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL status# gluster volume geo-replication master-vol example.com::slave-vol status
12.4.3. Displaying Geo-replication Status Information
status command can be used to display information about a specific geo-replication master session, master-slave session, or all geo-replication sessions.
- To display information on all geo-replication sessions from a particular master volume, use the following command:
# gluster volume geo-replication MASTER_VOL status
- To display information of a particular master-slave session, use the following command:
# gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL status - To display the details of a master-slave session, use the following command:
# gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL status detailImportant
There will be a mismatch between the outputs of thedfcommand (including-hand-k) and inode of the master and slave volumes when the data is in full sync. This is due to the extra inode and size consumption by thechangelogjournaling data, which keeps track of the changes done on the file system on themastervolume. Instead of running thedfcommand to verify the status of synchronization, use# gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL status detailinstead.The status of a session can be one of the following:- Initializing: This is the initial phase of the Geo-replication session; it remains in this state for a minute in order to make sure no abnormalities are present.
- Not Started: The geo-replication session is created, but not started.
- Active: The
gsyncdaemon in this node is active and syncing the data. - Passive: A replica pair of the active node. The data synchronization is handled by active node. Hence, this node does not sync any data.
- Faulty: The geo-replication session has experienced a problem, and the issue needs to be investigated further. For more information, see Section 12.9, “Troubleshooting Geo-replication ” section.
- Stopped: The geo-replication session has stopped, but has not been deleted.
- Crawl Status
- Changelog Crawl: The
changelogtranslator has produced the changelog and that is being consumed bygsyncddaemon to sync data. - Hybrid Crawl: The
gsyncddaemon is crawling the glusterFS file system and generating pseudo changelog to sync data.
- Checkpoint Status: Displays the status of the checkpoint, if set. Otherwise, it displays as N/A.
12.4.4. Configuring a Geo-replication Session
# gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL config [options]# gluster volume geo-replication Volume1 example.com::slave-vol config
! (exclamation mark). For example, to reset log-level to the default value:
# gluster volume geo-replication Volume1 example.com::slave-vol config '!log-level'
| Option | Description |
|---|---|
| gluster-log-file LOGFILE | The path to the geo-replication glusterfs log file. |
| gluster-log-level LOGFILELEVEL | The log level for glusterfs processes. |
| log-file LOGFILE | The path to the geo-replication log file. |
| log-level LOGFILELEVEL | The log level for geo-replication. |
| ssh-command COMMAND | The SSH command to connect to the remote machine (the default is SSH). |
| rsync-command COMMAND | The rsync command to use for synchronizing the files (the default is rsync). |
| use-tarssh true | The use-tarssh command allows tar over Secure Shell protocol. Use this option to handle workloads of files that have not undergone edits. |
| volume_id=UID | The command to delete the existing master UID for the intermediate/slave node. |
| timeout SECONDS | The timeout period in seconds. |
| sync-jobs N | The number of simultaneous files/directories that can be synchronized. |
| ignore-deletes | If this option is set to 1, a file deleted on the master will not trigger a delete operation on the slave. As a result, the slave will remain as a superset of the master and can be used to recover the master in the event of a crash and/or accidental delete. |
| checkpoint [LABEL|now] | Sets a checkpoint with the given option LABEL. If the option is set as now, then the current time will be used as the label. |
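As a minimal illustration of applying a few of the options in the table to the session used in the earlier examples (the values shown are only examples, not recommendations):
# gluster volume geo-replication Volume1 example.com::slave-vol config sync-jobs 3
# gluster volume geo-replication Volume1 example.com::slave-vol config use-tarssh true
# gluster volume geo-replication Volume1 example.com::slave-vol config ignore-deletes 1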
12.4.4.1. Geo-replication Checkpoints
12.4.4.1.1. About Geo-replication Checkpoints
12.4.4.1.2. Configuring and Viewing Geo-replication Checkpoint Information
- To set a checkpoint on a geo-replication session, use the following command:
# gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL config checkpoint[now|LABEL]For example, to set checkpoint betweenVolume1andexample.com:/data/remote_dir:# gluster volume geo-replication Volume1 example.com::slave-vol config checkpoint now geo-replication config updated successfully
The label for a checkpoint can be set as the current time usingnow, or a particular label can be specified, as shown below:# gluster volume geo-replication Volume1 example.com::slave-vol config checkpoint NEW_ACCOUNTS_CREATED geo-replication config updated successfully.
- To display the status of a checkpoint for a geo-replication session, use the following command:
# gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL status - To delete checkpoints for a geo-replication session, use the following command:
# gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL config '!checkpoint'For example, to delete the checkpoint set betweenVolume1andexample.com::slave-vol:# gluster volume geo-replication Volume1 example.com::slave-vol config '!checkpoint' geo-replication config updated successfully
- To view the history of checkpoints for a geo-replication session (including set, delete, and completion events), use the following command:
# gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL config log-file | xargs grep checkpointFor example, to display the checkpoint history betweenVolume1andexample.com::slave-vol:# gluster volume geo-replication Volume1 example.com::slave-vol config log-file | xargs grep checkpoint [2013-11-12 12:40:03.436563] I [gsyncd(conf):359:main_i] <top>: checkpoint as of 2012-06-04 12:40:02 set [2013-11-15 12:41:03.617508] I master:448:checkpt_service] _GMaster: checkpoint as of 2013-11-12 12:40:02 completed [2013-11-12 03:01:17.488917] I [gsyncd(conf):359:main_i] <top>: checkpoint as of 2013-06-22 03:01:12 set [2013-11-15 03:02:29.10240] I master:448:checkpt_service] _GMaster: checkpoint as of 2013-06-22 03:01:12 completed
12.4.5. Stopping a Geo-replication Session
- To stop a geo-replication session between the hosts:
# gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL stopFor example:#gluster volume geo-replication master-vol example.com::slave-vol stop Stopping geo-replication session between master-vol & example.com::slave-vol has been successful
Note
Thestopcommand will fail if:- any node that is a part of the volume is offline.
- if it is unable to stop the geo-replication session on any particular node.
- if the geo-replication session between the master and slave is not active.
- To stop a geo-replication session forcefully between the hosts:
# gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL stop forceFor example:# gluster volume geo-replication master-vol example.com::slave-vol stop force Stopping geo-replication session between master-vol & example.com::slave-vol has been successful
Usingforcewill stop the geo-replication session between the master and slave even if any node that is a part of the volume is offline. If it is unable to stop the geo-replication session on any particular node, the command will still stop the geo-replication sessions on as many nodes as it can. Usingforcewill also stop inactive geo-replication sessions.
12.4.6. Deleting a Geo-replication Session
Important
# gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL delete# gluster volume geo-replication master-vol example.com::slave-vol delete geo-replication command executed successfully
Note
delete command will fail if:
- any node that is a part of the volume is offline.
- if it is unable to delete the geo-replication session on any particular node.
- if the geo-replication session between the master and slave is still active.
Important
pem files which contain the SSH keys from the /var/lib/glusterd/geo-replication/ directory.
12.5. Starting Geo-replication on a Newly Added Brick
12.5.1. Starting Geo-replication for a New Brick on a New Node
Procedure 12.2. Starting Geo-replication for a New Brick on a New Node
- Run the following command on the master node where password-less SSH connection is configured, in order to create a common
pem pubfile.# gluster system:: execute gsec_create - Create the geo-replication session using the following command. The
push-pemandforceoptions are required to perform the necessarypem-filesetup on the slave nodes.# gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL create push-pem forceFor example:# gluster volume geo-replication master-vol example.com::slave-vol create push-pem force
Note
There must be password-less SSH access between the node from which this command is run, and the slave host specified in the above command. This command performs the slave verification, which includes checking for a valid slave URL, valid slave volume, and available space on the slave. - Start the geo-replication session between the slave and master forcefully, using the following command:
# gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL start force - Verify the status of the created session, using the following command:
# gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL status
12.5.2. Starting Geo-replication for a New Brick on an Existing Node
12.6. Disaster Recovery
12.6.1. Promoting a Slave to Master
# gluster volume set VOLNAME geo-replication.indexing on
# gluster volume set VOLNAME changelog on
12.6.2. Failover and Failback
Procedure 12.3. Performing a Failover and Failback
- Create a new geo-replication session with the original slave as the new master, and the original master as the new slave. For more information on setting and creating geo-replication session, see Section 12.3.4, “Configuring the Environment and Creating a Geo-replication Session”.
- Start the special synchronization mode to speed up the recovery of data from slave.
# gluster volume geo-replication ORIGINAL_SLAVE ORIGINAL_MASTER config special-sync-mode recover - Set a checkpoint to help verify the status of the data synchronization.
# gluster volume geo-replication ORIGINAL_SLAVE ORIGINAL_MASTER config checkpoint now - Start the new geo-replication session using the following command:
# gluster volume geo-replication ORIGINAL_SLAVE ORIGINAL_MASTER start - Monitor the checkpoint output using the following command, until the status displays:
checkpoint as of <time of checkpoint creation> is completed at <time of completion>..# gluster volume geo-replication ORIGINAL_SLAVE ORIGINAL_MASTER status - To resume the original master and original slave back to their previous roles, stop the I/O operations on the original slave, and using steps 3 and 5, ensure that all the data from the original slave is restored back to the original master. After the data from the original slave is restored back to the original master, stop the current geo-replication session (the failover session) between the original slave and original master, and resume the previous roles.
12.7. Example - Setting up Cascading Geo-replication
- Verify that your environment matches the minimum system requirements listed in Section 12.3.3, “Prerequisites”.
- Determine the appropriate deployment scenario. For more information on deployment scenarios, see Section 12.3.1, “Exploring Geo-replication Deployment Scenarios”.
- Configure the environment and create a geo-replication session between master-vol and interimmaster-vol.
- Create a common pem pub file, run the following command on the master node where the password-less SSH connection is configured:
# gluster system:: execute gsec_create
- Create the geo-replication session using the following command. The push-pem option is needed to perform the necessary pem-file setup on the slave nodes.
# gluster volume geo-replication master-vol interimhost.com::interimmaster-vol create push-pem
- Verify the status of the created session by running the following command:
# gluster volume geo-replication master-vol interim_HOST::interimmaster-vol status
- Start a Geo-replication session between the hosts:
# gluster volume geo-replication master-vol interimhost.com::interimmaster-vol start
This command will start distributed geo-replication on all the nodes that are part of the master volume. If a node that is part of the master volume is down, the command will still be successful. In a replica pair, the geo-replication session will be active on any of the replica nodes, but remain passive on the others. After executing the command, it may take a few minutes for the session to initialize and become stable. - Verifying the status of geo-replication session by running the following command:
# gluster volume geo-replication master-vol interimhost.com::interimmaster-vol status
- To create a geo-replication session between the interimmaster volume and the slave volume, repeat step 3 to step 5, replacing master-vol with interimmaster-vol and interimmaster-vol with slave-vol. You must run these commands on the interimmaster. Here, the interimmaster volume acts as the master volume for the geo-replication session between the interimmaster and the slave.
12.8. Recommended Practices
Procedure 12.4. Manually Setting the Time on Bricks in a Geo-replication Environment
- Stop geo-replication between the master and slave, using the following command:
# gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL stop
- Stop geo-replication indexing, using the following command:
# gluster volume set MASTER_VOL geo-replication.indexing off
- Set a uniform time on all the bricks.
- Restart the geo-replication sessions, using the following command:
# gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL start
# gluster volume set SLAVE_VOL batch-fsync-delay-usec 0
Procedure 12.5. Initially Replicating to a Remote Slave Locally using a LAN
- Create a geo-replication session locally within the LAN. For information on creating a geo-replication session, see Section 12.3.4, “Configuring the Environment and Creating a Geo-replication Session”.
Important
You must remember the order in which the bricks/disks are specified when creating the slave volume. This information is required later for configuring the remote geo-replication session over the WAN.
- Ensure that the initial data on the master is synced to the slave volume. You can verify the status of the synchronization by using the status command, as shown in Section 12.4.3, “Displaying Geo-replication Status Information”.
- Stop and delete the geo-replication session. For information on stopping and deleting the geo-replication session, see Section 12.4.5, “Stopping a Geo-replication Session” and Section 12.4.6, “Deleting a Geo-replication Session”.
Important
You must ensure that there are no stale files in /var/lib/glusterd/geo-replication/.
- Stop and delete the slave volume. For information on stopping and deleting the volume, see Section 10.7, “Stopping Volumes” and Section 10.8, “Deleting Volumes”.
- Remove the disks from the slave nodes, and physically transport them to the remote location. Make sure to remember the order in which the disks were specified in the volume.
- At the remote location, attach the disks and mount them on the slave nodes. Make sure that the file system or logical volume manager is recognized, and that the data is accessible after mounting it.
- Configure a trusted storage pool for the slave using the
peer probe command. For information on configuring a trusted storage pool, see Chapter 7, Trusted Storage Pools.
- Delete the glusterFS-related attributes on the bricks. This should be done before creating the volume. You can remove the glusterFS-related attributes by running the following command:
# for i in `getfattr -d -m . ABSOLUTE_PATH_TO_BRICK 2>/dev/null | grep trusted | awk -F = '{print $1}'`; do setfattr -x $i ABSOLUTE_PATH_TO_BRICK; done
Run the following command to ensure that there are no xattrs still set on the brick:
# getfattr -d -m . ABSOLUTE_PATH_TO_BRICK
- After creating the trusted storage pool, create the Red Hat Storage volume with the same configuration that it had when it was on the LAN. For information on creating volumes, see Chapter 8, Red Hat Storage Volumes.
Important
Make sure to specify the bricks in the same order as they were previously specified on the LAN. A mismatch in the brick order may lead to data loss or corruption.
- Start and mount the volume, and check that the data is intact and accessible. For information on starting and mounting volumes, see Section 8.10, “Starting Volumes” and Section 9.2.3, “Mounting Red Hat Storage Volumes”.
- Configure the environment and create a geo-replication session from the master to this remote slave. For information on configuring the environment and creating a geo-replication session, see Section 12.3.4, “Configuring the Environment and Creating a Geo-replication Session”.
- Start the geo-replication session between the master and the remote slave. For information on starting the geo-replication session, see Section 12.4, “Starting Geo-replication”.
- Use the status command to verify the status of the session, and check that all the nodes in the session are stable. For information on the status command, see Section 12.4.3, “Displaying Geo-replication Status Information”.
12.9. Troubleshooting Geo-replication
12.9.1. Locating Log Files
- Master-log-file - log file for the process that monitors the master volume.
- Slave-log-file - log file for the process that initiates changes on a slave.
- Master-gluster-log-file - log file for the maintenance mount point that the geo-replication module uses to monitor the master volume.
- Slave-gluster-log-file - if the slave is a Red Hat Storage Volume, this log file is the slave's counterpart of Master-gluster-log-file.
12.9.1.1. Master Log Files
# gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL config log-file
For example:
# gluster volume geo-replication Volume1 example.com::slave-vol config log-file
12.9.1.2. Slave Log Files
glusterd must be running on the slave machine.
Procedure 12.6.
- On the master, run the following command to display the session-owner details:
# gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL config session-owner
For example:
# gluster volume geo-replication Volume1 example.com::slave-vol config session-owner 5f6e5200-756f-11e0-a1f0-0800200c9a66
- On the slave, run the following command with the session-owner value from the previous step:
# gluster volume geo-replication SLAVE_VOL config log-file /var/log/gluster/SESSION_OWNER:remote-mirror.log
For example:
# gluster volume geo-replication slave-vol config log-file /var/log/gluster/5f6e5200-756f-11e0-a1f0-0800200c9a66:remote-mirror.log
12.9.2. Synchronization Is Not Complete
The geo-replication status is displayed as Stable, but the data has not been completely synchronized.
12.9.3. Issues with File Synchronization
The geo-replication status is displayed as Stable, but only directories and symlinks are synchronized. Error messages similar to the following are in the logs:
[2011-05-02 13:42:13.467644] E [master:288:regjob] GMaster: failed to sync ./some_file
Geo-replication requires rsync v3.0.0 or higher on the host and the remote machines. Verify that you have installed the required version of rsync.
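For example, the installed version can be checked on each machine with a standard command:
# rsync --version | head -n 1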
12.9.4. Geo-replication Status is Often Faulty
The geo-replication status is often displayed as Faulty, with a backtrace similar to the following:
[2012-09-28 14:06:18.378859] E [syncdutils:131:log_raise_exception] <top>: FAIL: Traceback (most recent call last):
  File "/usr/local/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 152, in twrap tf(*aa)
  File "/usr/local/libexec/glusterfs/python/syncdaemon/repce.py", line 118, in listen rid, exc, res = recv(self.inf)
  File "/usr/local/libexec/glusterfs/python/syncdaemon/repce.py", line 42, in recv return pickle.load(inf)
EOFError
- Password-less SSH is set up properly between the host and remote machines.
- FUSE is installed on the machines. The geo-replication module mounts Red Hat Storage volumes using FUSE to sync data.
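Both conditions can be verified with standard tools; the following is a minimal sketch, where remote-host is a placeholder for the remote machine:
# ssh -oBatchMode=yes root@remote-host 'echo ok'
# lsmod | grep fuse
# rpm -q glusterfs-fuse
The first command should print ok without prompting for a password; the other two should show that the fuse kernel module is loaded and that the glusterfs-fuse package is installed.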
12.9.5. Intermediate Master is in a Faulty State
raise RuntimeError ("aborting on uuid change from %s to %s" % \
RuntimeError: aborting on uuid change from af07e07c-427f-4586-ab9f-4bf7d299be81 to de6b5040-8f4e-4575-8831-c4f55bd41154
This message means that the geo-replication module has detected a change in the primary master. If this is the desired behavior, delete the volume-id configuration option in the session that was initiated from the intermediate master.
12.9.6. Remote gsyncd Not Found
[2012-04-04 03:41:40.324496] E [resource:169:errfail] Popen: ssh> bash: /usr/local/libexec/glusterfs/gsyncd: No such file or directory
Chapter 13. Managing Directory Quotas
13.1. Enabling Quotas
# gluster volume quota VOLNAME enable
# gluster volume quota test-volume enable volume quota : success
Important
- Do not enable quota using the volume-set command. This option is no longer supported.
- Do not enable quota while quota-remove-xattr.sh is still running.
13.2. Setting Limits
Note
# gluster volume status VOLNAME
# gluster volume quota VOLNAME limit-usage path hard_limit
- To set a hard limit of 100GB on
/dir:
# gluster volume quota VOLNAME limit-usage /dir 100GB
- To set a hard limit of 1TB for the volume:
# gluster volume quota VOLNAME limit-usage / 1TB
/var/log/glusterfs/bricks/<path-to-brick.log>
# gluster volume quota VOLNAME limit-usage path hard_limit soft_limit
- To set the soft limit to 76% of the hard limit on
/dir:
# gluster volume quota VOLNAME limit-usage /dir 100GB 76%
- To set the soft limit to 68% of the hard limit on the volume:
# gluster volume quota VOLNAME limit-usage / 1TB 68%
Note
13.3. Setting the Default Soft Limit
# gluster volume quota VOLNAME default-soft-limit soft_limit
# gluster volume quota test-volume default-soft-limit 90% volume quota : success
# gluster volume quota test-volume list
Note
13.4. Displaying Quota Limit Information
# gluster volume quota VOLNAME list
# gluster volume quota test-volume list Path Hard-limit Soft-limit Used Available ------------------------------------------------------ / 50GB 75% 0Bytes 50.0GB /dir 10GB 75% 0Bytes 10.0GB /dir/dir2 20GB 90% 0Bytes 20.0GB
# gluster volume quota VOLNAME list /<directory_name>
# gluster volume quota test-volume list /dir Path Hard-limit Soft-limit Used Available ------------------------------------------------- /dir 10.0GB 75% 0Bytes 10.0GB
# gluster volume quota VOLNAME list /<directory_name1> /<directory_name2>
# gluster volume quota test-volume list /dir /dir/dir2 Path Hard-limit Soft-limit Used Available ------------------------------------------------------ /dir 10.0GB 75% 0Bytes 10.0GB /dir/dir2 20.0GB 90% 0Bytes 20.0GB
13.4.1. Displaying Quota Limit Information Using the df Utility
df utility, taking quota limits into consideration, run the following command:
# gluster volume set VOLNAME quota-deem-statfs on
Note
By default, quota-deem-statfs is off. However, it is recommended to set quota-deem-statfs to on.
The following example displays the disk usage when quota-deem-statfs is off:
# gluster volume set test-volume features.quota-deem-statfs off volume set: success # gluster volume quota test-volume list Path Hard-limit Soft-limit Used Available ----------------------------------------------------------- / 300.0GB 90% 11.5GB 288.5GB /John/Downloads 77.0GB 75% 11.5GB 65.5GB
# df -hT /home Filesystem Type Size Used Avail Use% Mounted on server1:/test-volume fuse.glusterfs 400G 12G 389G 3% /home
The following example displays the disk usage when quota-deem-statfs is on:
# gluster volume set test-volume features.quota-deem-statfs on volume set: success # gluster vol quota test-volume list Path Hard-limit Soft-limit Used Available ----------------------------------------------------------- / 300.0GB 90% 11.5GB 288.5GB /John/Downloads 77.0GB 75% 11.5GB 65.5GB
# df -hT /home Filesystem Type Size Used Avail Use% Mounted on server1:/test-volume fuse.glusterfs 300G 12G 289G 4% /home
When the quota-deem-statfs option is set to on, the hard limit set on a directory is shown to the user as the total disk space available on that directory.
13.5. Setting Timeout
- Soft timeout is the frequency at which the quota server-side translator checks the volume usage when the usage is below the soft limit. The soft timeout is in effect when the disk usage is less than the soft limit.To set the soft timeout, use the following command:
# gluster volume quota VOLNAME soft-timeout time
Note
The default soft timeout is 60 seconds.
For example, to set the soft timeout on test-volume to 1 minute:
# gluster volume quota test-volume soft-timeout 1min volume quota : success
- Hard timeout is the frequency at which the quota server-side translator checks the volume usage when the usage is above the soft limit. The hard timeout is in effect when the disk usage is between the soft limit and the hard limit.To set the hard timeout, use the following command:
# gluster volume quota VOLNAME hard-timeout time
Note
The default hard timeout is 5 seconds.
For example, to set the hard timeout to 30 seconds:
# gluster volume quota test-volume hard-timeout 30s volume quota : success
Note
As the margin of error for disk usage is proportional to the workload of the applications running on the volume, ensure that you set the hard-timeout and soft-timeout taking the workload into account.
13.6. Setting Alert Time
# gluster volume quota VOLNAME alert-time time
Note
# gluster volume quota test-volume alert-time 1d volume quota : success
13.7. Removing Disk Limits
# gluster volume quota VOLNAME remove /<directory-name>
# gluster volume quota test-volume remove /data volume quota : success
# gluster vol quota test-volume remove / volume quota : success
Note
13.8. Disabling Quotas
# gluster volume quota VOLNAME disable
# gluster volume quota test-volume disable Disabling quota will delete all the quota configuration. Do you want to continue? (y/n) y volume quota : success
Note
- When you disable quotas, all previously configured limits are removed from the volume.
- Disabling quotas may take some time. To find out if the operation is in progress, run the following command:
# ps ax | grep "quota-remove-xattr.sh"
The output shows whether the quota-remove-xattr.sh script is running. If quota-remove-xattr.sh is still running, the disabling of quotas is in progress. A minimal polling sketch follows these notes.
- Do not re-enable quota until the quota-remove-xattr.sh process is finished.
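As a hedged convenience using standard shell tools (not a product command), the same check can be polled until the clean-up script exits:
# while pgrep -f quota-remove-xattr.sh > /dev/null; do sleep 10; done
When the loop exits, the quota-remove-xattr.sh process has finished and quota can be re-enabled.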
Chapter 14. Monitoring Your Red Hat Storage Workload
- 14.1. Running the Volume Profile Command
- 14.2. Running the Volume Top Command
- 14.2.1. Viewing Open File Descriptor Count and Maximum File Descriptor Count
- 14.2.2. Viewing Highest File Read Calls
- 14.2.3. Viewing Highest File Write Calls
- 14.2.4. Viewing Highest Open Calls on a Directory
- 14.2.5. Viewing Highest Read Calls on a Directory
- 14.2.6. Viewing Read Performance
- 14.2.7. Viewing Write Performance
- 14.3. Listing Volumes
- 14.4. Displaying Volume Information
- 14.5. Performing Statedump on a Volume
- 14.6. Displaying Volume Status
You can use the volume top and volume profile commands to view vital performance information and identify bottlenecks on each brick of a volume.
Note
If you restart the server process, the existing profile and top information will be reset.
14.1. Running the Volume Profile Command
The volume profile command provides an interface to get the per-brick or NFS server I/O information for each File Operation (FOP) of a volume. This information helps in identifying bottlenecks in the storage system.
This section describes how to use the volume profile command.
14.1.1. Start Profiling
# gluster volume profile VOLNAME start
# gluster volume profile test-volume start Profiling started on test-volume
Important
Running the profile command can affect system performance while the profile information is being collected. Red Hat recommends that profiling be used only for debugging.
Profiling enables the following options on the volume, which you can view using the volume info command:
diagnostics.count-fop-hits: on diagnostics.latency-measurement: on
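For example, to confirm that these options are enabled on test-volume, the volume info output can be filtered with grep (a standard shell convenience, not a separate gluster command):
# gluster volume info test-volume | grep diagnostics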
14.1.2. Displaying the I/O Information
# gluster volume profile VOLNAME info
# gluster volume profile test-volume info
Brick: Test:/export/2
Cumulative Stats:
Block Size: 1b+ 32b+ 64b+
Read: 0 0 0
Write: 908 28 8

Block Size: 128b+ 256b+ 512b+
Read: 0 6 4
Write: 5 23 16

Block Size: 1024b+ 2048b+ 4096b+
Read: 0 52 17
Write: 15 120 846

Block Size: 8192b+ 16384b+ 32768b+
Read: 52 8 34
Write: 234 134 286

Block Size: 65536b+ 131072b+
Read: 118 622
Write: 1341 594
%-latency Avg-latency Min-Latency Max-Latency calls Fop
___________________________________________________________
4.82 1132.28 21.00 800970.00 4575 WRITE
5.70 156.47 9.00 665085.00 39163 READDIRP
11.35 315.02 9.00 1433947.00 38698 LOOKUP
11.88 1729.34 21.00 2569638.00 7382 FXATTROP
47.35 104235.02 2485.00 7789367.00 488 FSYNC
------------------
------------------
Duration : 335
BytesRead : 94505058
BytesWritten : 195571980
# gluster volume profile VOLNAME info nfs
# gluster volume profile test-volume info nfs
NFS Server : localhost
----------------------
Cumulative Stats:
Block Size: 32768b+ 65536b+
No. of Reads: 0 0
No. of Writes: 1000 1000
%-latency Avg-latency Min-Latency Max-Latency No. of calls Fop
--------- ----------- ----------- ----------- ------------ ----
0.01 410.33 us 194.00 us 641.00 us 3 STATFS
0.60 465.44 us 346.00 us 867.00 us 147 FSTAT
1.63 187.21 us 67.00 us 6081.00 us 1000 SETATTR
1.94 221.40 us 58.00 us 55399.00 us 1002 ACCESS
2.55 301.39 us 52.00 us 75922.00 us 968 STAT
2.85 326.18 us 88.00 us 66184.00 us 1000 TRUNCATE
4.47 511.89 us 60.00 us 101282.00 us 1000 FLUSH
5.02 3907.40 us 1723.00 us 19508.00 us 147 READDIRP
25.42 2876.37 us 101.00 us 843209.00 us 1012 LOOKUP
55.52 3179.16 us 124.00 us 121158.00 us 2000 WRITE
Duration: 7074 seconds
Data Read: 0 bytes
Data Written: 102400000 bytes
Interval 1 Stats:
Block Size: 32768b+ 65536b+
No. of Reads: 0 0
No. of Writes: 1000 1000
%-latency Avg-latency Min-Latency Max-Latency No. of calls Fop
--------- ----------- ----------- ----------- ------------ ----
0.01 410.33 us 194.00 us 641.00 us 3 STATFS
0.60 465.44 us 346.00 us 867.00 us 147 FSTAT
1.63 187.21 us 67.00 us 6081.00 us 1000 SETATTR
1.94 221.40 us 58.00 us 55399.00 us 1002 ACCESS
2.55 301.39 us 52.00 us 75922.00 us 968 STAT
2.85 326.18 us 88.00 us 66184.00 us 1000 TRUNCATE
4.47 511.89 us 60.00 us 101282.00 us 1000 FLUSH
5.02 3907.40 us 1723.00 us 19508.00 us 147 READDIRP
25.41 2878.07 us 101.00 us 843209.00 us 1011 LOOKUP
55.53 3179.16 us 124.00 us 121158.00 us 2000 WRITE
Duration: 330 seconds
Data Read: 0 bytes
Data Written: 102400000 bytes
14.1.3. Stop Profiling
# gluster volume profile VOLNAME stop
# gluster volume profile test-volume stop Profiling stopped on test-volume
14.2. Running the Volume Top Command
The volume top command allows you to view the glusterFS bricks' performance metrics, including read, write, file open calls, file read calls, file write calls, directory open calls, and directory read calls. The volume top command displays up to 100 results.
This section describes how to use the volume top command.
14.2.1. Viewing Open File Descriptor Count and Maximum File Descriptor Count
You can view the current open file descriptor count and the list of files that are currently open on a brick using the volume top command. The volume top command also displays the maximum open file descriptor count of files that are currently open, and the maximum number of files opened at any given point of time since the servers are up and running. If the brick name is not specified, then the open file descriptor metrics of all the bricks belonging to the volume are displayed.
# gluster volume top VOLNAME open [nfs | brick BRICK-NAME] [list-cnt cnt]
# gluster volume top test-volume open brick server:/export list-cnt 10
Brick: server:/export/dir1
Current open fd's: 34 Max open fd's: 209
==========Open file stats========
open file name
call count
2 /clients/client0/~dmtmp/PARADOX/
COURSES.DB
11 /clients/client0/~dmtmp/PARADOX/
ENROLL.DB
11 /clients/client0/~dmtmp/PARADOX/
STUDENTS.DB
10 /clients/client0/~dmtmp/PWRPNT/
TIPS.PPT
10 /clients/client0/~dmtmp/PWRPNT/
PCBENCHM.PPT
9 /clients/client7/~dmtmp/PARADOX/
STUDENTS.DB
9 /clients/client1/~dmtmp/PARADOX/
STUDENTS.DB
9 /clients/client2/~dmtmp/PARADOX/
STUDENTS.DB
9 /clients/client0/~dmtmp/PARADOX/
STUDENTS.DB
9 /clients/client8/~dmtmp/PARADOX/
STUDENTS.DB
14.2.2. Viewing Highest File Read Calls
You can view a list of files with the highest file read calls on each brick using the volume top command. If the brick name is not specified, a list of 100 files is displayed by default.
# gluster volume top VOLNAME read [nfs | brick BRICK-NAME] [list-cnt cnt]
# gluster volume top test-volume read brick server:/export list-cnt 10
Brick: server:/export/dir1
==========Read file stats========
read filename
call count
116 /clients/client0/~dmtmp/SEED/LARGE.FIL
64 /clients/client0/~dmtmp/SEED/MEDIUM.FIL
54 /clients/client2/~dmtmp/SEED/LARGE.FIL
54 /clients/client6/~dmtmp/SEED/LARGE.FIL
54 /clients/client5/~dmtmp/SEED/LARGE.FIL
54 /clients/client0/~dmtmp/SEED/LARGE.FIL
54 /clients/client3/~dmtmp/SEED/LARGE.FIL
54 /clients/client4/~dmtmp/SEED/LARGE.FIL
54 /clients/client9/~dmtmp/SEED/LARGE.FIL
54 /clients/client8/~dmtmp/SEED/LARGE.FIL
14.2.3. Viewing Highest File Write Calls
You can view a list of files with the highest file write calls on each brick using the volume top command. If the brick name is not specified, a list of 100 files is displayed by default.
# gluster volume top VOLNAME write [nfs | brick BRICK-NAME] [list-cnt cnt]
# gluster volume top test-volume write brick server:/export/ list-cnt 10
Brick: server:/export/dir1
==========Write file stats========
write call count filename
83 /clients/client0/~dmtmp/SEED/LARGE.FIL
59 /clients/client7/~dmtmp/SEED/LARGE.FIL
59 /clients/client1/~dmtmp/SEED/LARGE.FIL
59 /clients/client2/~dmtmp/SEED/LARGE.FIL
59 /clients/client0/~dmtmp/SEED/LARGE.FIL
59 /clients/client8/~dmtmp/SEED/LARGE.FIL
59 /clients/client5/~dmtmp/SEED/LARGE.FIL
59 /clients/client4/~dmtmp/SEED/LARGE.FIL
59 /clients/client6/~dmtmp/SEED/LARGE.FIL
59 /clients/client3/~dmtmp/SEED/LARGE.FIL
14.2.4. Viewing Highest Open Calls on a Directory
You can view a list of directories with the highest open calls on each brick using the volume top command. If the brick name is not specified, the metrics of all bricks belonging to that volume are displayed.
# gluster volume top VOLNAME opendir [brick BRICK-NAME] [list-cnt cnt]
# gluster volume top test-volume opendir brick server:/export/ list-cnt 10
Brick: server:/export/dir1
==========Directory open stats========
Opendir count directory name
1001 /clients/client0/~dmtmp
454 /clients/client8/~dmtmp
454 /clients/client2/~dmtmp
454 /clients/client6/~dmtmp
454 /clients/client5/~dmtmp
454 /clients/client9/~dmtmp
443 /clients/client0/~dmtmp/PARADOX
408 /clients/client1/~dmtmp
408 /clients/client7/~dmtmp
402 /clients/client4/~dmtmp
14.2.5. Viewing Highest Read Calls on a Directory
You can view a list of directories with the highest directory read calls on each brick using the volume top command. If the brick name is not specified, the metrics of all bricks belonging to that volume are displayed.
# gluster volume top VOLNAME readdir [nfs | brick BRICK-NAME] [list-cnt cnt]
# gluster volume top test-volume readdir brick server:/export/ list-cnt 10 Brick: server:/export/dir1 ==========Directory readdirp stats======== readdirp count directory name 1996 /clients/client0/~dmtmp 1083 /clients/client0/~dmtmp/PARADOX 904 /clients/client8/~dmtmp 904 /clients/client2/~dmtmp 904 /clients/client6/~dmtmp 904 /clients/client5/~dmtmp 904 /clients/client9/~dmtmp 812 /clients/client1/~dmtmp 812 /clients/client7/~dmtmp 800 /clients/client4/~dmtmp
14.2.6. Viewing Read Performance
You can view the read throughput of files on each brick using the volume top command. If the brick name is not specified, the metrics of all the bricks belonging to that volume are displayed. The output is the read throughput.
# gluster volume top VOLNAME read-perf [bs blk-size count count] [nfs | brick BRICK-NAME] [list-cnt cnt]
For example, to view the read performance of files on brick server:/export/ of test-volume, specifying a 256 byte block size, and list the top 10 results:
# gluster volume top test-volume read-perf bs 256 count 1 brick server:/export/ list-cnt 10
Brick: server:/export/dir1 256 bytes (256 B) copied, Throughput: 4.1 MB/s
==========Read throughput file stats========
read throughput(MBps) filename Time
2912.00 /clients/client0/~dmtmp/PWRPNT/ -2012-05-09
TRIDOTS.POT 15:38:36.896486
2570.00 /clients/client0/~dmtmp/PWRPNT/ -2012-05-09
PCBENCHM.PPT 15:38:39.815310
2383.00 /clients/client2/~dmtmp/SEED/ -2012-05-09
MEDIUM.FIL 15:52:53.631499
2340.00 /clients/client0/~dmtmp/SEED/ -2012-05-09
MEDIUM.FIL 15:38:36.926198
2299.00 /clients/client0/~dmtmp/SEED/ -2012-05-09
LARGE.FIL 15:38:36.930445
2259.00 /clients/client0/~dmtmp/PARADOX/ -2012-05-09
COURSES.X04 15:38:40.549919
2221.00 /clients/client9/~dmtmp/PARADOX/ -2012-05-09
STUDENTS.VAL 15:52:53.298766
2221.00 /clients/client8/~dmtmp/PARADOX/ -2012-05-09
COURSES.DB 15:39:11.776780
2184.00 /clients/client3/~dmtmp/SEED/ -2012-05-09
MEDIUM.FIL 15:39:10.251764
2184.00 /clients/client5/~dmtmp/WORD/ -2012-05-09
BASEMACH.DOC 15:39:09.336572
14.2.7. Viewing Write Performance
You can view the write throughput of files on each brick using the volume top command. If the brick name is not specified, the metrics of all the bricks belonging to that volume will be displayed. The output will be the write throughput.
# gluster volume top VOLNAME write-perf [bs blk-size count count] [nfs | brick BRICK-NAME] [list-cnt cnt]
For example, to view the write performance of files on brick server:/export/ of test-volume, specifying a 256 byte block size, and list the top 10 results:
# gluster volume top test-volume write-perf bs 256 count 1 brick server:/export/ list-cnt 10
Brick: server:/export/dir1 256 bytes (256 B) copied, Throughput: 2.8 MB/s
==========Write throughput file stats========
write throughput(MBps) filename Time
1170.00 /clients/client0/~dmtmp/SEED/ -2012-05-09
SMALL.FIL 15:39:09.171494
1008.00 /clients/client6/~dmtmp/SEED/ -2012-05-09
LARGE.FIL 15:39:09.73189
949.00 /clients/client0/~dmtmp/SEED/ -2012-05-09
MEDIUM.FIL 15:38:36.927426
936.00 /clients/client0/~dmtmp/SEED/ -2012-05-09
LARGE.FIL 15:38:36.933177
897.00 /clients/client5/~dmtmp/SEED/ -2012-05-09
MEDIUM.FIL 15:39:09.33628
897.00 /clients/client6/~dmtmp/SEED/ -2012-05-09
MEDIUM.FIL 15:39:09.27713
885.00 /clients/client0/~dmtmp/SEED/ -2012-05-09
SMALL.FIL 15:38:36.924271
528.00 /clients/client5/~dmtmp/SEED/ -2012-05-09
LARGE.FIL 15:39:09.81893
516.00 /clients/client6/~dmtmp/ACCESS/ -2012-05-09
FASTENER.MDB 15:39:01.797317
14.3. Listing Volumes
# gluster volume list
# gluster volume list test-volume volume1 volume2 volume3
14.4. Displaying Volume Information
# gluster volume info VOLNAME
# gluster volume info test-volume Volume Name: test-volume Type: Distribute Status: Created Number of Bricks: 4 Bricks: Brick1: server1:/exp1 Brick2: server2:/exp2 Brick3: server3:/exp3 Brick4: server4:/exp4
14.5. Performing Statedump on a Volume
- mem - Dumps the memory usage and memory pool details of the bricks.
- iobuf - Dumps iobuf details of the bricks.
- priv - Dumps private information of loaded translators.
- callpool - Dumps the pending calls of the volume.
- fd - Dumps the open file descriptor tables of the volume.
- inode - Dumps the inode tables of the volume.
- history - Dumps the event history of the volume
# gluster volume statedump VOLNAME [nfs] [all|mem|iobuf|callpool|priv|fd|inode|history]
# gluster volume statedump test-volume Volume statedump successful
The statedump files are created in the /var/run/gluster/ directory or in the directory set using the server.statedump-path volume option. The naming convention of the dump file is brick-path.brick-pid.dump.
# gluster volume set VOLNAME server.statedump-path path
# gluster volume set test-volume server.statedump-path /usr/local/var/log/glusterfs/dumps/ Set volume successful
# gluster volume info VOLNAME
kill -USR1 process_ID
kill -USR1 4120
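If you need the process ID of a brick process first, it can be found with standard tools; the pattern below is illustrative (glusterfsd is the brick server process), and the PID shown can then be used with kill -USR1 as in the example above:
# pgrep -lf glusterfsd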
14.6. Displaying Volume Status
- detail - Displays additional information about the bricks.
- clients - Displays the list of clients connected to the volume.
- mem - Displays the memory usage and memory pool details of the bricks.
- inode - Displays the inode tables of the volume.
- fd - Displays the open file descriptor tables of the volume.
- callpool - Displays the pending calls of the volume.
# gluster volume status [all|VOLNAME [nfs | shd | BRICKNAME]] [detail |clients | mem | inode | fd |callpool]
# gluster volume status test-volume Status of volume: test-volume Gluster process Port Online Pid ------------------------------------------------------------ Brick arch:/export/rep1 24010 Y 18474 Brick arch:/export/rep2 24011 Y 18479 NFS Server on localhost 38467 Y 18486 Self-heal Daemon on localhost N/A Y 18491
# gluster volume status all
# gluster volume status all Status of volume: test Gluster process Port Online Pid ----------------------------------------------------------- Brick 192.168.56.1:/export/test 24009 Y 29197 NFS Server on localhost 38467 Y 18486 Status of volume: test-volume Gluster process Port Online Pid ------------------------------------------------------------ Brick arch:/export/rep1 24010 Y 18474 Brick arch:/export/rep2 24011 Y 18479 NFS Server on localhost 38467 Y 18486 Self-heal Daemon on localhost N/A Y 18491
# gluster volume status VOLNAME detail
# gluster volume status test-volume detail Status of volume: test-vol ------------------------------------------------------------------------------ Brick : Brick arch:/exp Port : 24012 Online : Y Pid : 18649 File System : ext4 Device : /dev/sda1 Mount Options : rw,relatime,user_xattr,acl,commit=600,barrier=1,data=ordered Inode Size : 256 Disk Space Free : 22.1GB Total Disk Space : 46.5GB Inode Count : 3055616 Free Inodes : 2577164
# gluster volume status VOLNAME clients
# gluster volume status test-volume clients Brick : arch:/export/1 Clients connected : 2 Hostname Bytes Read BytesWritten -------- --------- ------------ 127.0.0.1:1013 776 676 127.0.0.1:1012 50440 51200
# gluster volume status VOLNAME mem
# gluster volume status test-volume mem Memory status for volume : test-volume ---------------------------------------------- Brick : arch:/export/1 Mallinfo -------- Arena : 434176 Ordblks : 2 Smblks : 0 Hblks : 12 Hblkhd : 40861696 Usmblks : 0 Fsmblks : 0 Uordblks : 332416 Fordblks : 101760 Keepcost : 100400 Mempool Stats ------------- Name HotCount ColdCount PaddedSizeof AllocCount MaxAlloc ---- -------- --------- ------------ ---------- -------- test-volume-server:fd_t 0 16384 92 57 5 test-volume-server:dentry_t 59 965 84 59 59 test-volume-server:inode_t 60 964 148 60 60 test-volume-server:rpcsvc_request_t 0 525 6372 351 2 glusterfs:struct saved_frame 0 4096 124 2 2 glusterfs:struct rpc_req 0 4096 2236 2 2 glusterfs:rpcsvc_request_t 1 524 6372 2 1 glusterfs:call_stub_t 0 1024 1220 288 1 glusterfs:call_stack_t 0 8192 2084 290 2 glusterfs:call_frame_t 0 16384 172 1728 6
# gluster volume status VOLNAME inode
# gluster volume status test-volume inode inode tables for volume test-volume ---------------------------------------------- Brick : arch:/export/1 Active inodes: GFID Lookups Ref IA type ---- ------- --- ------- 6f3fe173-e07a-4209-abb6-484091d75499 1 9 2 370d35d7-657e-44dc-bac4-d6dd800ec3d3 1 1 2 LRU inodes: GFID Lookups Ref IA type ---- ------- --- ------- 80f98abe-cdcf-4c1d-b917-ae564cf55763 1 0 1 3a58973d-d549-4ea6-9977-9aa218f233de 1 0 1 2ce0197d-87a9-451b-9094-9baa38121155 1 0 2
# gluster volume status VOLNAME fd
# gluster volume status test-volume fd FD tables for volume test-volume ---------------------------------------------- Brick : arch:/export/1 Connection 1: RefCount = 0 MaxFDs = 128 FirstFree = 4 FD Entry PID RefCount Flags -------- --- -------- ----- 0 26311 1 2 1 26310 3 2 2 26310 1 2 3 26311 3 2 Connection 2: RefCount = 0 MaxFDs = 128 FirstFree = 0 No open fds Connection 3: RefCount = 0 MaxFDs = 128 FirstFree = 0 No open fds
# gluster volume status VOLNAME callpool
# gluster volume status test-volume callpool Pending calls for volume test-volume ---------------------------------------------- Brick : arch:/export/1 Pending calls: 2 Call Stack1 UID : 0 GID : 0 PID : 26338 Unique : 192138 Frames : 7 Frame 1 Ref Count = 1 Translator = test-volume-server Completed = No Frame 2 Ref Count = 0 Translator = test-volume-posix Completed = No Parent = test-volume-access-control Wind From = default_fsync Wind To = FIRST_CHILD(this)->fops->fsync Frame 3 Ref Count = 1 Translator = test-volume-access-control Completed = No Parent = repl-locks Wind From = default_fsync Wind To = FIRST_CHILD(this)->fops->fsync Frame 4 Ref Count = 1 Translator = test-volume-locks Completed = No Parent = test-volume-io-threads Wind From = iot_fsync_wrapper Wind To = FIRST_CHILD (this)->fops->fsync Frame 5 Ref Count = 1 Translator = test-volume-io-threads Completed = No Parent = test-volume-marker Wind From = default_fsync Wind To = FIRST_CHILD(this)->fops->fsync Frame 6 Ref Count = 1 Translator = test-volume-marker Completed = No Parent = /export/1 Wind From = io_stats_fsync Wind To = FIRST_CHILD(this)->fops->fsync Frame 7 Ref Count = 1 Translator = /export/1 Completed = No Parent = test-volume-server Wind From = server_fsync_resume Wind To = bound_xl->fops->fsync
Chapter 15. Managing Red Hat Storage Volume Life-Cycle Extensions
- Creating a volume
- Starting a volume
- Adding a brick
- Removing a brick
- Tuning volume options
- Stopping a volume
- Deleting a volume
Note
15.1. Location of Scripts
- /var/lib/glusterd/hooks/1/create/
- /var/lib/glusterd/hooks/1/delete/
- /var/lib/glusterd/hooks/1/start/
- /var/lib/glusterd/hooks/1/stop/
- /var/lib/glusterd/hooks/1/set/
- /var/lib/glusterd/hooks/1/add-brick/
- /var/lib/glusterd/hooks/1/remove-brick/
The scripts are run with --volname=VOLNAME to specify the volume. Command-specific additional arguments are provided for the following volume operations:
- Start volume
--first=yes, if the volume is the first to be started
--first=no, otherwise
- Stop volume
--last=yes, if the volume is to be stopped last
--last=no, otherwise
- Set volume
-o key=value, for every key and value specified in the volume set command. A minimal example hook script is shown after this list.
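The following is a minimal, illustrative post-start hook that only logs the arguments described above; the file name S99log-start.sh is hypothetical, and the script would be placed, marked executable, under /var/lib/glusterd/hooks/1/start/post/:
#!/bin/bash
# Sketch only: log the volume name and the --first flag passed by glusterd.
VOLNAME=""
FIRST=""
for arg in "$@"; do
    case "$arg" in
        --volname=*) VOLNAME="${arg#--volname=}" ;;
        --first=*)   FIRST="${arg#--first=}" ;;
    esac
done
logger -t glusterd-hook "volume ${VOLNAME} started (first=${FIRST:-unknown})"
exit 0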
15.2. Prepackaged Scripts
Prepackaged Samba integration scripts are provided in /var/lib/glusterd/hooks/1/start/post and /var/lib/glusterd/hooks/1/stop/pre. By default, the scripts are enabled.
# gluster volume start VOLNAME
The S30samba-start.sh script performs the following:
- Adds Samba share configuration details of the volume to the
smb.conf file
- Restarts Samba to run with the updated configuration
# gluster volume stop VOLNAME
The S30samba-stop.sh script performs the following:
- Removes the Samba share details of the volume from the
smb.conf file
- Restarts Samba to run with the updated configuration
Part III. Red Hat Storage Administration on Public Cloud
Table of Contents
Chapter 16. Launching Red Hat Storage Server for Public Cloud
Note
16.1. Launching Red Hat Storage Instances
- Navigate to the Amazon Web Services home page at http://aws.amazon.com. The Amazon Web Services home page appears.
- Login to Amazon Web Services. The Amazon Web Services main screen is displayed.
- Click Launch Instance.The Choose an AMI screen is displayed.
- Choose an Amazon Machine Image (AMI) and click .
- In the tab, choose an instance type in the corresponding screen and click
- Configure the instance details based on your requirements and click .
- In the Add Storage screen, specify your storage requirements and click .
- In the Tag Instance screen, create a key-value pair and click Next: Configure Security Group.

- In the Configure Security Group screen, add rules to allow specific traffic to reach your instance and click . You must ensure that the following TCP port numbers are open in the selected security group:
- 22
- 6000, 6001, 6002, 443, and 8080 ports if Red Hat Storage for OpenStack Swift is enabled
- In the Review Instance Launch screen, review your instance launch details. You can edit changes for each section. Click to complete the launch process.
- In the Select an existing key pair or create a new key pair screen, select the key pair, check the acknowledgement check-box and click .
- The Launch Status screen is displayed. To monitor your instance's status, click .
16.2. Verifying that Red Hat Storage Instance is Running
- In the Amazon EC2 Console Dashboard, click on to check for the instances that are running.
- Select an instance from the list and click . Check the Status column and verify that the instance is running. A yellow circle indicates a status of pending while a green circle indicates that the instance is running. Note the domain name in the Public DNS field. You can use this domain to perform a remote login to the instance.
- The Connect To Your Instance screen is displayed. Click Close.
- Using SSH and the domain from the previous step, log in to the Red Hat Amazon Machine Image instance. You must use the key pair that was selected or created when launching the instance.
Example: Enter the following at the command line:
# ssh -i rhs-aws.pem root@ec2-23-20-52-123.compute-1.amazonaws.com
- At the command line, enter the following command:
# service glusterd status
Verify that the command output indicates that the glusterd daemon is running on the instance.
Chapter 17. Provisioning Storage
Important
Procedure 17.1. To Add Amazon Elastic Block Storage Volumes
- Login to Amazon Web Services at http://aws.amazon.com and select the tab.
- Select the option to add the Amazon Elastic Block Storage Volumes.
- In order to support configuration as a brick, assemble the eight Amazon EBS volumes into a RAID 0 (stripe) array using the following command:
# mdadm --create ARRAYNAME --level=0 --raid-devices=8 list of all devices
For example, to create a software RAID 0 array of eight volumes:
# mdadm --create /dev/md0 --level=0 --raid-devices=8 /dev/xvdf1 /dev/xvdf2 /dev/xvdf3 /dev/xvdf4 /dev/xvdf5 /dev/xvdf6 /dev/xvdf7 /dev/xvdf8
# mdadm --examine --scan > /etc/mdadm.conf
- Create a Logical Volume (LV) using the following commands:
# pvcreate /dev/md0
# vgcreate glustervg /dev/md0
# vgchange -a y glustervg
# lvcreate -a y -l 100%VG -n glusterlv glustervg
In these commands, glustervg is the name of the volume group and glusterlv is the name of the logical volume. Red Hat Storage uses the logical volume created over the EBS RAID as a brick. For more information about logical volumes, see the Red Hat Enterprise Linux Logical Volume Manager Administration Guide.
- Format the logical volume using the following command:
# mkfs.xfs -i size=512 DEVICE
For example, to format /dev/glustervg/glusterlv:
# mkfs.xfs -i size=512 /dev/glustervg/glusterlv
- Mount the device using the following commands:
# mkdir -p /export/glusterlv
# mount /dev/glustervg/glusterlv /export/glusterlv
- Using the following command, add the device to /etc/fstab so that it mounts automatically when the system reboots:
# echo "/dev/glustervg/glusterlv /export/glusterlv xfs defaults 0 2" >> /etc/fstab
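As an optional sanity check using standard commands, verify the mount and the fstab entry before using the brick:
# df -hT /export/glusterlv
# grep glusterlv /etc/fstab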
Chapter 18. Stopping and Restarting Red Hat Storage Instance
Part IV. Data Access with Other Interfaces
Table of Contents
- 19. Managing Object Store
- 19.1. Architecture Overview
- 19.2. Components of Object Storage
- 19.3. Advantages of using Object Store
- 19.4. Limitations
- 19.5. Prerequisites
- 19.6. Configuring the Object Store
- 19.6.1. Configuring a Proxy Server
- 19.6.2. Configuring the Authentication Service
- 19.6.3. Configuring an Object Server
- 19.6.4. Configuring a Container Server
- 19.6.5. Configuring an Account Server
- 19.6.6. Configuring Swift Object and Container Constrains
- 19.6.7. Exporting the Red Hat Storage Volumes
- 19.6.8. Starting and Stopping Server
- 19.7. Starting the Services Automatically
- 19.8. Working with the Object Store
Chapter 19. Managing Object Store
- 19.1. Architecture Overview
- 19.2. Components of Object Storage
- 19.3. Advantages of using Object Store
- 19.4. Limitations
- 19.5. Prerequisites
- 19.6. Configuring the Object Store
- 19.6.1. Configuring a Proxy Server
- 19.6.2. Configuring the Authentication Service
- 19.6.3. Configuring an Object Server
- 19.6.4. Configuring a Container Server
- 19.6.5. Configuring an Account Server
- 19.6.6. Configuring Swift Object and Container Constrains
- 19.6.7. Exporting the Red Hat Storage Volumes
- 19.6.8. Starting and Stopping Server
- 19.7. Starting the Services Automatically
- 19.8. Working with the Object Store
19.1. Architecture Overview
- OpenStack Object Storage environment. For detailed information on Object Storage, see OpenStack Object Storage Administration Guide available at: http://docs.openstack.org/admin-guide-cloud/content/ch_admin-openstack-object-storage.html.
- Red Hat Storage environment. The Red Hat Storage environment consists of bricks that are used to build volumes. For more information on bricks and volumes, see Section 8.1, “Formatting and Mounting Bricks”.
19.2. Components of Object Storage
- Authenticate Object Store against an external OpenStack Keystone server. Each Red Hat Storage volume is mapped to a single account. Each account can have multiple users with different privileges based on the group and role they are assigned to. After authenticating using accountname:username and password, the user is issued a token which is used for all subsequent REST requests.
Integration with Keystone
When you integrate Red Hat Storage Object Store with Keystone authentication, you must ensure that the Swift account name and the Red Hat Storage volume name are the same. It is common that Red Hat Storage volumes are created before exposing them through the Red Hat Storage Object Store.
When working with Keystone, account names are defined by Keystone as the tenant id. You must create the Red Hat Storage volume using the Keystone tenant id as the name of the volume. This means you must create the Keystone tenant before creating a Red Hat Storage Volume.
Important
Red Hat Storage does not contain any Keystone server components. It only acts as a Keystone client. After you create a volume for Keystone, ensure to export this volume for accessing it using the object storage interface. For more information on exporting volumes, see Section 19.6.7, “Exporting the Red Hat Storage Volumes”.
Integration with GSwauth
GSwauth is a Web Server Gateway Interface (WSGI) middleware that uses a Red Hat Storage Volume itself as its backing store to maintain its metadata. The benefit of this authentication service is to have the metadata available to all proxy servers while saving the data to a Red Hat Storage volume.
To protect the metadata, the Red Hat Storage volume should only be able to be mounted by the systems running the proxy servers. For more information on mounting volumes, see Section 9.2.3, “Mounting Red Hat Storage Volumes”.
Integration with TempAuth
You can also use the TempAuth authentication service to test Red Hat Storage Object Store in the data center.
19.3. Advantages of using Object Store
- Default object size limit of 1 TiB
- Unified view of data across NAS and Object Storage technologies
- High availability
- Scalability
- Replication
- Elastic Volume Management
19.4. Limitations
- Object NameObject Store imposes the following constraints on the object name to maintain the compatibility with network file access:
- Object names must not be prefixed or suffixed by a '/' character. For example,
a/b/
- Object names must not have contiguous multiple '/' characters. For example,
a//b
- Account Management
- Object Store does not allow account management even though OpenStack Swift allows the management of accounts. This limitation is because Object Store treats
accounts as equivalent to Red Hat Storage volumes.
- Object Store does not support account names (that is, Red Hat Storage volume names) containing an underscore.
- In Object Store, every account must map to a Red Hat Storage volume.
- Support for
X-Delete-After or X-Delete-At API
Object Store does not support the X-Delete-After or X-Delete-At header APIs listed at: http://docs.openstack.org/api/openstack-object-storage/1.0/content/Expiring_Objects-e1e3228.html.
- Subdirectory Listing
The headers X-Content-Type: application/directory and X-Content-Length: 0 can be used to create subdirectory objects under a container, but a GET request on a subdirectory does not list all the objects under it.
19.5. Prerequisites
Start the memcached service using the following command:
# service memcached start
Ensure that the following ports are open:
- 6010 - Object Server
- 6011 - Container Server
- 6012 - Account Server
- Proxy server
- 443 - for HTTPS request
- 8080 - for HTTP request
- You must create and mount a Red Hat Storage volume to use it as a Swift Account. For information on creating Red Hat Storage volumes, see Chapter 8, Red Hat Storage Volumes . For information on mounting Red Hat Storage volumes, see Section 9.2.3, “Mounting Red Hat Storage Volumes” .
19.6. Configuring the Object Store
19.6.1. Configuring a Proxy Server
Create a new configuration file /etc/swift/proxy-server.conf by referencing the template file available at /etc/swift/proxy-server.conf-gluster.
19.6.1.1. Configuring a Proxy Server for HTTPS
- Create self-signed cert for SSL using the following commands:
# cd /etc/swift # openssl req -new -x509 -nodes -out cert.crt -keyout cert.key
- Add the following lines to
/etc/swift/proxy-server.conf under [DEFAULT] (a verification sketch follows this procedure):
bind_port = 443
cert_file = /etc/swift/cert.crt
key_file = /etc/swift/cert.key
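If the healthcheck filter remains in the proxy-server pipeline, as in the examples in this chapter, the HTTPS endpoint can be verified with curl; the host name is illustrative, and -k is needed because the certificate above is self-signed:
# curl -k https://localhost:443/healthcheck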
Enabling Distributed Caching with Memcached
To enable distributed caching, add the memcache_servers configuration option in proxy-server.conf and list all the memcached servers.
The following is an example of the [filter:cache] section in the proxy-server.conf file.
[filter:cache] use = egg:swift#memcache memcache_servers = 192.168.1.20:11211,192.168.1.21:11211,192.168.1.22:11211
19.6.2. Configuring the Authentication Service
The Object Store supports the Keystone, GSwauth, and TempAuth authentication services.
19.6.2.1. Integrating with the Keystone Authentication Service
- To configure Keystone, add
authtoken and keystoneauth to the /etc/swift/proxy-server.conf pipeline as shown below:
[pipeline:main]
pipeline = catch_errors healthcheck proxy-logging cache authtoken keystoneauth proxy-logging proxy-server
- Add the following sections to
the /etc/swift/proxy-server.conf file by referencing the example below as a guideline. You must substitute the values according to your setup:
[filter:authtoken]
paste.filter_factory = keystoneclient.middleware.auth_token:filter_factory
signing_dir = /etc/swift
auth_host = keystone.server.com
auth_port = 35357
auth_protocol = http
auth_uri = http://keystone.server.com:5000 # if its defined
admin_tenant_name = services
admin_user = swift
admin_password = adminpassword
delay_auth_decision = 1

[filter:keystoneauth]
use = egg:swift#keystoneauth
operator_roles = admin, SwiftOperator
is_admin = true
cache = swift.cache
$ swift -V 2 -A http://keystone.server.com:5000/v2.0 -U tenant_name:user -K password stat
19.6.2.2. Integrating with the GSwauth Authentication Service
- Create and start a Red Hat Storage volume to store metadata.
# gluster volume create NEW-VOLNAME NEW-BRICK
# gluster volume start NEW-VOLNAME
For example:
# gluster volume create gsmetadata server1:/exp1
# gluster volume start gsmetadata
- Run
the gluster-swift-gen-builders tool with all the volumes to be accessed using the Swift client, including the gsmetadata volume:
# gluster-swift-gen-builders gsmetadata other volumes
- Edit the
/etc/swift/proxy-server.conf pipeline as shown below:
[pipeline:main]
pipeline = catch_errors cache gswauth proxy-server
- Add the following section to
the /etc/swift/proxy-server.conf file by referencing the example below as a guideline. You must substitute the values according to your setup.
[filter:gswauth]
use = egg:gluster_swift#gswauth
set log_name = gswauth
super_admin_key = gswauthkey
metadata_volume = gsmetadata
auth_type = sha1
auth_type_salt = swauthsalt
Important
You must ensure to secure the proxy-server.conf file and the super_admin_key option to prevent unprivileged access.
- Restart the proxy server by running the following command:
# swift-init proxy restart
- default-swift-cluster: The default storage-URL for the newly created accounts. When you attempt to authenticate for the first time, the access token and the storage-URL where data for the given account is stored will be returned.
- token_life: The set default token life. The default value is 86400 (24 hours).
- max_token_life: The maximum token life. You can set a token lifetime when requesting a new token with header
x-auth-token-lifetime. If the passed-in value is greater than max_token_life, then the max_token_life value will be used.
- -A, --admin-url: The URL to the auth. The default URL is
http://127.0.0.1:8080/auth/.
- -U, --admin-user: The user with administrator rights to perform the action. The default user role is .super_admin.
- -K, --admin-key: The key for the user with administrator rights to perform the action. There is no default value.
Prepare the Red Hat Storage volume for gswauth to save its metadata by running the following command:
# gswauth-prep [option]
# gswauth-prep -A http://10.20.30.40:8080/auth/ -K gswauthkey
19.6.2.2.1. Managing Account Services in GSwauth
# gswauth-add-account [option] <account_name>
# gswauth-add-account -K gswauthkey <account_name>
# gswauth-delete-account [option] <account_name>
# gswauth-delete-account -K gswauthkey test
Only the reseller admin role can set the service URL. This command can be used to change the default storage URL for a given account. All accounts will have the same storage URL as the default value, which is set using the default-swift-cluster option.
# gswauth-set-account-service [options] <account> <service> <name> <value>
# gswauth-set-account-service -K gswauthkey test storage local http://newhost:8080/v1/AUTH_test
19.6.2.2.2. Managing User Services in GSwauth
- A regular user has no rights. Users must be given both read and write privileges using Swift ACLs.
- The
admin user is a super-user at the account level. This user can create and delete users for that account. These members will have both write and read privileges to all stored objects in that account.
- The reseller admin user is a super-user at the cluster level. This user can create and delete accounts and users, and has read and write privileges to all accounts under that cluster.
- GSwauth maintains its own Swift account to store all of its metadata on accounts and users. The .super_admin role provides access to GSwauth's own Swift account and has all privileges to act on any other account or user.
Table 19.1. User Access Matrix
| Role/Group | Get list of accounts | Get Account Details | Create Account | Delete Account | Get User Details | Create admin user | Create reseller_admin user | Create regular user | Delete admin user |
|---|---|---|---|---|---|---|---|---|---|
| .super_admin (username) | X | X | X | X | X | X | X | X | X |
| .reseller_admin (group) | X | X | X | X | X | X | X | X | |
| .admin (group) | X | X | X | X | X | ||||
| regular user (type) |
Use the -r flag to create a reseller admin user and the -a flag to create an admin user. To change the password or role of a user, you can run the same command with the new option.
# gswauth-add-user [option] <account_name> <user> <password>
# gswauth-add-user -K gswauthkey -a test ana anapwd
# gswauth-delete-user [option] <account_name> <user>
# gswauth-delete-user -K gswauthkey test ana
$ swift -A http://127.0.0.1:8080/auth/v1.0 -U test:ana -K anapwd upload container1 README.md
curl -v -H 'X-Storage-User: test:ana' -H 'X-Storage-Pass: anapwd' -k http://localhost:8080/auth/v1.0 ... < X-Auth-Token: AUTH_tk7e68ef4698f14c7f95af07ab7b298610 < X-Storage-Url: http://127.0.0.1:8080/v1/AUTH_test ...
$ swift --os-auth-token=AUTH_tk7e68ef4698f14c7f95af07ab7b298610 --os-storage-url=http://127.0.0.1:8080/v1/AUTH_test upload container1 README.md README.md
$ swift --os-auth-token=AUTH_tk7e68ef4698f14c7f95af07ab7b298610 --os-storage-url=http://127.0.0.1:8080/v1/AUTH_test list container1 README.md
Important
Reseller admins must always use the second method to acquire a token when accessing accounts other than their own. The first method, using the username and password, gives them access only to their own account.
19.6.2.2.3. Managing Accounts and Users Information
# gswauth-list [options] [account] [user]
# gswauth-list -K gswauthkey test ana +----------+ | Groups | +----------+ | test:ana | | test | | .admin | +----------+
- If [account] and [user] are omitted, all the accounts will be listed.
- If [account] is included but not [user], a list of users within that account will be listed.
- If [account] and [user] are included, a list of groups that the user belongs to will be listed.
- If the [user] is .groups, the active groups for that account will be listed.
The -p option provides the output in plain text format, and the -j option provides the output in JSON format.
- Change the password of a regular user by running the following command:
# gswauth-add-user -U account1:user1 -K old_passwd account1 user1 new_passwd
- Change the password of an
account administrator by running the following command:
# gswauth-add-user -U account1:admin -K old_passwd -a account1 admin new_passwd
- Change the password of the
reseller_admin by running the following command:
# gswauth-add-user -U account1:radmin -K old_passwd -r account1 radmin new_passwd
Only the .super_admin role can delete expired tokens.
# gswauth-cleanup-tokens [options]
# gswauth-cleanup-tokens -K gswauthkey --purge test
- -t, --token-life: The expected life of tokens. Token objects modified before the given number of seconds will be checked for expiration (default: 86400).
- --purge: Purges all the tokens for a given account whether the tokens have expired or not.
- --purge-all: Purges all the tokens for all the accounts and users whether the tokens have expired or not.
19.6.2.3. Integrating with the TempAuth Authentication Service
Warning
TempAuth stores user names and passwords in cleartext in a single proxy-server.conf file. In your /etc/swift/proxy-server.conf file, enable TempAuth in the pipeline and add user information in the TempAuth section by referencing the example below.
[pipeline:main]
pipeline = catch_errors healthcheck proxy-logging cache tempauth proxy-logging proxy-server

[filter:tempauth]
use = egg:swift#tempauth
user_admin_admin = admin .admin .reseller_admin
user_test_tester = testing .admin
user_test_tester2 = testing2
user_accountname_username = password [.admin]
accountname is the Red Hat Storage volume used to store objects.
19.6.3. Configuring an Object Server
Create a new configuration file /etc/swift/object-server.conf by referencing the template file available at /etc/swift/object-server.conf-gluster.
19.6.4. Configuring a Container Server
Create a new configuration file /etc/swift/container-server.conf by referencing the template file available at /etc/swift/container-server.conf-gluster.
19.6.5. Configuring an Account Server
Create a new configuration file /etc/swift/account-server.conf by referencing the template file available at /etc/swift/account-server.conf-gluster.
19.6.6. Configuring Swift Object and Container Constrains
Create the /etc/swift/swift.conf file by referencing the template file available at /etc/swift/swift.conf-gluster.
19.6.7. Exporting the Red Hat Storage Volumes
Red Hat Storage volumes are exported for object access using the Gluster for Swift component.
# cd /etc/swift
# gluster-swift-gen-builders VOLUME [VOLUME...]
# cd /etc/swift
# gluster-swift-gen-builders testvol1 testvol2 testvol3
/mnt/gluster-object).
Specify every volume that must be accessible over the Swift interface each time you run the gluster-swift-gen-builders tool, even if a volume was previously added. The gluster-swift-gen-builders tool creates new ring files every time it runs successfully.
Important
Use gluster-swift-gen-builders only with the volumes that are required to be accessed using the Swift interface.
For example, to remove the testvol2 volume, run the following command:
# gluster-swift-gen-builders testvol1 testvol3
19.6.8. Starting and Stopping Server
- To start the server, enter the following command:
# swift-init main start
- To stop the server, enter the following command:
# swift-init main stop
19.7. Starting the Services Automatically
# chkconfig memcached on
# chkconfig openstack-swift-proxy on
# chkconfig openstack-swift-account on
# chkconfig openstack-swift-container on
# chkconfig openstack-swift-object on
Important
19.8. Working with the Object Store
19.8.1. Creating Containers and Objects
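As a hedged illustration, containers and objects can be created with the swift client, reusing the account, user, and credentials from the authentication examples earlier in this chapter (the file name is illustrative):
$ swift -A http://127.0.0.1:8080/auth/v1.0 -U test:ana -K anapwd post container1
$ swift -A http://127.0.0.1:8080/auth/v1.0 -U test:ana -K anapwd upload container1 README.md
The post command creates the container if it does not already exist, and upload stores the object inside it.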
19.8.2. Creating Subdirectory under Containers
You can create a subdirectory object under a container by setting the headers Content-Type: application/directory and Content-Length: 0. However, the current behavior of Object Store returns 200 OK on a GET request on the subdirectory, but it does not list all the objects under that subdirectory. A hedged example is shown below.
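The following is a minimal sketch using curl with the token and storage URL returned in the GSwauth authentication example above; container1 and subdir1 are illustrative names:
# curl -X PUT -H 'X-Auth-Token: AUTH_tk7e68ef4698f14c7f95af07ab7b298610' -H 'Content-Type: application/directory' -H 'Content-Length: 0' http://127.0.0.1:8080/v1/AUTH_test/container1/subdir1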
19.8.3. Working with Swift ACLs
Part V. Appendices
Table of Contents
Chapter 20. Troubleshooting
20.1. Managing Red Hat Storage Logs
- Rotating Logs
20.1.1. Rotating Logs
- Rotate the log file using the following command:
# gluster volume log rotate VOLNAME
For example, to rotate the log file on test-volume:
# gluster volume log rotate test-volume log rotate successful
Note
When a log file is rotated, the contents of the current log file are moved to log-file-name.epoch-time-stamp.
20.2. Troubleshooting File Locks
You can use the statedump command to list the locks held on files. The statedump output also provides information on each lock with its range, basename, and PID of the application holding the lock, and so on. You can analyze the output to find the locks whose owner/application is no longer running or interested in that lock. After ensuring that no application is using the file, you can clear the lock using the following clear-locks command:
# gluster volume clear-locks VOLNAME path kind {blocked | granted | all}{inode range | entry basename | posix range}
For more information on statedump, see Section 14.5, “Performing Statedump on a Volume”.
- Perform
statedump on the volume to view the files that are locked using the following command:
# gluster volume statedump VOLNAME
For example, to display the statedump of test-volume:
# gluster volume statedump test-volume Volume statedump successful
The statedump files are created on the brick servers in the /tmp directory or in the directory set using the server.statedump-path volume option. The naming convention of the dump file is brick-path.brick-pid.dump.
- Clear the entry lock using the following command:
# gluster volume clear-locks VOLNAME path kind granted entry basename
The following are sample contents of the statedump file indicating an entry lock (entrylk). Ensure that those are stale locks and no resources own them.
[xlator.features.locks.vol-locks.inode]
path=/
mandatory=0
entrylk-count=1
lock-dump.domain.domain=vol-replicate-0
xlator.feature.locks.lock-dump.domain.entrylk.entrylk[0](ACTIVE)=type=ENTRYLK_WRLCK on basename=file1, pid = 714782904, owner=ffffff2a3c7f0000, transport=0x20e0670, , granted at Mon Feb 27 16:01:01 2012
conn.2.bound_xl./gfs/brick1.hashsize=14057
conn.2.bound_xl./gfs/brick1.name=/gfs/brick1/inode
conn.2.bound_xl./gfs/brick1.lru_limit=16384
conn.2.bound_xl./gfs/brick1.active_size=2
conn.2.bound_xl./gfs/brick1.lru_size=0
conn.2.bound_xl./gfs/brick1.purge_size=0
For example, to clear the entry lock on file1 of test-volume:
# gluster volume clear-locks test-volume / kind granted entry file1 Volume clear-locks successful test-volume-locks: entry blocked locks=0 granted locks=1
- Clear the inode lock using the following command:
# gluster volume clear-locks VOLNAME path kind granted inode rangeThe following are the sample contents of thestatedumpfile indicating there is an inode lock (inodelk). Ensure that those are stale locks and no resources own them.[conn.2.bound_xl./gfs/brick1.active.1] gfid=538a3d4a-01b0-4d03-9dc9-843cd8704d07 nlookup=1 ref=2 ia_type=1 [xlator.features.locks.vol-locks.inode] path=/file1 mandatory=0 inodelk-count=1 lock-dump.domain.domain=vol-replicate-0 inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 714787072, owner=00ffff2a3c7f0000, transport=0x20e0670, , granted at Mon Feb 27 16:01:01 2012
For example, to clear the inode lock on file1 of test-volume:
# gluster volume clear-locks test-volume /file1 kind granted inode 0,0-0
Volume clear-locks successful
test-volume-locks: inode blocked locks=0 granted locks=1
- Clear the granted POSIX lock using the following command:
# gluster volume clear-locks VOLNAME path kind granted posix range
The following are the sample contents of the statedump file indicating there is a granted POSIX lock. Ensure that those are stale locks and that no resources own them.
[xlator.features.locks.vol1-locks.inode]
path=/file1
mandatory=0
posixlk-count=15
posixlk.posixlk[0](ACTIVE)=type=WRITE, whence=0, start=8, len=1, pid = 23848, owner=d824f04c60c3c73c, transport=0x120b370, , blocked at Mon Feb 27 16:01:01 2012, granted at Mon Feb 27 16:01:01 2012

posixlk.posixlk[1](ACTIVE)=type=WRITE, whence=0, start=7, len=1, pid = 1, owner=30404152462d436c-69656e7431, transport=0x11eb4f0, , granted at Mon Feb 27 16:01:01 2012

posixlk.posixlk[2](BLOCKED)=type=WRITE, whence=0, start=8, len=1, pid = 1, owner=30404152462d436c-69656e7431, transport=0x11eb4f0, , blocked at Mon Feb 27 16:01:01 2012

posixlk.posixlk[3](ACTIVE)=type=WRITE, whence=0, start=6, len=1, pid = 12776, owner=a36bb0aea0258969, transport=0x120a4e0, , granted at Mon Feb 27 16:01:01 2012
...
For example, to clear the granted POSIX lock on file1 of test-volume:
# gluster volume clear-locks test-volume /file1 kind granted posix 0,8-1
Volume clear-locks successful
test-volume-locks: posix blocked locks=0 granted locks=1
test-volume-locks: posix blocked locks=0 granted locks=1
test-volume-locks: posix blocked locks=0 granted locks=1
- Clear the blocked POSIX lock using the following command:
# gluster volume clear-locks VOLNAME path kind blocked posix range
The following are the sample contents of the statedump file indicating there is a blocked POSIX lock. Ensure that those are stale locks and that no resources own them.
[xlator.features.locks.vol1-locks.inode]
path=/file1
mandatory=0
posixlk-count=30
posixlk.posixlk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=1, pid = 23848, owner=d824f04c60c3c73c, transport=0x120b370, , blocked at Mon Feb 27 16:01:01 2012, granted at Mon Feb 27 16:01:01

posixlk.posixlk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=1, pid = 1, owner=30404146522d436c-69656e7432, transport=0x1206980, , blocked at Mon Feb 27 16:01:01 2012

posixlk.posixlk[2](BLOCKED)=type=WRITE, whence=0, start=0, len=1, pid = 1, owner=30404146522d436c-69656e7432, transport=0x1206980, , blocked at Mon Feb 27 16:01:01 2012

posixlk.posixlk[3](BLOCKED)=type=WRITE, whence=0, start=0, len=1, pid = 1, owner=30404146522d436c-69656e7432, transport=0x1206980, , blocked at Mon Feb 27 16:01:01 2012

posixlk.posixlk[4](BLOCKED)=type=WRITE, whence=0, start=0, len=1, pid = 1, owner=30404146522d436c-69656e7432, transport=0x1206980, , blocked at Mon Feb 27 16:01:01 2012
...
For example, to clear the blocked POSIX lock on file1 of test-volume:
# gluster volume clear-locks test-volume /file1 kind blocked posix 0,0-1
Volume clear-locks successful
test-volume-locks: posix blocked locks=28 granted locks=0
test-volume-locks: posix blocked locks=1 granted locks=0
No locks cleared.
- Clear all POSIX locks using the following command:
# gluster volume clear-locks VOLNAME path kind all posix range
The following are the sample contents of the statedump file indicating that there are POSIX locks. Ensure that those are stale locks and that no resources own them.
[xlator.features.locks.vol1-locks.inode]
path=/file1
mandatory=0
posixlk-count=11
posixlk.posixlk[0](ACTIVE)=type=WRITE, whence=0, start=8, len=1, pid = 12776, owner=a36bb0aea0258969, transport=0x120a4e0, , blocked at Mon Feb 27 16:01:01 2012, granted at Mon Feb 27 16:01:01 2012

posixlk.posixlk[1](ACTIVE)=type=WRITE, whence=0, start=0, len=1, pid = 12776, owner=a36bb0aea0258969, transport=0x120a4e0, , granted at Mon Feb 27 16:01:01 2012

posixlk.posixlk[2](ACTIVE)=type=WRITE, whence=0, start=7, len=1, pid = 23848, owner=d824f04c60c3c73c, transport=0x120b370, , granted at Mon Feb 27 16:01:01 2012

posixlk.posixlk[3](ACTIVE)=type=WRITE, whence=0, start=6, len=1, pid = 1, owner=30404152462d436c-69656e7431, transport=0x11eb4f0, , granted at Mon Feb 27 16:01:01 2012

posixlk.posixlk[4](BLOCKED)=type=WRITE, whence=0, start=8, len=1, pid = 23848, owner=d824f04c60c3c73c, transport=0x120b370, , blocked at Mon Feb 27 16:01:01 2012
...
For example, to clear all POSIX locks on file1 of test-volume:
# gluster volume clear-locks test-volume /file1 kind all posix 0,0-1
Volume clear-locks successful
test-volume-locks: posix blocked locks=1 granted locks=0
No locks cleared.
test-volume-locks: posix blocked locks=4 granted locks=1
You can perform statedump on test-volume again to verify that all of the above locks are cleared.
Appendix A. Revision History
| Revision | Date |
|---|---|
| Revision 2.1-109 | Fri Oct 9 2015 |
| Revision 2.1-108 | Wed May 13 2015 |
| Revision 2.1-106 | Mon Mar 09 2015 |
| Revision 2.1-105 | Mon Dec 22 2014 |
| Revision 2.1-104 | Tue Dec 9 2014 |
| Revision 2.1-95 | Thu Sep 18 2014 |
| Revision 2.1-77 | Tue Aug 26 2014 |
| Revision 2.1-75 | Fri Aug 22 2014 |
| Revision 2.1-71 | Tue Aug 12 2014 |
| Revision 2.1-70 | Wed Jun 25 2014 |
| Revision 2.1-69 | Mon Jun 12 2014 |
| Revision 2.1-67 | Mon Mar 17 2014 |
| Revision 2.1-64 | Thu Mar 06 2014 |
| Revision 2.1-63 | Fri Feb 28 2014 |
| Revision 2.1-63 | Tue Feb 24 2014 |
| Revision 2.1-40 | Wed Nov 27 2013 |
| Revision 2.1-37 | Tue Nov 26 2013 |
| Revision 2.1-8 | Wed Sep 25 2013 |
| Revision 2.1-7 | Tue Sep 17 2013 |
| Revision 2.1-4 | Mon Sep 16 2013 |
| Revision 2.1-3 | Mon Sep 16 2013 |
| Revision 2.1-2 | Fri Sep 13 2013 |