Chapter 16. Managing RAID

You can use a Redundant Array of Independent Disks (RAID) to store data across multiple drives. It can help to avoid data loss if a drive has failed.

16.1. Overview of RAID

In a RAID, multiple devices, such as HDD, SSD, or NVMe are combined into an array to accomplish performance or redundancy goals not achievable with one large and expensive drive. This array of devices appears to the computer as a single logical storage unit or drive.

RAID supports various configurations, including levels 0, 1, 4, 5, 6, 10, and linear. RAID uses techniques such as disk striping (RAID Level 0), disk mirroring (RAID Level 1), and disk striping with parity (RAID Levels 4, 5 and 6) to achieve redundancy, lower latency, increased bandwidth, and maximized ability to recover from hard disk crashes.

RAID distributes data across each device in the array by breaking it down into consistently-sized chunks, commonly 256 KB or 512 KB, although other values are acceptable. It writes these chunks to a hard drive in the RAID array according to the RAID level employed. While reading the data, the process is reversed, giving the illusion that the multiple devices in the array are actually one large drive.

RAID technology is beneficial for those who manage large amounts of data. The following are the primary reasons to deploy RAID:

  • It enhances speed
  • It increases storage capacity using a single virtual disk
  • It minimizes data loss from disk failure
  • The RAID layout and level online conversion

16.2. RAID types

The following are the possible types of RAID:

Firmware RAID
Firmware RAID, also known as ATARAID, is a type of software RAID where the RAID sets can be configured using a firmware-based menu. The firmware used by this type of RAID also hooks into the BIOS, allowing you to boot from its RAID sets. Different vendors use different on-disk metadata formats to mark the RAID set members. The Intel Matrix RAID is an example of a firmware RAID system.
Hardware RAID

A hardware-based array manages the RAID subsystem independently from the host. It might present multiple devices per RAID array to the host.

Hardware RAID devices might be internal or external to the system. Internal devices commonly consists of a specialized controller card that handles the RAID tasks transparently to the operating system. External devices commonly connect to the system via SCSI, Fibre Channel, iSCSI, InfiniBand, or other high speed network interconnect and present volumes such as logical units to the system.

RAID controller cards function like a SCSI controller to the operating system and handle all the actual drive communications. You can plug the drives into the RAID controller similar to a normal SCSI controller and then add them to the RAID controller’s configuration. The operating system will not be able to tell the difference.

Software RAID

A software RAID implements the various RAID levels in the kernel block device code. It offers the cheapest possible solution because expensive disk controller cards or hot-swap chassis are not required. With hot-swap chassis, you can remove a hard drive without powering off your system. Software RAID also works with any block storage, which are supported by the Linux kernel, such as SATA, SCSI, and NVMe. With today’s faster CPUs, Software RAID also generally outperforms hardware RAID, unless you use high-end storage devices.

Since the Linux kernel contains a multiple device (MD) driver, the RAID solution becomes completely hardware independent. The performance of a software-based array depends on the server CPU performance and load.

The following are the key features of the Linux software RAID stack:

  • Multithreaded design
  • Portability of arrays between Linux machines without reconstruction
  • Backgrounded array reconstruction using idle system resources
  • Hot-swap drive support
  • Automatic CPU detection to take advantage of certain CPU features such as streaming Single Instruction Multiple Data (SIMD) support.
  • Automatic correction of bad sectors on disks in an array.
  • Regular consistency checks of RAID data to ensure the health of the array.
  • Proactive monitoring of arrays with email alerts sent to a designated email address on important events.
  • Write-intent bitmaps, which drastically increase the speed of resync events by allowing the kernel to know precisely which portions of a disk need to be resynced instead of having to resync the entire array after a system crash.

    Note

    The resync is a process to synchronize the data over the devices in the existing RAID to achieve redundancy.

  • Resync checkpointing so that if you reboot your computer during a resync, at startup the resync resumes where it left off and not starts all over again.
  • The ability to change parameters of the array after installation, which is called reshaping. For example, you can grow a 4-disk RAID5 array to a 5-disk RAID5 array when you have a new device to add. This grow operation is done live and does not require you to reinstall on the new array.
  • Reshaping supports changing the number of devices, the RAID algorithm or size of the RAID array type, such as RAID4, RAID5, RAID6, or RAID10.
  • Takeover supports RAID level conversion, such as RAID0 to RAID6.
  • Cluster MD, which is a storage solution for a cluster, provides the redundancy of RAID1 mirroring to the cluster. Currently, only RAID1 is supported.

16.3. RAID levels and linear support

The following are the supported configurations by RAID, including levels 0, 1, 4, 5, 6, 10, and linear:

Level 0

RAID level 0, often called striping, is a performance-oriented striped data mapping technique. This means the data being written to the array is broken down into stripes and written across the member disks of the array, allowing high I/O performance at low inherent cost but provides no redundancy.

RAID level 0 implementations only stripe the data across the member devices up to the size of the smallest device in the array. This means that if you have multiple devices with slightly different sizes, each device gets treated as though it was the same size as the smallest drive. Therefore, the common storage capacity of a level 0 array is the total capacity of all disks. If the member disks have a different size, then the RAID0 uses all the space of those disks using the available zones.

Level 1

RAID level 1, or mirroring, provides redundancy by writing identical data to each member disk of the array, leaving a mirrored copy on each disk. Mirroring remains popular due to its simplicity and high level of data availability. Level 1 operates with two or more disks, and provides very good data reliability and improves performance for read-intensive applications but at relatively high costs.

RAID level 1 is costly because you write the same information to all of the disks in the array, which provides data reliability, but in a much less space-efficient manner than parity based RAID levels such as level 5. However, this space inefficiency comes with a performance benefit, which is parity-based RAID levels that consume considerably more CPU power in order to generate the parity while RAID level 1 simply writes the same data more than once to the multiple RAID members with very little CPU overhead. As such, RAID level 1 can outperform the parity-based RAID levels on machines where software RAID is employed and CPU resources on the machine are consistently taxed with operations other than RAID activities.

The storage capacity of the level 1 array is equal to the capacity of the smallest mirrored hard disk in a hardware RAID or the smallest mirrored partition in a software RAID. Level 1 redundancy is the highest possible among all RAID types, with the array being able to operate with only a single disk present.

Level 4

Level 4 uses parity concentrated on a single disk drive to protect data. Parity information is calculated based on the content of the rest of the member disks in the array. This information can then be used to reconstruct data when one disk in the array fails. The reconstructed data can then be used to satisfy I/O requests to the failed disk before it is replaced and to repopulate the failed disk after it has been replaced.

Since the dedicated parity disk represents an inherent bottleneck on all write transactions to the RAID array, level 4 is seldom used without accompanying technologies such as write-back caching. Or it is used in specific circumstances where the system administrator is intentionally designing the software RAID device with this bottleneck in mind such as an array that has little to no write transactions once the array is populated with data. RAID level 4 is so rarely used that it is not available as an option in Anaconda. However, it could be created manually by the user if needed.

The storage capacity of hardware RAID level 4 is equal to the capacity of the smallest member partition multiplied by the number of partitions minus one. The performance of a RAID level 4 array is always asymmetrical, which means reads outperform writes. This is because write operations consume extra CPU resources and main memory bandwidth when generating parity, and then also consume extra bus bandwidth when writing the actual data to disks because you are not only writing the data, but also the parity. Read operations need only read the data and not the parity unless the array is in a degraded state. As a result, read operations generate less traffic to the drives and across the buses of the computer for the same amount of data transfer under normal operating conditions.

Level 5

This is the most common type of RAID. By distributing parity across all the member disk drives of an array, RAID level 5 eliminates the write bottleneck inherent in level 4. The only performance bottleneck is the parity calculation process itself. Modern CPUs can calculate parity very fast. However, if you have a large number of disks in a RAID 5 array such that the combined aggregate data transfer speed across all devices is high enough, parity calculation can be a bottleneck.

Level 5 has asymmetrical performance, and reads substantially outperforming writes. The storage capacity of RAID level 5 is calculated the same way as with level 4.

Level 6

This is a common level of RAID when data redundancy and preservation, and not performance, are the paramount concerns, but where the space inefficiency of level 1 is not acceptable. Level 6 uses a complex parity scheme to be able to recover from the loss of any two drives in the array. This complex parity scheme creates a significantly higher CPU burden on software RAID devices and also imposes an increased burden during write transactions. As such, level 6 is considerably more asymmetrical in performance than levels 4 and 5.

The total capacity of a RAID level 6 array is calculated similarly to RAID level 5 and 4, except that you must subtract two devices instead of one from the device count for the extra parity storage space.

Level 10

This RAID level attempts to combine the performance advantages of level 0 with the redundancy of level 1. It also reduces some of the space wasted in level 1 arrays with more than two devices. With level 10, it is possible, for example, to create a 3-drive array configured to store only two copies of each piece of data, which then allows the overall array size to be 1.5 times the size of the smallest devices instead of only equal to the smallest device, similar to a 3-device, level 1 array. This avoids CPU process usage to calculate parity similar to RAID level 6, but it is less space efficient.

The creation of RAID level 10 is not supported during installation. It is possible to create one manually after installation.

Linear RAID

Linear RAID is a grouping of drives to create a larger virtual drive.

In linear RAID, the chunks are allocated sequentially from one member drive, going to the next drive only when the first is completely filled. This grouping provides no performance benefit, as it is unlikely that any I/O operations split between member drives. Linear RAID also offers no redundancy and decreases reliability. If any one member drive fails, the entire array cannot be used and data can be lost. The capacity is the total of all member disks.

16.4. Linux RAID subsystems

The following subsystems compose RAID in Linux:

Linux Hardware RAID Controller Drivers
Hardware RAID controllers have no specific RAID subsystem in Linux. Since they use special RAID chipsets, hardware RAID controllers come with their own drivers. With these drivers, the system detects the RAID sets as regular disks.
mdraid

The mdraid subsystem was designed as a software RAID solution for Linux. It is also the preferred solution for software RAID in Red Hat Enterprise Linux. This subsystem uses its own metadata format, which is referred to as native MD metadata.

It also supports other metadata formats, known as external metadata. Red Hat Enterprise Linux 9 uses mdraid with external metadata to access Intel Rapid Storage (ISW) or Intel Matrix Storage Manager (IMSM) sets and Storage Networking Industry Association (SNIA) Disk Drive Format (DDF). The mdraid subsystem sets are configured and controlled through the mdadm utility.

16.5. Creating a software RAID during the installation

Redundant Arrays of Independent Disks (RAID) devices are constructed from multiple storage devices that are arranged to provide increased performance and, in some configurations, greater fault tolerance.

A RAID device is created in one step and disks are added or removed as necessary. You can configure one RAID partition for each physical disk in your system, so that the number of disks available to the installation program determines the levels of RAID device available. For example, if your system has two hard drives, you cannot create a RAID 10 device, as it requires a minimum of three separate disks.

Note

On 64-bit IBM Z, the storage subsystem uses RAID transparently. You do not have to configure software RAID manually.

Prerequisites

  • You have selected two or more disks for installation before RAID configuration options are visible. Depending on the RAID type you want to create, at least two disks are required.
  • You have created a mount point. By configuring a mount point, you can configure the RAID device.
  • You have selected the Custom radio button on the Installation Destination window.

Procedure

  1. From the left pane of the Manual Partitioning window, select the required partition.
  2. Under the Device(s) section, click Modify. The Configure Mount Point dialog box opens.
  3. Select the disks that you want to include in the RAID device and click Select.
  4. Click the Device Type drop-down menu and select RAID.
  5. Click the File System drop-down menu and select your preferred file system type.
  6. Click the RAID Level drop-down menu and select your preferred level of RAID.
  7. Click Update Settings to save your changes.
  8. Click Done to apply the settings to return to the Installation Summary window.

16.6. Creating a software RAID on an installed system

You can create a software Redundant Array of Independent Disks (RAID) on an existing system using the mdadm utility.

Prerequisites

Procedure

  1. Create a RAID of two block devices, for example /dev/sda1 and /dev/sdc1:

    # mdadm --create /dev/md0 --level=2 --raid-devices=2 /dev/sda1 /dev/sdc1
    mdadm: Defaulting to version 1.2 metadata
    mdadm: array /dev/md0 started.

    The level_value option defines the RAID level.

  2. Optional: Check the status of the RAID:

    # mdadm --detail /dev/md0
    /dev/md0:
               Version : 1.2
         Creation Time : Thu Oct 13 15:17:39 2022
            Raid Level : raid0
            Array Size : 18649600 (17.79 GiB 19.10 GB)
          Raid Devices : 2
         Total Devices : 2
           Persistence : Superblock is persistent
    
           Update Time : Thu Oct 13 15:17:39 2022
                 State : clean
        Active Devices : 2
       Working Devices : 2
        Failed Devices : 0
         Spare Devices : 0
    [...]
  3. Optional: Observe the detailed information about each device in the RAID:

    # mdadm --examine /dev/sda1 /dev/sdc1
    /dev/sda1:
              Magic : a92b4efc
            Version : 1.2
        Feature Map : 0x1000
         Array UUID : 77ddfb0a:41529b0e:f2c5cde1:1d72ce2c
               Name : 0
      Creation Time : Thu Oct 13 15:17:39 2022
         Raid Level : raid0
       Raid Devices : 2
    [...]
  4. Create a file system on the RAID drive:

    # mkfs -t xfs /dev/md0

    Replace xfs with the file system that you chose to format the drive with.

  5. Create a mount point for RAID drive and mount it:

    # mkdir /mnt/raid1
    # mount /dev/md0 /mnt/raid1

    Replace /mnt/raid1 with the mount point.

    If you want that RHEL mounts the md0 RAID device automatically when the system boots, add an entry for your device to the /etc/fstab file:

    /dev/md0   /mnt/raid1 xfs  defaults   0 0

16.7. Configuring a RAID volume using the storage System Role

With the storage System Role, you can configure a RAID volume on RHEL using Red Hat Ansible Automation Platform and Ansible-Core. Create an Ansible playbook with the parameters to configure a RAID volume to suit your requirements.

Prerequisites

  • The Ansible Core package is installed on the control machine.
  • You have the rhel-system-roles package installed on the system from which you want to run the playbook.
  • You have an inventory file detailing the systems on which you want to deploy a RAID volume using the storage System Role.

Procedure

  1. Create a new playbook.yml file with the following content:

    ---
    - name: Configure the storage
      hosts: managed-node-01.example.com
      tasks:
      - name: Create a RAID on sdd, sde, sdf, and sdg
        include_role:
          name: rhel-system-roles.storage
        vars:
        storage_safe_mode: false
        storage_volumes:
          - name: data
            type: raid
            disks: [sdd, sde, sdf, sdg]
            raid_level: raid0
            raid_chunk_size: 32 KiB
            mount_point: /mnt/data
            state: present
    Warning

    Device names might change in certain circumstances, for example, when you add a new disk to a system. Therefore, to prevent data loss, do not use specific disk names in the playbook.

  2. Optional: Verify the playbook syntax:

    # ansible-playbook --syntax-check playbook.yml
  3. Run the playbook:

    # ansible-playbook -i inventory.file /path/to/file/playbook.yml

Additional resources

16.8. Extending RAID

You can extend a RAID using the --grow option of the mdadm utility.

Prerequisites

  • Enough disk space.
  • The parted package is installed.

Procedure

  1. Extend RAID partitions. For more information, see Resizing a partition with parted.
  2. Extend RAID to the maximum of the partition capacity:

    # mdadm --grow --size=max /dev/md0

    To set a specific size, write the value of the --size parameter in kB, for example --size=524228.

  3. Increase the size of file system. For example, if the volume uses XFS and is mounted to /mnt/, enter:

    # xfs_growfs /mnt/

Additional resources

16.9. Shrinking RAID

You can shrink RAID using the --grow option of the mdadm utility.

Important

The XFS file system does not support shrinking.

Prerequisites

  • The parted package is installed.

Procedure

  1. Shrink the file system. For more information, see Managing file systems.
  2. Decrease the RAID to the size, for example to 512 MB:

    # mdadm --grow --size=524228 /dev/md0

    Write the --size parameter in kB.

  3. Shrink the partition to the size you need.

Additional resources

16.10. Supported RAID conversions

It is possible to convert from one RAID level to another. For example, you can convert from RAID5 to RAID10, but not from RAID10 to RAID5. The following table describes the supported RAID conversions:

Source levelDestination level

RAID0

RAID4, RAID5, RAID10

RAID1

RAID0, RAID5

RAID4

RAID0, RAID5

RAID5

RAID0, RAID1, RAID4, RAID6, RAID10

RAID6

RAID5

RAID10

RAID0

Note

Converting RAID 5 to RAID0 and RAID4 is only possible with the ALGORITHM_PARITY_N layout.

Additional resources.

  • The mdadm(8) man page

16.11. Converting a RAID level

You can convert RAID to a different RAID level as required. The following example converts the RAID device /dev/md0 with level 0 to 5 and add one more disk /dev/sdd to the array.

Prerequisites

  • Enough disks for conversion.
  • The mdadm package is installed.
  • Ensure the intended conversion is supported. See Supported RAID conversions.

Procedure

  1. Convert the RAID /dev/md0 to RAID level 5:

    # mdadm --grow --level=5 -n 3 /dev/md0 --force
  2. Add a new disk to the array:

    # mdadm --manage /dev/md0 --add /dev/sdd

Verification

  • Verify if the RAID level is converted:

    # mdadm --detail /dev/md0
    /dev/md0:
               Version : 1.2
         Creation Time : Thu Oct 13 15:17:39 2022
            Raid Level : raid0
            Array Size : 18649600 (17.79 GiB 19.10 GB)
          Raid Devices : 5
    [...]

Additional resources

  • The mdadm(8) man page

16.12. Converting a root disk to RAID1 after installation

This section describes how to convert a non-RAID root disk to a RAID1 mirror after installing Red Hat Enterprise Linux 9.

On the PowerPC (PPC) architecture, take the following additional steps:

Prerequisites

Procedure

  1. Copy the contents of the PowerPC Reference Platform (PReP) boot partition from /dev/sda1 to /dev/sdb1:

    # dd if=/dev/sda1 of=/dev/sdb1
  2. Update the prep and boot flag on the first partition on both disks:

    $ parted /dev/sda set 1 prep on
    $ parted /dev/sda set 1 boot on
    
    $ parted /dev/sdb set 1 prep on
    $ parted /dev/sdb set 1 boot on
Note

Executing the grub2-install /dev/sda command does not work on a PowerPC machine and returns an error, but the system boots as expected.

16.13. Creating advanced RAID devices

In some cases, you might want to install the operating system on an array that is created before the installation completes. Usually, this means setting up the /boot or root file system arrays on a complex RAID device. In such cases, you might need to use array options that are not supported by the Anaconda installer. To work around this, perform the following steps.

Note

The limited Rescue Mode of the installer does not include man pages. Both the mdadm and md man pages contain useful information for creating custom RAID arrays, and might be needed throughout the workaround.

Procedure

  1. Insert the install disk.
  2. During the initial boot up, select Rescue Mode instead of Install or Upgrade. When the system fully boots into Rescue mode, you can see the command line terminal.
  3. From this terminal, execute the following commands:

    1. Create RAID partitions on the target hard drives by using the parted command.
    2. Manually create raid arrays by using the mdadm command from those partitions using any and all settings and options available.
  4. Optional: After creating arrays, create file systems on the arrays as well.
  5. Reboot the computer and select Install or Upgrade to install. As the Anaconda installer searches the disks in the system, it finds the pre-existing RAID devices.
  6. When asked about how to use the disks in the system, select Custom Layout and click Next. In the device listing, the pre-existing MD RAID devices are listed.
  7. Select a RAID device and click Edit.
  8. Configure its mount point and optionally the type of file system it should use if you did not create one earlier, and then click Done. Anaconda installs to this pre-existing RAID device, preserving the custom options you selected when you created it in Rescue Mode.

16.14. Setting up email notifications to monitor a RAID

You can set up email alerts to monitor RAID with the mdadm tool. Once the MAILADDR variable is set to the required email address, the monitoring system sends the alerts to the added email address.

Prerequisites

  • The mdadm package is installed.
  • The mail service is set up.

Procedure

  1. Create the /etc/mdadm.conf configuration file for monitoring array by scanning the RAID details:

    # mdadm --detail --scan >> /etc/mdadm.conf

    Note, that ARRAY and MAILADDR are mandatory variables.

  2. Open the /etc/mdadm.conf configuration file with a text editor of your choice and add the MAILADDR variable with the mail address for the notification. For example, add new line:

    MAILADDR example@example.com>

    Here, example@example.com is an email address to which you want to receive the alerts from the array monitoring.

  3. Save changes in the /etc/mdadm.conf file and close it.

Additional resources

  • The mdadm.conf(5) man page

16.15. Replacing a failed disk in RAID

You can reconstruct the data from the failed disks using the remaining disks. RAID level and the total number of disks determines the minimum amount of remaining disks needed for a successful data reconstruction.

In this procedure, the /dev/md0 RAID contains four disks. The /dev/sdd disk has failed and you need to replace it with the /dev/sdf disk.

Prerequisites

  • A spare disk for replacement.
  • The mdadm package is installed.

Procedure

  1. Check the failed disk:

    1. View the kernel logs:

      # journalctl -k -f
    2. Search for a message similar to the following:

      md/raid:md0: Disk failure on sdd, disabling device.
      
      md/raid:md0: Operation continuing on 3 devices.
    3. Press Ctrl+C on your keyboard to exit the journalctl program.
  2. Mark the failed disk as faulty:

    # mdadm --manage /dev/md0 --fail /dev/sdd
  3. Optional: Check if the failed disk was marked correctly:

    # mdadm --detail /dev/md0

    At the end of the output is a list of disks in the /dev/md0 RAID where the disk /dev/sdd has the faulty status:

    Number   Major   Minor   RaidDevice State
       0       8       16        0      active sync   /dev/sdb
       1       8       32        1      active sync   /dev/sdc
       -       0        0        2      removed
       3       8       64        3      active sync   /dev/sde
    
       2       8       48        -      faulty   /dev/sdd
  4. Remove the failed disk from the RAID:

    # mdadm --manage /dev/md0 --remove /dev/sdd
    Warning

    If your RAID cannot withstand another disk failure, do not remove any disk until the new disk has the active sync status. You can monitor the progress using the watch cat /proc/mdstat command.

  5. Add the new disk to the RAID:

    # mdadm --manage /dev/md0 --add /dev/sdf

    The /dev/md0 RAID now includes the new disk /dev/sdf and the mdadm service will automatically starts copying data to it from other disks.

Verification

  • Check the details of the array:

    # mdadm --detail /dev/md0

    If this command shows a list of disks in the /dev/md0 RAID where the new disk has spare rebuilding status at the end of the output, data is still being copied to it from other disks:

    Number   Major   Minor   RaidDevice State
       0       8       16        0      active sync   /dev/sdb
       1       8       32        1      active sync   /dev/sdc
       4       8       80        2      spare rebuilding   /dev/sdf
       3       8       64        3      active sync   /dev/sde

    After data copying is finished, the new disk has an active sync status.

16.16. Repairing RAID disks

This procedure describes how to repair disks in a RAID array.

Prerequisites

  • The mdadm package is installed.

Procedure

  1. Check the array for the failed disks behavior:

    # echo check > /sys/block/md0/md/sync_action

    This checks the array and the /sys/block/md0/md/sync_action file shows the sync action.

  2. Open the /sys/block/md0/md/sync_action file with the text editor of your choice and see if there is any message about disk synchronization failures.
  3. View the /sys/block/md0/md/mismatch_cnt file. If the mismatch_cnt parameter is not 0, it means that the RAID disks need repair.
  4. Repair the disks in the array:

    # echo repair > /sys/block/md0/md/sync_action

    This repairs the disks in the array and writes the result into the /sys/block/md0/md/sync_action file.

  5. View the synchronization progress:

    # cat /sys/block/md0/md/sync_action
    repair
    
    # cat /proc/mdstat
    Personalities : [raid0] [raid6] [raid5] [raid4] [raid1]
    md0 : active raid1 sdg[1] dm-3[0]
          511040 blocks super 1.2 [2/2] [UU]
    unused devices: <none>