Chapter 15. Managing RAID

This chapter describes the Redundant Array of Independent Disks (RAID) technology. You can use RAID to store data across multiple drives and to help avoid data loss if a drive fails.

15.1. Redundant array of independent disks (RAID)

The basic idea behind RAID is to combine multiple devices, such as HDD, SSD or NVMe, into an array to accomplish performance or redundancy goals not attainable with one large and expensive drive. This array of devices appears to the computer as a single logical storage unit or drive.

RAID allows information to be spread across several devices. RAID uses techniques such as disk striping (RAID Level 0), disk mirroring (RAID Level 1), and disk striping with parity (RAID Levels 4, 5 and 6) to achieve redundancy, lower latency, increased bandwidth, and maximized ability to recover from hard disk crashes.

RAID distributes data across each device in the array by breaking it down into consistently sized chunks (commonly 256K or 512K, although other values are acceptable). Each chunk is then written to a hard drive in the RAID array according to the RAID level employed. When the data is read, the process is reversed, giving the illusion that the multiple devices in the array are actually one large drive.
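
For example, with Linux software RAID you can set the chunk size explicitly when you create an array by using the --chunk option of the mdadm utility. The following command is only an illustrative sketch; the device names, the RAID level, and the 512K chunk size are example values:

    # mdadm --create /dev/md0 --level=0 --raid-devices=2 --chunk=512 /dev/sda1 /dev/sdb1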

System administrators and others who manage large amounts of data can benefit from using RAID technology. Primary reasons to deploy RAID include:

  • Enhances speed
  • Increases storage capacity using a single virtual disk
  • Minimizes data loss from disk failure
  • Converts the RAID layout and level online

15.2. RAID types

There are three possible RAID approaches: Firmware RAID, Hardware RAID, and Software RAID.

Firmware RAID

Firmware RAID, also known as ATARAID, is a type of software RAID where the RAID sets can be configured using a firmware-based menu. The firmware used by this type of RAID also hooks into the BIOS, allowing you to boot from its RAID sets. Different vendors use different on-disk metadata formats to mark the RAID set members. The Intel Matrix RAID is a good example of a firmware RAID system.

Hardware RAID

The hardware-based array manages the RAID subsystem independently from the host. It may present multiple devices per RAID array to the host.

Hardware RAID devices may be internal or external to the system. Internal devices commonly consist of a specialized controller card that handles the RAID tasks transparently to the operating system. External devices commonly connect to the system via SCSI, Fibre Channel, iSCSI, InfiniBand, or another high-speed network interconnect and present volumes such as logical units to the system.

RAID controller cards function like a SCSI controller to the operating system, and handle all the actual drive communications. The user plugs the drives into the RAID controller (just like a normal SCSI controller) and then adds them to the RAID controller’s configuration. The operating system will not be able to tell the difference.

Software RAID

Software RAID implements the various RAID levels in the kernel block device code. It offers the cheapest possible solution, as expensive disk controller cards or hot-swap chassis [1] are not required. Software RAID also works with any block storage that is supported by the Linux kernel, such as SATA, SCSI, and NVMe. With today’s faster CPUs, Software RAID also generally outperforms Hardware RAID, unless you use high-end storage devices.

The Linux kernel contains a multiple device (MD) driver that allows the RAID solution to be completely hardware independent. The performance of a software-based array depends on the server CPU performance and load.

Key features of the Linux software RAID stack:

  • Multithreaded design
  • Portability of arrays between Linux machines without reconstruction
  • Backgrounded array reconstruction using idle system resources
  • Hot-swappable drive support
  • Automatic CPU detection to take advantage of certain CPU features such as streaming Single Instruction Multiple Data (SIMD) support
  • Automatic correction of bad sectors on disks in an array
  • Regular consistency checks of RAID data to ensure the health of the array
  • Proactive monitoring of arrays with email alerts sent to a designated email address on important events
  • Write-intent bitmaps, which drastically increase the speed of resync events by allowing the kernel to know precisely which portions of a disk need to be resynced instead of having to resync the entire array after a system crash (see the example after this list)

    Note that resync is the process of synchronizing the data across the devices in an existing RAID array to achieve redundancy.

  • Resync checkpointing so that if you reboot your computer during a resync, at startup the resync will pick up where it left off and not start all over again
  • The ability to change parameters of the array after installation, which is called reshaping. For example, you can grow a 4-disk RAID5 array to a 5-disk RAID5 array when you have a new device to add. This grow operation is done live and does not require you to reinstall on the new array
  • Reshaping supports changing the number of devices, the RAID algorithm, or the size of the RAID array for types such as RAID4, RAID5, RAID6, or RAID10
  • Takeover supports converting the RAID level, for example from RAID0 to RAID6
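
For example, a write-intent bitmap can be added to an existing array with the mdadm utility. This is a minimal sketch; the array name /dev/md0 is an example:

    # mdadm --grow /dev/md0 --bitmap=internal

The bitmap can be removed again by passing --bitmap=none to the same command.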

15.3. RAID levels and linear support

RAID supports various configurations, including levels 0, 1, 4, 5, 6, 10, and linear. These RAID types are defined as follows:

Level 0

RAID level 0, often called "striping," is a performance-oriented striped data mapping technique. This means the data being written to the array is broken down into stripes and written across the member disks of the array, allowing high I/O performance at low inherent cost but providing no redundancy.

Many RAID level 0 implementations will only stripe the data across the member devices up to the size of the smallest device in the array. This means that if you have multiple devices with slightly different sizes, each device will get treated as though it is the same size as the smallest drive. Therefore, the common storage capacity of a level 0 array is equal to the capacity of the smallest member disk in a Hardware RAID or the capacity of the smallest member partition in a Software RAID, multiplied by the number of disks or partitions in the array.
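
Following the capacity rule above, a level 0 array built from three members of 1 TiB, 1 TiB, and 1.2 TiB has a usable capacity of approximately 3 × 1 TiB = 3 TiB, because each member is treated as though it were the size of the smallest device.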

Level 1

RAID level 1, or "mirroring," has been used longer than any other form of RAID. Level 1 provides redundancy by writing identical data to each member disk of the array, leaving a "mirrored" copy on each disk. Mirroring remains popular due to its simplicity and high level of data availability. Level 1 operates with two or more disks, and provides very good data reliability and improves performance for read-intensive applications but at a relatively high cost. [2]

The storage capacity of the level 1 array is equal to the capacity of the smallest mirrored hard disk in a Hardware RAID or the smallest mirrored partition in a Software RAID. Level 1 redundancy is the highest possible among all RAID types, with the array being able to operate with only a single disk present.

Level 4

Level 4 uses parity [3] concentrated on a single disk drive to protect data. Because the dedicated parity disk represents an inherent bottleneck on all write transactions to the RAID array, level 4 is seldom used without accompanying technologies such as write-back caching, or in specific circumstances where the system administrator is intentionally designing the software RAID device with this bottleneck in mind (such as an array that will have little to no write transactions once the array is populated with data). RAID level 4 is so rarely used that it is not available as an option in Anaconda. However, it could be created manually by the user if truly needed.
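
If you do need a level 4 array, you can create one manually with the mdadm utility, for example as follows; the device names here are only placeholders:

    # mdadm --create /dev/md0 --level=4 --raid-devices=3 /dev/sda1 /dev/sdb1 /dev/sdc1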

The storage capacity of a RAID level 4 array is equal to the capacity of the smallest member disk in a Hardware RAID or the smallest member partition in a Software RAID, multiplied by the number of members minus one. Performance of a RAID level 4 array will always be asymmetrical, meaning reads will outperform writes. This is because writes consume extra CPU and main memory bandwidth when generating parity, and then also consume extra bus bandwidth when writing the actual data to disks, because you are writing not only the data but also the parity. Reads need only read the data and not the parity unless the array is in a degraded state. As a result, reads generate less traffic to the drives and across the buses of the computer for the same amount of data transfer under normal operating conditions.

Level 5

This is the most common type of RAID. By distributing parity across all of an array’s member disk drives, RAID level 5 eliminates the write bottleneck inherent in level 4. The only performance bottleneck is the parity calculation process itself. With modern CPUs and Software RAID, that is usually not a bottleneck at all since modern CPUs can generate parity very fast. However, if you have a sufficiently large number of member devices in a software RAID5 array such that the combined aggregate data transfer speed across all devices is high enough, then this bottleneck can start to come into play.

As with level 4, level 5 has asymmetrical performance, with reads substantially outperforming writes. The storage capacity of RAID level 5 is calculated in the same way as for level 4.

Level 6

This is a common level of RAID when data redundancy and preservation, and not performance, are the paramount concerns, but where the space inefficiency of level 1 is not acceptable. Level 6 uses a complex parity scheme to be able to recover from the loss of any two drives in the array. This complex parity scheme creates a significantly higher CPU burden on software RAID devices and also imposes an increased burden during write transactions. As such, level 6 is considerably more asymmetrical in performance than levels 4 and 5.

The total capacity of a RAID level 6 array is calculated similarly to RAID level 5 and 4, except that you must subtract 2 devices (instead of 1) from the device count for the extra parity storage space.
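
For example, a level 6 array built from six 1 TiB members has a usable capacity of approximately (6 − 2) × 1 TiB = 4 TiB, with the space of two members used for the parity data.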

Level 10

This RAID level attempts to combine the performance advantages of level 0 with the redundancy of level 1. It also helps to alleviate some of the space wasted in level 1 arrays with more than two devices. With level 10, it is possible, for instance, to create a 3-drive array configured to store only two copies of each piece of data, which then allows the overall array size to be 1.5 times the size of the smallest device instead of only equal to the smallest device (as it would be with a 3-device, level 1 array). Compared with RAID level 6, level 10 avoids the CPU cost of calculating parity, but it is less space efficient.

The creation of RAID level 10 is not supported during installation. It is possible to create one manually using the command-line mdadm tool, as shown in the example below. For more information about the options and their respective performance trade-offs, see the md(4) man page.
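
For example, the 3-drive, 2-copy layout described above could be created with a command similar to the following sketch; the device names are placeholders, and --layout=n2 requests two "near" copies of each piece of data:

    # mdadm --create /dev/md0 --level=10 --raid-devices=3 --layout=n2 /dev/sda1 /dev/sdb1 /dev/sdc1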

Linear RAID

Linear RAID is a grouping of drives to create a larger virtual drive. In linear RAID, the chunks are allocated sequentially from one member drive, going to the next drive only when the first is completely filled. This grouping provides no performance benefit, as it is unlikely that any I/O operations will be split between member drives. Linear RAID also offers no redundancy and decreases reliability; if any one member drive fails, the entire array cannot be used. The capacity is the total of all member disks.

15.4. Linux RAID subsystems

The following subsystems compose RAID in Linux:

15.4.1. Linux Hardware RAID Controller Drivers

Hardware RAID controllers have no specific RAID subsystem in Linux. Because they use special RAID chipsets, hardware RAID controllers come with their own drivers; these drivers allow the system to detect the RAID sets as regular disks.

15.4.2. mdraid

The mdraid subsystem was designed as a software RAID solution for Linux; it is also the preferred solution for software RAID under Linux. This subsystem uses its own metadata format, generally referred to as native MD metadata.

mdraid also supports other metadata formats, known as external metadata. Red Hat Enterprise Linux 8 uses mdraid with external metadata to access ISW / IMSM (Intel firmware RAID) sets and SNIA DDF. mdraid sets are configured and controlled through the mdadm utility.
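
For example, you can check which metadata format a member device carries by examining it with mdadm; on systems with Intel firmware RAID, mdadm can also report the platform RAID capabilities. The device name /dev/sda is only an example:

    # mdadm --examine /dev/sda
    # mdadm --detail-platform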

15.5. Creating software RAID

Follow the steps in this procedure to create a Redundant Array of Independent Disks (RAID) device. RAID devices are constructed from multiple storage devices that are arranged to provide increased performance and, in some configurations, greater fault tolerance.

A RAID device is created in one step, and disks are added or removed as necessary. You can configure one RAID partition for each physical disk in your system, so the number of disks available to the installation program determines which levels of RAID device are available. For example, if your system has two hard drives, you cannot create a RAID 10 device, as it requires a minimum of three separate disks.

Note

On IBM Z, the storage subsystem uses RAID transparently. You do not have to configure software RAID manually.

Prerequisites

  • You have selected two or more disks for installation before RAID configuration options are visible. At least two disks are required to create a RAID device.
  • You have created a mount point. By configuring a mount point, you configure the RAID device.
  • You have selected the Custom radio button on the Installation Destination window.

Procedure

  1. From the left pane of the Manual Partitioning window, select the required partition.
  2. Under the Device(s) section, click Modify. The Configure Mount Point dialog box opens.
  3. Select the disks that you want to include in the RAID device and click Select.
  4. Click the Device Type drop-down menu and select RAID.
  5. Click the File System drop-down menu and select your preferred file system type.
  6. Click the RAID Level drop-down menu and select your preferred level of RAID.
  7. Click Update Settings to save your changes.
  8. Click Done to apply the settings and return to the Installation Summary window.

A message is displayed at the bottom of the window if the specified RAID level requires more disks.

15.6. Creating software RAID after installation

This procedure describes how to create a software Redundant Array of Independent Disks (RAID) on an existing system using the mdadm utility.

Prerequisites

  • The mdadm package is installed.

Procedure

  1. To create a RAID device from two block devices named /dev/sda1 and /dev/sdc1, use the following command:

    # mdadm --create /dev/md0 --level=<level_value> --raid-devices=2 /dev/sda1 /dev/sdc1

    Replace <level_value> with a RAID level option. For more information, see the mdadm(8) man page.
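
    For example, to create a RAID level 1 (mirror) array from the two partitions, you could run the following command; the level 1 choice here is only an example:

    # mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdc1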

  2. Optionally, to check the status of the RAID device, use the following command:

    # mdadm --detail /dev/md0
  3. Optionally, to view detailed information about each component device of the RAID, use the following command:

    # mdadm --examine /dev/sda1 /dev/sdc1
  4. To create a file system on a RAID drive, use the following command:

    # mkfs -t <file-system-name> /dev/md0

    where <file-system-name> is the specific file system that you chose to format the drive with. For more information, see the mkfs(8) man page.
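
    For example, to format the array with the XFS file system (the choice of XFS here is only an example):

    # mkfs -t xfs /dev/md0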

  5. To create a mount point for the RAID drive and mount it, use the following commands:

    # mkdir /mnt/raid1
    # mount /dev/md0 /mnt/raid1

After you finish the steps above, the RAID is ready to be used.

15.7. Reconfiguring RAID

This section describes how to modify an existing RAID array. To do so, choose one of the following methods:

  • Changing RAID attributes (also known as RAID reshape).
  • Converting RAID level (also known as RAID takeover).

15.7.1. Reshaping RAID

This section describes how to reshape RAID. You can choose one of the following methods of resizing RAID:

  • Enlarging (extending) RAID.
  • Shrinking RAID.

15.7.1.1. Resizing RAID (extending)

This procedure describes how to enlarge a RAID array, assuming that /dev/md0 is the RAID array you want to enlarge.

Prerequisites

  • You have enough free disk space.
  • The package parted is installed.

Procedure

  1. Extend the RAID partitions. To do so, follow the instructions in the Resizing a partition documentation.
  2. To extend RAID to the maximum of the partition capacity, use this command:

    # mdadm --grow --size=max /dev/md0

    Note that to set a specific size instead of the maximum, you must specify the --size parameter in kibibytes (for example, --size=524288 for 512 MiB).

  3. Increase the size of the file system. For more information, see the Managing file systems documentation and the example below.
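
    For example, if the array holds an XFS file system mounted at /mnt/raid1 (both the file system type and the mount point are assumptions for this sketch), you can grow the file system to fill the enlarged array:

    # xfs_growfs /mnt/raid1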

15.7.1.2. Resizing RAID (shrinking)

This procedure describes how to shrink a RAID array, assuming that /dev/md0 is the RAID array you want to shrink to 512 MiB.

Prerequisites

  • The package parted is installed.

Procedure

  1. Shrink the file system. To do so, check the Managing file systems documentation and see the example after this procedure.

    Important

    The XFS file system does not support shrinking.

  2. To decrease the RAID array to a size of 512 MiB, use this command:

    # mdadm --grow --size=524288 /dev/md0

    Note that you must specify the --size parameter in kibibytes.

  3. Shrink the partition to the size you need. To do so, follow the instructions in the Resizing a partition documentation.
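
For example, if the array holds an ext4 file system (an assumption for this sketch; as noted above, XFS cannot be shrunk), you could unmount, check, and shrink the file system to 512 MiB before shrinking the array. The mount point /mnt/raid1 is also only an example:

    # umount /mnt/raid1
    # e2fsck -f /dev/md0
    # resize2fs /dev/md0 512M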

15.7.2. RAID takeover

This section describes the supported RAID conversions and contains procedures to accomplish those conversions.

15.7.2.1. Supported RAID conversions

It is possible to convert from one RAID level to another. This section provides a table that lists supported RAID conversions.

[Table: supported conversions between RAID levels, with source levels RAID0, RAID1, RAID4, RAID5, RAID6, and RAID10 as rows and target levels RAID0, RAID1, RAID4, RAID5, RAID6, and RAID10 as columns]

For example, you can convert a RAID level 0 array to RAID level 4, RAID level 5, or RAID level 10.

Additional resources

  • For more information about RAID level conversion, see the mdadm(8) man page.

15.7.2.2. Converting RAID level

This procedure describes how to convert a RAID array to a different RAID level. It assumes that you want to convert the RAID array /dev/md0 from RAID level 0 to RAID level 5 and add one more disk, /dev/sdd, to the array.

Prerequisites

  • The mdadm package is installed.

Procedure

  1. To convert the RAID /dev/md0 to RAID level 5, use the following command:

    # mdadm --grow --level=5 -n 3 /dev/md0 --force
  2. To add a new disk to the array, use the following command:

    # mdadm --manage /dev/md0 --add /dev/sdd
  3. To check new details of the converted array, use the following command:

    # mdadm --detail /dev/md0
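
While the conversion and the rebuild of the new disk are in progress, you can watch the status by reading the /proc/mdstat file:

    # cat /proc/mdstat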

Additional resources

  • For more information about RAID level conversion, see the mdadm(8) man page.

15.8. Converting a root disk to RAID1 after installation

This section describes how to convert a non-RAID root disk to a RAID1 mirror after installing Red Hat Enterprise Linux 8.

On the PowerPC (PPC) architecture, take the following additional steps:

Prerequisites

Procedure

  1. Copy the contents of the PowerPC Reference Platform (PReP) boot partition from /dev/sda1 to /dev/sdb1:

    # dd if=/dev/sda1 of=/dev/sdb1
  2. Update the prep and boot flags on the first partition of both disks:

    # parted /dev/sda set 1 prep on
    # parted /dev/sda set 1 boot on
    
    # parted /dev/sdb set 1 prep on
    # parted /dev/sdb set 1 boot on

Note

Running the grub2-install /dev/sda command does not work on a PowerPC machine and returns an error, but the system boots as expected.

15.9. Creating advanced RAID devices

In some cases, you may wish to install the operating system on an array that cannot be created after the installation completes. Usually, this means setting up the /boot or root file system arrays on a complex RAID device; in such cases, you may need to use array options that are not supported by the Anaconda installer. To work around this, perform the following procedure:

Procedure

  1. Insert the install disk.
  2. During the initial boot, select Rescue Mode instead of Install or Upgrade. When the system fully boots into Rescue mode, you are presented with a command line terminal.
  3. From this terminal, use parted to create RAID partitions on the target hard drives. Then, use mdadm to manually create RAID arrays from those partitions using any and all settings and options available. For more information on how to do this, see man parted and man mdadm.
  4. Once the arrays are created, you can optionally create file systems on the arrays as well.
  5. Reboot the computer and select Install or Upgrade to install as normal. As the Anaconda installer searches the disks in the system, it finds the pre-existing RAID devices.
  6. When asked about how to use the disks in the system, select Custom Layout and click Next. In the device listing, the pre-existing MD RAID devices will be listed.
  7. Select a RAID device, click Edit and configure its mount point and (optionally) the type of file system it should use (if you did not create one earlier) then click Done. Anaconda will perform the install to this pre-existing RAID device, preserving the custom options you selected when you created it in Rescue Mode.
Note

The limited Rescue Mode of the installer does not include man pages. Both the mdadm and md man pages contain useful information for creating custom RAID arrays, and may be needed throughout the workaround. As such, it can be helpful either to have access to a machine with these man pages present, or to print them out prior to booting into Rescue Mode and creating your custom arrays.

15.10. Monitoring RAID

This section describes how to set up the RAID monitoring option with the mdadm tool.

Prerequisites

  • The mdadm package is installed.
  • The mail service is set up.

Procedure

  1. To create a configuration file for monitoring the array, scan the array details and append the result to the /etc/mdadm.conf file. To do so, use the following command:

    # mdadm --detail --scan >> /etc/mdadm.conf

    Note that the ARRAY and MAILADDR variables are mandatory.

  2. Open the configuration file /etc/mdadm.conf with a text editor of your choice.
  3. Add the MAILADDR variable with the mail address for notifications. For example, add a new line:

    MAILADDR <example@example.com>

    where example@example.com is the email address at which you want to receive alerts from the array monitoring.

  4. Save changes in the /etc/mdadm.conf file and close it.

After you complete the steps above, the monitoring system sends the alerts to the specified email address.
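
The alerts are sent by the monitoring daemon; on Red Hat Enterprise Linux this is typically handled by the mdmonitor service, which reads /etc/mdadm.conf. As a quick check of the mail delivery, you can run the monitor in the foreground and ask it to send a test alert for every array; this is a sketch that assumes the configuration above is in place, and you can stop it with Ctrl+C:

    # mdadm --monitor --scan --test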

Additional resources

  • For more information, see the mdadm.conf(5) man page.

15.11. Maintaining RAID

This section provides various procedures for RAID maintenance.

15.11.1. Replacing a faulty disk in a RAID

This procedure describes how to replace a faulty disk in a redundant array of independent disks (RAID). It assumes that you have a RAID level 10 array, /dev/md0. In this scenario, the /dev/sdg disk is faulty and you need to replace it with a new disk, /dev/sdh.

Prerequisites

  • A new disk for replacement.
  • The mdadm package is installed.

Procedure

  1. Determine which disk is failing. To do so, enter the following command:

    # journalctl -k -f

    You will find a message showing you which disk has failed:

    md/raid:md0: Disk failure on sdg, disabling device.
    md/raid:md0: Operation continuing on 5 devices.
  2. Press Ctrl+C on your keyboard to exit the journalctl program.
  3. Add a new disk to the array. To do so, enter the following command:

    # mdadm --manage /dev/md0 --add /dev/sdh
  4. Mark the failed disk as faulty. To do so, enter the following command:

    # mdadm --manage /dev/md0 --fail /dev/sdg
  5. Check whether the faulty disk was marked correctly by using the following command:

    # mdadm --detail /dev/md0

    At the end of the last command output, you will see information about the RAID disks similar to the following, where the disk /dev/sdg has a faulty status:

        Number   Major   Minor   RaidDevice State
           0       8       16        0      active sync   /dev/sdb
           1       8       32        1      active sync   /dev/sdc
           2       8       48        2      active sync   /dev/sdd
           3       8       64        3      active sync   /dev/sde
           4       8       80        4      active sync   /dev/sdf
           6       8      112        5      active sync   /dev/sdh
    
           5       8       96        -      faulty   /dev/sdg
  6. Finally, remove the faulty disk from the array. To do so, enter the following command:

    # mdadm --manage /dev/md0 --remove /dev/sdg
  7. Check the RAID details by using the following command:

    # mdadm --detail /dev/md0

    At the end of the last command output, you will see information about the RAID disks similar to the following:

        Number   Major   Minor   RaidDevice State
           0       8       16        0      active sync   /dev/sdb
           1       8       32        1      active sync   /dev/sdc
           2       8       48        2      active sync   /dev/sdd
           3       8       64        3      active sync   /dev/sde
           4       8       80        4      active sync   /dev/sdf
           6       8      112        5      active sync   /dev/sdh

After completing the steps above, you have the RAID array /dev/md0 with the new disk /dev/sdh.

15.11.2. Replacing a broken disk in an array

This procedure describes how to replace a broken disk in a redundant array of independent disks (RAID). It assumes that you have a RAID level 6 array, /dev/md0. In this scenario, the /dev/sdb disk has a hardware issue and can no longer be used. You need to replace it with a new disk, /dev/sdi.

Prerequisites

  • A new disk for replacement.
  • The mdadm package is installed.

Procedure

  1. Check the log messages by using the following command:

    # journalctl -k -f

    You will find a message showing you which disk has failed:

    md/raid:md0: Disk failure on sdb, disabling device.
    md/raid:md0: Operation continuing on 5 devices.
  2. Press Ctrl+C on your keyboard to exit the journalctl program.
  3. Add the new disk to the array as a spare. To do so, enter the following command:

    # mdadm --manage /dev/md0 --add /dev/sdi
  4. Mark the broken disk as faulty. To do so, enter the following command:

    # mdadm --manage /dev/md0 --fail /dev/sdb
  5. Remove the faulty disk from the array. To do so, enter the following command:

    # mdadm --manage /dev/md0 --remove /dev/sdb
  6. Check the status of the array by using the following command:

    # mdadm --detail /dev/md0

    At the end of the last command output, you will see information about the RAID disks similar to the following:

        Number   Major   Minor   RaidDevice State
           7       8      128        0      active sync   /dev/sdi
           1       8       32        1      active sync   /dev/sdc
           2       8       48        2      active sync   /dev/sdd
           3       8       64        3      active sync   /dev/sde
           4       8       80        4      active sync   /dev/sdf
           6       8      112        5      active sync   /dev/sdh

After completing the steps above, you have the RAID array /dev/md0 with the new disk /dev/sdi.

15.11.3. Resynchronizing RAID disks

This procedure describes how to resynchronize the disks in a RAID array. It assumes that you have a RAID array named /dev/md0.

Prerequisites

  • The mdadm package is installed.

Procedure

  1. To check the array for inconsistencies between its disks, enter the following command:

    # echo check > /sys/block/md0/md/sync_action

    This action checks the array. While the check runs, the /sys/block/md0/md/sync_action file shows check, and the number of mismatches found is reported in the /sys/block/md0/md/mismatch_cnt file.

  2. View the /sys/block/md0/md/sync_action file, for example with the cat utility, and check the system log for any messages about disk synchronization failures.
  3. To resynchronize the disks in the array, enter the following command:

    # echo repair > /sys/block/md0/md/sync_action

    This action resynchronizes the disks in the array; the /sys/block/md0/md/sync_action file shows repair while the operation runs.

  4. To view the synchronization progress, enter the following command:

    # cat /proc/mdstat


[1] A hot-swap chassis allows you to remove a hard drive without having to power-down your system.
[2] RAID level 1 comes at a high cost because you write the same information to all of the disks in the array. This provides data reliability, but in a much less space-efficient manner than parity-based RAID levels such as level 5. However, this space inefficiency comes with a performance benefit: parity-based RAID levels consume considerably more CPU power in order to generate the parity, while RAID level 1 simply writes the same data more than once to the multiple RAID members with very little CPU overhead. As such, RAID level 1 can outperform the parity-based RAID levels on machines where software RAID is employed and CPU resources on the machine are consistently taxed with operations other than RAID activities.
[3] Parity information is calculated based on the contents of the rest of the member disks in the array. This information can then be used to reconstruct data when one disk in the array fails. The reconstructed data can then be used to satisfy I/O requests to the failed disk before it is replaced and to repopulate the failed disk after it has been replaced.