5.4.16.8. Setting a RAID fault policy

LVM RAID handles device failures in an automatic fashion based on the preferences defined by the raid_fault_policy field in the lvm.conf file.
  • If the raid_fault_policy field is set to allocate, the system will attempt to replace the failed device with a spare device from the volume group. If there is no available spare device, this will be reported to the system log.
  • If the raid_fault_policy field is set to warn, the system will produce a warning and the log will indicate that a device has failed. This allows the user to determine the course of action to take.
As long as there are enough devices remaining to support usability, the RAID logical volume will continue to operate.
5.4.16.8.1. The allocate RAID Fault Policy
In the following example, the raid_fault_policy field has been set to allocate in the lvm.conf file. The RAID logical volume is laid out as follows.
# lvs -a -o name,copy_percent,devices my_vg
  LV               Copy%  Devices                                     
  my_lv            100.00 my_lv_rimage_0(0),my_lv_rimage_1(0),my_lv_rimage_2(0)
  [my_lv_rimage_0]        /dev/sde1(1)                                
  [my_lv_rimage_1]        /dev/sdf1(1)                                
  [my_lv_rimage_2]        /dev/sdg1(1)                                
  [my_lv_rmeta_0]         /dev/sde1(0)                                
  [my_lv_rmeta_1]         /dev/sdf1(0)                                
  [my_lv_rmeta_2]         /dev/sdg1(0)
If the /dev/sde device fails, the system log will display error messages.
# grep lvm /var/log/messages 
Jan 17 15:57:18 bp-01 lvm[8599]: Device #0 of raid1 array, my_vg-my_lv, has failed.
Jan 17 15:57:18 bp-01 lvm[8599]: /dev/sde1: read failed after 0 of 2048 at
250994294784: Input/output error
Jan 17 15:57:18 bp-01 lvm[8599]: /dev/sde1: read failed after 0 of 2048 at
250994376704: Input/output error
Jan 17 15:57:18 bp-01 lvm[8599]: /dev/sde1: read failed after 0 of 2048 at 0:
Input/output error
Jan 17 15:57:18 bp-01 lvm[8599]: /dev/sde1: read failed after 0 of 2048 at
4096: Input/output error
Jan 17 15:57:19 bp-01 lvm[8599]: Couldn't find device with uuid
3lugiV-3eSP-AFAR-sdrP-H20O-wM2M-qdMANy.
Jan 17 15:57:27 bp-01 lvm[8599]: raid1 array, my_vg-my_lv, is not in-sync.
Jan 17 15:57:36 bp-01 lvm[8599]: raid1 array, my_vg-my_lv, is now in-sync.
Since the raid_fault_policy field has been set to allocate, the failed device is replaced with a new device from the volume group.
# lvs -a -o name,copy_percent,devices vg
  Couldn't find device with uuid 3lugiV-3eSP-AFAR-sdrP-H20O-wM2M-qdMANy.
  LV            Copy%  Devices                                     
  lv            100.00 lv_rimage_0(0),lv_rimage_1(0),lv_rimage_2(0)
  [lv_rimage_0]        /dev/sdh1(1)                                
  [lv_rimage_1]        /dev/sdf1(1)                                
  [lv_rimage_2]        /dev/sdg1(1)                                
  [lv_rmeta_0]         /dev/sdh1(0)                                
  [lv_rmeta_1]         /dev/sdf1(0)                                
  [lv_rmeta_2]         /dev/sdg1(0)
Note that even though the failed device has been replaced, the display still indicates that LVM could not find the failed device. This is because, although the failed device has been removed from the RAID logical volume, the failed device has not yet been removed from the volume group. To remove the failed device from the volume group, you can execute vgreduce --removemissing VG.
If the raid_fault_policy has been set to allocate but there are no spare devices, the allocation will fail, leaving the logical volume as it is. If the allocation fails, you have the option of fixing the drive, then deactivating and activating the logical volume, as described in Section 5.4.16.8.2, “The warn RAID Fault Policy”. Alternately, you can replace the failed device, as described in Section 5.4.16.9, “Replacing a RAID device”.