How to prevent VMware guest filesystems from becoming read-only during a storage interruption


Environment

  • Red Hat Enterprise Linux 6
  • Red Hat Enterprise Linux 7
  • Red Hat Enterprise Linux 8

Issue

Filesystems on Red Hat Enterprise Linux guests in a VMware environment become read-only, requiring a reboot and a filesystem check, when the guest datastore is not writable due to a storage interruption.

Resolution

Configuring a VMware guest to use multipath queueing allows the guest to resume operation once the storage returns to service, because outstanding I/O is held in the multipath layer instead of being failed back to the filesystem. The caveat with this approach is that if the storage never returns, the I/O requests remain trapped in the multipath layer and the filesystem cannot be unmounted without rebooting.

  • Configure the VMware guest virtual hardware to return a unique SCSI identifier for each virtual disk by setting "disk.EnableUUID = TRUE" in the guest configuration file.
    See "Why is scsi_id not returning any output in a VM on VMware ESX 4.0 and above".
  • Install multipath:

    # yum install -y device-mapper-multipath
    
  • Configure multipath to queue forever:

    # cat << EOF > /etc/multipath.conf
    defaults {
       user_friendly_names yes
       features "1 queue_if_no_path"
    }
    EOF
    
  • Update the LVM filter so that LVM accepts only multipath devices, by setting filter = [ "a|/dev/mapper/mpath|", "r/.*/" ] as follows:

    # egrep '^( )*filter' /etc/lvm/lvm.conf
    # sed -r -i 's/^(( )*filter = )(.+$)/\1\[ "a|\/dev\/mapper\/mpath|", "r\/.*\/" \]/' /etc/lvm/lvm.conf
    # egrep '^( )*filter' /etc/lvm/lvm.conf
    
  • Enable and start multipathd to generate the bindings file in /etc/multipath (ignore the resulting error because sda is already in use):

    RHEL 6:

    # chkconfig multipathd on
    # service multipathd start
    

    RHEL 7 and RHEL 8:

    # systemctl enable multipathd
    # systemctl start multipathd
    
  • Rebuild the initramfs file with multipath support and then reboot:

    # ls /etc/multipath
    # cat /etc/multipath/bindings
    # cp -v /boot/initramfs-$(uname -r).img{,.bak}
    # dracut --force --hostonly --add multipath
    # ls -l /boot/initramfs-$(uname -r).img*
    # reboot
    
  • Verify that multipath is active:

    RHEL 6:

    # service multipathd status
    

    RHEL 7 and RHEL 8:

    # systemctl status multipathd
    
    # multipath -ll
    # pvscan
    # df
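
The verification step above can be partly automated. The following is a minimal sketch, not part of the original procedure: the function name maps_not_queueing is hypothetical, and the parsing assumes the usual "multipath -ll" layout in which each map has a header line containing its dm-N device followed by a line with a features='...' field.

```shell
# Sketch only: report multipath maps that are NOT queueing I/O.
# Assumption: "multipath -ll" prints a header line per map that contains
# "dm-N", followed by a line containing features='...'.

# Reads "multipath -ll" output on stdin, prints the name of every map whose
# features field lacks queue_if_no_path, and returns 1 if any were found.
maps_not_queueing() {
    awk '
        /dm-[0-9]+/ { map = $1 }    # header line: remember the map name
        /features=/ && !/queue_if_no_path/ { print map; bad = 1 }
        END { exit bad }
    '
}

# Run the check only where the multipath tool is installed.
if command -v multipath >/dev/null 2>&1; then
    multipath -ll | maps_not_queueing \
        || echo "WARNING: the maps listed above do not have queue_if_no_path set" >&2
fi
```

If the function prints nothing and returns 0, every map that was listed has queueing enabled. An empty "multipath -ll" listing also returns 0, so the map list itself should still be reviewed.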
    

Additional information about configuring the root filesystem for multipath is available for RHEL 6 and RHEL 7.

NOTE: This procedure is not supported on RHEL 5.

Root Cause

  • A prolonged storage interruption at the VMware hypervisor level will prevent guest I/Os from completing within the designated timeout period.
  • The guest subsequently aborts the I/Os and returns an error to the filesystem, which then transitions to a read-only state to prevent further damage to the filesystem metadata.
  • This configuration may be needed even with VMware MPIO, because VMware MPIO returns the error to the VM before it retries the I/O.
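
To see which timeout is currently in effect inside the guest, the per-disk SCSI command timeout can be read from sysfs. The sketch below is illustrative only: the function name list_scsi_timeouts is hypothetical, and it assumes that the "designated timeout period" mentioned above corresponds to the value in /sys/block/sdX/device/timeout, which the article itself does not state.

```shell
# Sketch only: print "<disk> <timeout-in-seconds>" for each SCSI disk.
# Assumption: the relevant timeout is the SCSI command timeout that the
# kernel exposes in /sys/block/sdX/device/timeout.

# Optional argument: an alternative sysfs block directory (default /sys/block).
list_scsi_timeouts() {
    lst_root="${1:-/sys/block}"
    for lst_file in "$lst_root"/sd*/device/timeout; do
        # Skip the unexpanded pattern when no sd* disk exists.
        [ -r "$lst_file" ] || continue
        lst_disk="${lst_file#"$lst_root"/}"   # strip the directory prefix
        lst_disk="${lst_disk%%/*}"            # keep only the disk name
        printf '%s %s\n' "$lst_disk" "$(cat "$lst_file")"
    done
}

list_scsi_timeouts
```

An interruption that lasts longer than the reported number of seconds is the case in which the guest would be expected to abort the I/O as described above.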
