Disk not properly removed from Node on Azure causing data corruption on persistent volume with OpenShift Container Platform 4 on Azure
Issue
- An application reported that data was lost on its volume after a redeployment. When checking, the below event was reported:
MountVolume.MountDevice failed for volume "pvc-aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa": azureDisk - mountDevice:FormatAndMount failed with format of disk "/dev/disk/azure/scsi1/lun0" failed:
  type:("ext4") target:("/var/lib/kubelet/plugins/kubernetes.io/azure-disk/mounts/mAAAAAAAAAA") options:("defaults") errcode:(exit status 1)
  output:(mke2fs 1.45.6 (20-Mar-2020)
  Discarding device blocks: 4096/6553600 failed - Remote I/O error
  Creating filesystem with 6553600 4k blocks and 1638400 inodes
  Filesystem UUID: bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb
  Superblock backups stored on blocks:
      32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 4096000
  Allocating group tables: 0/200 done
  Writing inode tables: 0/200 done
  Creating journal (32768 blocks): done
  Writing superblocks and filesystem accounting information: 0/200
  mkfs.ext4: Input/output error while writing out and closing file system)
- During a redeployment of the application, the persistent volume of the application was corrupted because a previous volume on the OpenShift Node was not correctly detached when running on Azure.
- When detaching a disk from the OpenShift Node on Azure, the disk is not removed by storvsc; instead, the message "Invalid packet len" is found in the Node's journal.
- Randomly, when detaching a disk from the (Azure) hypervisor, storvsc fails to process the vmbus event sent by the hypervisor and prints only the message "Invalid packet len" instead of proceeding with the SCSI bus re-scan and the removal of the disk within the Node.
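To check whether a Node has hit this condition, its kernel journal can be searched for the storvsc message. A minimal sketch, assuming the journal has been captured to a file (for example with `journalctl -k > node-journal.log` inside an `oc debug node/<node>` session); the sample log lines below are hypothetical, not taken from a real Node:

```shell
# Simulate a captured kernel journal with hypothetical sample lines,
# one of which shows the storvsc error printed when the vmbus
# device-removal event is dropped.
printf '%s\n' \
  'kernel: hv_storvsc: Invalid packet len' \
  'kernel: sd 1:0:0:0: [sdb] Attached SCSI disk' \
  > /tmp/node-journal.log

# Count occurrences of the error; a non-zero count means the disk
# detach was not processed and the Node may serve a stale device.
grep -c "Invalid packet len" /tmp/node-journal.log   # prints 1
```

On a live Node, `journalctl -k | grep "Invalid packet len"` gives the same check without an intermediate file.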
Environment
- Red Hat OpenShift Container Platform (RHOCP) before 4.13
- Microsoft Azure