Disk not properly removed from Node on Azure causing data corruption on persistent volume with OpenShift Container Platform 4 on Azure
Issue
- An application reported that data was lost on its volume after a redeployment. When checking, the below event was reported:
MountVolume.MountDevice failed for volume "pvc-aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa": azureDisk - mountDevice:FormatAndMount failed with format of disk "/dev/disk/azure/scsi1/lun0" failed:
  type:("ext4") target:("/var/lib/kubelet/plugins/kubernetes.io/azure-disk/mounts/mAAAAAAAAAA") options:("defaults") errcode:(exit status 1)
  output:(mke2fs 1.45.6 (20-Mar-2020)
  Discarding device blocks: 4096/6553600 failed - Remote I/O error
  Creating filesystem with 6553600 4k blocks and 1638400 inodes
  Filesystem UUID: bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb
  Superblock backups stored on blocks:
      32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 4096000
  Allocating group tables: 0/200 done
  Writing inode tables: 0/200 done
  Creating journal (32768 blocks): done
  Writing superblocks and filesystem accounting information: 0/200
  mkfs.ext4: Input/output error while writing out and closing file system)
- During a redeployment of the application, the persistent volume of the application was corrupted because a previous volume on the OpenShift Node was not correctly detached when running on Azure.
- When detaching a disk from the OpenShift Node on Azure, the disk is not removed by storvsc; instead, the message "Invalid packet len" is found in the Node's journal.
- Randomly, when detaching a disk from the (Azure) hypervisor, storvsc fails to process the vmbus event sent by the hypervisor and prints only the message "Invalid packet len" instead of proceeding with the SCSI bus re-scan and the removal of the disk within the Node.
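To check whether a Node has hit this condition, its kernel journal can be searched for the storvsc message. A minimal sketch, assuming the journal has been captured to a file (for example with `journalctl -k > node-journal.log` inside an `oc debug node/<node>` session); the sample log lines below are hypothetical, not taken from a real Node:

```shell
# Simulate a captured kernel journal with hypothetical sample lines,
# one of which shows the storvsc error printed when the vmbus
# device-removal event is dropped.
printf '%s\n' \
  'kernel: hv_storvsc: Invalid packet len' \
  'kernel: sd 1:0:0:0: [sdb] Attached SCSI disk' \
  > /tmp/node-journal.log

# Count occurrences of the error; a non-zero count means the disk
# detach was not processed and the Node may serve a stale device.
grep -c "Invalid packet len" /tmp/node-journal.log   # prints 1
```

On a live Node, `journalctl -k | grep "Invalid packet len"` gives the same check without an intermediate file.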
Environment
- Red Hat OpenShift Container Platform (RHOCP) before 4.13
- Microsoft Azure