FibreChannel storage and sanlock issues occur after upgrading VDSM.

Environment

Red Hat Enterprise Virtualization (RHEV) 3.4
Red Hat Enterprise Linux (RHEL) 6.5 and Red Hat Enterprise Virtualization Hypervisor (RHEV-H) 6.5
- vdsm-4.14.13-2
- vdsm-4.14.17-1

Issue

FibreChannel (FC) storage connections become unstable after a VDSM upgrade.
Hosts report latency errors and FC interfaces flap.
This can be triggered by putting a host into Maintenance and activating it, or by assigning a LUN to a VM from the SPM. All other hosts then report latency errors.
In boot-from-SAN environments, the root filesystem is remounted read-only.

Resolution

The bug has been fixed in erratum RHBA-2014-1946.
The fix is included in rhev-hypervisor6-6.5-20150115.0.el6ev.noarch.rpm.
As a workaround, it is possible to revert to a prior version of VDSM, e.g. vdsm-4.14.11.
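
To decide whether a host needs the fix or the workaround, the installed VDSM package can be compared against the versions listed under Environment. The following is a minimal sketch, not an official check: the function name is_affected_vdsm is hypothetical, and the list contains only the two versions named in this article, so other builds may also be affected.

```shell
# Versions named in the Environment section of this article.
# Assumption: this list is not exhaustive; other builds may be affected too.
AFFECTED_VDSM="vdsm-4.14.13-2 vdsm-4.14.17-1"

# is_affected_vdsm PACKAGE_NVR
# Returns 0 if PACKAGE_NVR equals a listed version or is a listed version
# followed by a dist/arch suffix such as ".el6ev.x86_64".
is_affected_vdsm() {
    local nvr="$1" v
    for v in $AFFECTED_VDSM; do
        case "$nvr" in
            "$v"|"$v".*) return 0 ;;
        esac
    done
    return 1
}

# Usage on a host (not executed here):
#   is_affected_vdsm "$(rpm -q vdsm)" && echo "vdsm version is listed as affected"
```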

Root Cause

Some versions of VDSM issued a Loop Initialization Primitive (LIP) to all FibreChannel hosts by writing 1 to /sys/class/fc_host/host*/issue_lip when certain storage-related events occurred.
Such events include:
- place host in maintenance mode
- activate host
- start/stop/restart vdsm
- activate/deactivate Export or ISO domain
- create/edit storage domain

This problem was tracked in RHBZ #1152587 - vdsm-4.14.13-2 sends FC LIP events on storage actions.
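
To see which FC hosts on a system expose the issue_lip attribute, and therefore would have received a LIP, the sysfs directory can be listed read-only. This is an illustrative sketch: the function name list_fc_hosts is hypothetical, and it assumes the port_state attribute is present next to issue_lip. It deliberately never writes to issue_lip, because that write is what triggers the LIP.

```shell
# list_fc_hosts [SYSFS_FC_DIR]
# Prints "<host> <port_state>" for every FC host that has an issue_lip
# attribute. Read-only: nothing is ever written to issue_lip.
list_fc_hosts() {
    local dir="${1:-/sys/class/fc_host}" h state
    for h in "$dir"/host*; do
        # Skip the unexpanded glob (no FC hosts) and hosts without issue_lip.
        [ -e "$h/issue_lip" ] || continue
        state="unknown"
        [ -r "$h/port_state" ] && state="$(cat "$h/port_state")"
        printf '%s %s\n' "${h##*/}" "$state"
    done
    return 0
}

# Usage on a host (not executed here):
#   list_fc_hosts
```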

Diagnostic Steps

The engine logs show high latency on storage:

Storage domain Example_Storage experienced a high latency of 16.0312 seconds from host node1. This may cause performance and functional issues.
Storage domain Example_Storage experienced a high latency of 9.1523 seconds from host node3. This may cause performance and functional issues.
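
When many hosts are involved, it can help to reduce these messages to host, latency and storage domain. The sketch below assumes the message wording is exactly as shown above; the function name latency_events is hypothetical, and the engine log path in the usage comment is the usual location but should be verified on the manager.

```shell
# latency_events [FILE...]
# Reads engine log lines from FILE(s) or stdin and prints
# "<host> <latency_seconds> <storage_domain>" for each high-latency message.
latency_events() {
    sed -n 's/.*Storage domain \(.*\) experienced a high latency of \([0-9.]*\) seconds from host \([^ ]*\)\. This may.*/\3 \2 \1/p' "$@"
}

# Usage on the manager (not executed here; path is an assumption):
#   latency_events /var/log/ovirt-engine/engine.log | sort | uniq -c
```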

/var/log/messages on the hosts contains:

Sep 24 08:08:56 node1 kernel: qla2xxx [0000:1f:00.0]-505f:3: Link is operational (4 Gbps).
Sep 24 08:08:56 node1 kernel: qla2xxx [0000:1f:00.1]-505f:4: Link is operational (4 Gbps).
Sep 24 08:09:32 node1 kernel: qla2xxx [0000:1f:00.0]-801c:3: Abort command issued nexus=3:1:5 --  1 2002.

These events coincide with the LIPs sent by vdsm:

# grep -i lip /var/log/vdsm/supervdsm.log
supervdsm.log:MainProcess|Thread-14::DEBUG::2014-09-24 08:08:50,811::hba::56::Storage.HBA::(rescan) Issuing lip /sys/class/fc_host/host5/issue_lip
supervdsm.log:MainProcess|Thread-14::DEBUG::2014-09-24 08:08:50,827::hba::56::Storage.HBA::(rescan) Issuing lip /sys/class/fc_host/host6/issue_lip
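
To compare these entries with /var/log/messages, the timestamp and FC host can be extracted from each line. This is a rough sketch that assumes the log line layout shown above; the function name lip_events is hypothetical.

```shell
# lip_events [FILE...]
# Reads supervdsm.log lines from FILE(s) or stdin and prints
# "<date> <time> <fc_host>" for each "Issuing lip" entry.
lip_events() {
    sed -n 's|.*::\([0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\} [0-9:]*\),[0-9]*::.*Issuing lip .*/fc_host/\(host[0-9]*\)/issue_lip.*|\1 \2|p' "$@"
}

# Usage on a host (not executed here):
#   lip_events /var/log/vdsm/supervdsm.log
```

The printed times can then be matched by hand against link and abort messages from the HBA driver in /var/log/messages.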

In boot-from-SAN environments, the root filesystem gets mounted read-only during boot. A LIP event can be seen in supervdsm.log at the same time.

Nov 28 13:28:29 host1 kernel: lpfc 0000:04:00.2: 0:1305 Link Down Event x2 received Data: x2 x20 x800110 x0 x0
Nov 28 13:28:29 host1 kernel: lpfc 0000:04:00.2: 0:1303 Link Up Event x3 received Data: x3 x0 x40 x0 x0 x0 0
Nov 28 13:28:29 host1 kernel: lpfc 0000:04:00.3: 1:1305 Link Down Event x3 received Data: x3 x20 x800110 x0 x0
Nov 28 13:28:29 host1 fcoemon: FC_HOST_EVENT 7 at 1417181309 secs on host0:code 3=link_down datalen 4 data=0
......
Nov 28 13:28:34 host1 kernel: Buffer I/O error on device dm-8, logical block 33014
Nov 28 13:28:34 host1 kernel: lost page write due to I/O error on dm-8
Nov 28 13:28:34 host1 kernel: JBD2: Detected IO errors while flushing file data on dm-8-8
Nov 28 13:28:34 host1 kernel: Aborting journal on device dm-8-8.
Nov 28 13:28:34 host1 kernel: EXT4-fs error (device dm-8) in ext4_dirty_inode: IO failure
Nov 28 13:28:34 host1 kernel: Buffer I/O error on device dm-8, logical block 32972
Nov 28 13:28:34 host1 kernel: lost page write due to I/O error on dm-8
Nov 28 13:28:34 host1 kernel: Buffer I/O error on device dm-8, logical block 32927
Nov 28 13:28:34 host1 kernel: lost page write due to I/O error on dm-8
Nov 28 13:28:34 host1 kernel: end_request: I/O error, dev dm-0, sector 136233856
Nov 28 13:28:34 host1 kernel: end_request: I/O error, dev dm-0, sector 134154376
Nov 28 13:28:34 host1 kernel: Buffer I/O error on device dm-8, logical block 262144
Nov 28 13:28:34 host1 kernel: lost page write due to I/O error on dm-8
Nov 28 13:28:34 host1 kernel: JBD2: I/O error detected when updating journal superblock for dm-8-8.
Nov 28 13:28:34 host1 kernel: end_request: I/O error, dev dm-0, sector 134136704
Nov 28 13:28:34 host1 kernel: end_request: I/O error, dev dm-0, sector 134203600
Nov 28 13:28:34 host1 kernel: end_request: I/O error, dev dm-0, sector 134137744
Nov 28 13:28:34 host1 kernel: EXT4-fs error (device dm-8): ext4_journal_start_sb: Detected aborted journal
Nov 28 13:28:34 host1 kernel: EXT4-fs (dm-8): Remounting filesystem read-only
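
Whether the root filesystem is currently read-only can be confirmed from the mount table. The following is a minimal sketch; the function name root_is_readonly is hypothetical, and it assumes the last entry for / in /proc/mounts is the one in effect.

```shell
# root_is_readonly [MOUNTS_FILE]
# Returns 0 if the root filesystem is mounted read-only, 1 otherwise.
root_is_readonly() {
    local mounts="${1:-/proc/mounts}" opts
    # "/" may be listed more than once (e.g. rootfs first); keep the last entry.
    opts="$(awk '$2 == "/" { o = $4 } END { print o }' "$mounts")"
    case ",$opts," in
        *,ro,*) return 0 ;;
        *)      return 1 ;;
    esac
}

# Usage on a host (not executed here):
#   root_is_readonly && echo "root filesystem is read-only"
#   grep 'Remounting filesystem read-only' /var/log/messages
```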
