Why are volume groups on FC SAN-attached devices not detected at boot?

Solution Verified - Updated -

Environment

  • Red Hat Enterprise Linux 5
  • Boot from SAN

Issue

  • IBM blade servers do not detect the Fibre Channel devices at boot, so the volume groups are not activated and the file systems are not mounted. After some reboots the devices are detected and the VGs are activated. When the "_netdev" option is added to the affected entries in /etc/fstab, the devices are detected at boot.

  • The solution described in https://access.redhat.com/knowledge/solutions/41885 did not help.

    The OS log snippets below show that one of the HBAs (0000:11:00.0) took about 5 seconds longer than the other to receive LOOP UP.

Jul  8 04:02:10 localhost syslogd 1.4.1: restart.
Jul  8 04:02:13 localhost rhsmd: This system is registered to RHN Classic
Jul  8 04:02:14 localhost kernel: qla2xxx 0000:11:00.0: LOOP UP detected (8 Gbps).
Jul  8 04:02:14 localhost kernel: qla2xxx 0000:11:00.0: ERROR -- Unable to get host loop ID.
Jul  8 04:02:14 localhost kernel: qla2xxx 0000:11:00.0: Performing ISP error recovery - ha= ffff81107b4904f8.
Jul  8 04:02:25 localhost kernel: qla2xxx 0000:11:00.0: LOOP UP detected (8 Gbps).
[... ]

Jul 10 13:21:21 localhost kernel: scsi(1): Asynchronous P2P MODE received.
Jul 10 13:21:21 localhost kernel: scsi(1): Asynchronous LOOP UP (8 Gbps).
Jul 10 13:21:21 localhost kernel: qla2xxx 0000:11:00.0: LOOP UP detected (8 Gbps).

Conversely, the other HBA (0000:11:00.1) reported LOOP UP right away:

Jul 10 13:21:16 localhost kernel: qla2xxx 0000:11:00.1: Verifying loaded RISC code...
Jul 10 13:21:16 localhost kernel: scsi(2): **** Load RISC code ****
Jul 10 13:21:16 localhost kernel: scsi(2): Verifying Checksum of loaded RISC code.
Jul 10 13:21:16 localhost kernel: scsi(2): Checksum OK, start firmware.
Jul 10 13:21:16 localhost kernel: qla2xxx 0000:11:00.1: Allocated (64 KB) for EFT...
Jul 10 13:21:16 localhost kernel: qla2xxx 0000:11:00.1: Allocated (1414 KB) for firmware dump...
Jul 10 13:21:16 localhost kernel: scsi(2): Issue init firmware.
[... ]

Jul 10 13:21:16 localhost kernel: scsi(2): Asynchronous P2P MODE received.
Jul 10 13:21:16 localhost kernel: scsi(2): Asynchronous LOOP UP (8 Gbps).
Jul 10 13:21:16 localhost kernel: qla2xxx 0000:11:00.1: LOOP UP detected (8 Gbps).
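To quantify the gap between the two HBAs in a log like the ones above, a small script can pick out the first "LOOP UP detected" message for each qla2xxx PCI address and compare the timestamps. This is a minimal sketch, not a Red Hat tool: the function names are hypothetical, it assumes the syslog line layout shown above, and it should only be given lines from a single boot, because syslog timestamps carry no year.

```python
import re
from datetime import datetime

# Matches qla2xxx lines in the layout shown above, e.g.
# "Jul 10 13:21:21 localhost kernel: qla2xxx 0000:11:00.0: LOOP UP detected (8 Gbps)."
LOOP_UP_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) \S+ kernel: "
    r"qla2xxx (?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.\d): LOOP UP detected"
)


def first_loop_up(lines, year=2000):
    """Return {pci_address: datetime} of the first LOOP UP seen per HBA."""
    seen = {}
    for line in lines:
        m = LOOP_UP_RE.match(line)
        if not m:
            continue
        # syslog omits the year; a fixed (assumed) year keeps datetimes comparable.
        ts = datetime.strptime(f"{year} {m.group('ts')}", "%Y %b %d %H:%M:%S")
        # Keep only the first LOOP UP per HBA; later ones follow error recovery.
        seen.setdefault(m.group("pci"), ts)
    return seen


def loop_up_skew(lines):
    """Seconds between the earliest and the latest first LOOP UP across HBAs."""
    times = first_loop_up(lines)
    if len(times) < 2:
        return 0.0
    return (max(times.values()) - min(times.values())).total_seconds()
```

Fed the two Jul 10 "LOOP UP detected" lines above, loop_up_skew() reports a 5-second gap between 0000:11:00.1 and 0000:11:00.0. Mixing lines from different boots (for example the Jul 8 and Jul 10 excerpts) would produce a meaningless result.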

Resolution

On the IBM blades, locking the HBA and switch ports to a fixed speed of 8 Gbps, instead of auto-negotiating, resolved the issue.
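After the speed is locked, one way to confirm that every HBA port came up online at the expected rate is to read the FC transport class attributes port_state and speed under /sys/class/fc_host. The sketch below assumes those sysfs attributes are present and report values such as "Online" and "8 Gbit", and that a Python 3 interpreter is available on the host; the function names and the expected-speed default are hypothetical.

```python
import glob
import os


def fc_host_status(sysfs_root="/sys/class/fc_host"):
    """Return {hostN: (port_state, speed)} read from the fc_host sysfs class."""
    status = {}
    for host_dir in sorted(glob.glob(os.path.join(sysfs_root, "host*"))):
        values = []
        for attr in ("port_state", "speed"):
            try:
                with open(os.path.join(host_dir, attr)) as f:
                    values.append(f.read().strip())
            except OSError:
                # Attribute missing or unreadable on this host.
                values.append("unknown")
        status[os.path.basename(host_dir)] = tuple(values)
    return status


def not_at_speed(status, expected="8 Gbit"):
    """List hosts that are not Online or not running at the expected speed."""
    return [host for host, (state, speed) in status.items()
            if state != "Online" or speed != expected]


if __name__ == "__main__":
    for host, (state, speed) in fc_host_status().items():
        print(f"{host}: port_state={state} speed={speed}")
```

Running this shortly after boot on both blades gives a quick check that neither port is still negotiating or has fallen back to a lower speed.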

Root Cause

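Speed auto-negotiation took too long to reach LOOP UP on one of the HBAs, which delayed detection of the LUNs behind it. The volume groups on those LUNs were therefore not yet visible when the boot process tried to activate them.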

Reference:
http://www-947.ibm.com/support/entry/portal/docdisplay?lndocid=MIGR-5087103

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
