SRP fails to reconnect target
Issue
- RHEL 5.8/5.9 systems connected to a SAN device over InfiniBand using the SCSI RDMA Protocol (SRP).
- The target device fails; after the failed path is restored, srp_daemon fails to reconnect the target device.
- The behavior has been seen sporadically, on random systems.
- SRP HA works as follows: when a target path fails, the ib_srp module removes the scsi_host from the OS; when the path is restored, srp_daemon detects it and asks ib_srp to add the scsi_host back.
- The scsi_host entries appear to be removed successfully, but on random occasions ib_srp fails to add them back.
- Attempts to add them back manually also fail, and srpd has to be restarted.
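To make the manual re-add step concrete, the following is a minimal sketch of how a target login request is usually handed to ib_srp through sysfs. It assumes the upstream ib_srp layout (one `/sys/class/infiniband_srp/srp-<hca>-<port>/add_target` file per port) and the usual `id_ext`, `ioc_guid`, `dgid`, `pkey`, `service_id` login string; verify both on the affected system. The helper names and all identifier values are hypothetical placeholders, not taken from this case. The sketch is written for Python 3 and would need adapting for the Python 2.4 interpreter shipped with RHEL 5.

```python
import glob
import os

# Assumed sysfs layout: ib_srp exposes one directory per HCA port here.
SRP_SYSFS_GLOB = "/sys/class/infiniband_srp/srp-*"


def format_target(id_ext, ioc_guid, dgid, pkey, service_id):
    """Build the comma-separated login string written to add_target."""
    return ("id_ext=%s,ioc_guid=%s,dgid=%s,pkey=%s,service_id=%s"
            % (id_ext, ioc_guid, dgid, pkey, service_id))


def list_srp_ports(pattern=SRP_SYSFS_GLOB):
    """Return the ib_srp port directories that contain an add_target file."""
    return sorted(d for d in glob.glob(pattern)
                  if os.path.exists(os.path.join(d, "add_target")))


def add_target(port_dir, target):
    """Ask ib_srp to log in to a target.

    Returns None on success, or the error text if the write is rejected.
    """
    try:
        with open(os.path.join(port_dir, "add_target"), "w") as f:
            f.write(target)
    except OSError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    # Placeholder identifiers only; substitute the values reported for the
    # real target before attempting a login.
    target = format_target(
        id_ext="0000000000000001",
        ioc_guid="0000000000000001",
        dgid="fe800000000000000000000000000001",
        pkey="ffff",
        service_id="0000000000000001",
    )
    # Read-only by default: show where the request would be written.
    for port in list_srp_ports():
        print("%s <- %s" % (os.path.join(port, "add_target"), target))
```

When the problem described above occurs, a write like the one in `add_target()` would be expected to fail or have no effect until srpd is restarted; capturing the returned error text is useful when reporting the issue.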
Environment
- Red Hat Enterprise Linux 5 Update 9
- kernel 2.6.18-348.el5
- InfiniBand-related RPMs:
  - infiniband-diags-1.5.12-2.el5.x86_64 Mon 21 Jan 2013 02:07:30 PM CST
  - libmlx4-1.0.2-1.el5.i386 Mon 21 Jan 2013 02:05:58 PM CST
  - libmlx4-1.0.2-1.el5.x86_64 Mon 21 Jan 2013 02:05:58 PM CST
  - openib-1.5.4.1-4.el5.noarch Mon 21 Jan 2013 02:05:54 PM CST
  - srptools-0.0.4-10.el5.x86_64 Mon 21 Jan 2013 02:06:41 PM CST
  - opensm-libs-3.3.13-1.el5.x86_64 Mon 21 Jan 2013 01:09:20 PM CST
- InfiniBand card details:
  - mlx4_core: Mellanox ConnectX core driver v1.0-ofed1.5.4 (November 10, 2011)
  - mlx4_core: Initializing 0000:04:00.0
  - mlx4_en: Mellanox ConnectX HCA Ethernet driver v1.5.3 (Jan 2011)
  - mlx4_ib: Mellanox ConnectX InfiniBand driver v1.0-ofed1.5.4 (November 10, 2011)