SRP fails to reconnect target

Solution Unverified - Updated -

Issue

  • RHEL 5.8/5.9 systems connected to a SAN device over InfiniBand using the SCSI RDMA Protocol (SRP).
  • A path to the target device fails; after the failed path is restored, srp_daemon fails to reconnect the target device.
  • The behavior has been observed sporadically, on different systems at random.
  • With SRP high availability (HA), when a target path fails the ib_srp module removes the corresponding scsi_host from the OS. When the path is restored, srp_daemon detects it and requests that ib_srp add the scsi_host back.
  • What appears to be happening is that the scsi_host entries are removed successfully, but on random occasions ib_srp fails to add them back.
  • Attempts to add them manually also fail, and srpd has to be restarted.
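The manual re-add goes through the ib_srp sysfs interface: a target descriptor string is written to the `add_target` attribute under `/sys/class/infiniband_srp/srp-<hca>-<port>/`, and ib_srp then logs in to the target and creates a new scsi_host. The Python sketch below illustrates that step under stated assumptions: the helper function names are hypothetical (not part of srptools), and the HCA name `mlx4_0`, port `1`, and all GUID/ID values are placeholders. It avoids f-strings and `with` so that it also runs on the Python 2.4 interpreter shipped with RHEL 5.

```python
import os


# Hypothetical helper (not part of srptools): build the comma-separated
# target descriptor accepted by the ib_srp add_target sysfs attribute.
def build_target_string(id_ext, ioc_guid, dgid, pkey, service_id):
    fields = [
        ("id_ext", id_ext),
        ("ioc_guid", ioc_guid),
        ("dgid", dgid),
        ("pkey", pkey),
        ("service_id", service_id),
    ]
    return ",".join(["%s=%s" % (key, value) for key, value in fields])


# Hypothetical helper: ib_srp exposes one directory per HCA port,
# named srp-<hca>-<port>, under /sys/class/infiniband_srp.
def add_target_path(hca="mlx4_0", port=1, sysfs_root="/sys/class/infiniband_srp"):
    return os.path.join(sysfs_root, "srp-%s-%d" % (hca, port), "add_target")


# Write the descriptor in a single write() call. If ib_srp rejects the
# request, the error is returned by write() and raised here as OSError.
def add_target(target, path):
    fd = os.open(path, os.O_WRONLY)
    try:
        os.write(fd, target.encode("ascii"))
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Placeholder values only: substitute the descriptor of the real target.
    descriptor = build_target_string(
        "200100a0b81146a1",
        "0002c90200402bd4",
        "fe800000000000000002c90200402bd5",
        "ffff",
        "200100a0b81146a1",
    )
    path = add_target_path("mlx4_0", 1)
    print("would write %r to %s" % (descriptor, path))
    # add_target(descriptor, path)  # performs the real request; requires root
```

In practice the descriptor values come from target discovery (for example, `ibsrpdm -c` from srptools prints descriptors in this format) rather than being typed by hand. When the manual add fails as described above, the `OSError` raised by `os.write` reflects the kernel rejecting the request, and the kernel log may record the reason.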

Environment

  • Red Hat Enterprise Linux 5 Update 9
  • kernel 2.6.18-348.el5
  • InfiniBand-related RPMs
    • infiniband-diags-1.5.12-2.el5.x86_64 Mon 21 Jan 2013 02:07:30 PM CST
    • libmlx4-1.0.2-1.el5.i386 Mon 21 Jan 2013 02:05:58 PM CST
    • libmlx4-1.0.2-1.el5.x86_64 Mon 21 Jan 2013 02:05:58 PM CST
    • openib-1.5.4.1-4.el5.noarch Mon 21 Jan 2013 02:05:54 PM CST
    • srptools-0.0.4-10.el5.x86_64 Mon 21 Jan 2013 02:06:41 PM CST
    • opensm-libs-3.3.13-1.el5.x86_64 Mon 21 Jan 2013 01:09:20 PM CST
  • InfiniBand card details
    • mlx4_core: Mellanox ConnectX core driver v1.0-ofed1.5.4 (November 10, 2011)
    • mlx4_core: Initializing 0000:04:00.0
    • mlx4_en: Mellanox ConnectX HCA Ethernet driver v1.5.3 (Jan 2011)
    • mlx4_ib: Mellanox ConnectX InfiniBand driver v1.0-ofed1.5.4 (November 10, 2011)
