Mellanox ConnectX-4 (mlx5) used for SCSI RDMA Protocol (SRP) is exposed to lock contention in srp_queuecommand(), srp_recv_completion(), and mlx5_ib_poll_cq()

Solution Verified - Updated -

Issue

  • Hard lockups have been seen under I/O load when a Mellanox ConnectX-4 adapter (mlx5 driver) is used to access SCSI RDMA Protocol (SRP) devices. The logged call trace varies, but the code paths most commonly reported include:

    • srp_recv_completion()
    • srp_queuecommand()
    • srp_send_completion()
  • The same issue has also surfaced in __list_add(), with the following in the logs:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000002
    IP: [<ffffffff8130c2c3>] __list_add+0x33/0xc0
    Oops: 0002 [#1] SMP
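
To check whether a captured kernel log shows the symptoms listed above, a small scan along the following lines may help. This is only a sketch: the function names come from this article's title and symptom list, while `scan_log`, `SUSPECT_FUNCTIONS`, and `NULL_DEREF` are hypothetical names introduced here for illustration. A match suggests the log is worth a closer look; it does not confirm this issue.

```python
import re

# Function names taken from this article's title and symptom list.
SUSPECT_FUNCTIONS = (
    "srp_queuecommand",
    "srp_recv_completion",
    "srp_send_completion",
    "mlx5_ib_poll_cq",
    "__list_add",
)

# Banner of the NULL pointer dereference shown in the log excerpt above.
NULL_DEREF = re.compile(r"BUG: unable to handle kernel NULL pointer dereference")

# One pattern per function; the lookarounds reject longer identifiers
# such as "__list_add_rcu" that merely contain a suspect name.
_PATTERNS = {
    name: re.compile(r"(?<![A-Za-z0-9_])" + re.escape(name) + r"(?![A-Za-z0-9_])")
    for name in SUSPECT_FUNCTIONS
}


def scan_log(lines):
    """Return (null_deref_count, {function: line_hits}) for an iterable of log lines."""
    null_derefs = 0
    hits = {name: 0 for name in SUSPECT_FUNCTIONS}
    for line in lines:
        if NULL_DEREF.search(line):
            null_derefs += 1
        for name, pattern in _PATTERNS.items():
            if pattern.search(line):
                hits[name] += 1
    return null_derefs, hits


# Demonstration on the three log lines quoted in this article.
sample = [
    "BUG: unable to handle kernel NULL pointer dereference at 0000000000000002",
    "IP: [<ffffffff8130c2c3>] __list_add+0x33/0xc0",
    "Oops: 0002 [#1] SMP",
]
count, per_function = scan_log(sample)
print("NULL dereference banners:", count)
for name, n in per_function.items():
    print(name, n)
```

`scan_log` accepts any iterable of strings, so an open file handle for a saved kernel log can be passed in directly; where that log lives depends on how the system is configured.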
    

Environment

  • Red Hat Enterprise Linux (RHEL) 7.2
  • Red Hat Enterprise Linux (RHEL) 6.8
  • mlx5 driver
