Mellanox ConnectX-4 (mlx5) used for SCSI RDMA Protocol (SRP) is exposed to lock contention in srp_queuecommand(), srp_recv_completion(), mlx5_ib_poll_cq()

Solution Verified

Issue

  • When a Mellanox ConnectX-4 adapter (mlx5 driver) is used to access SCSI RDMA Protocol (SRP) devices, hard lockups have been seen under I/O load. The logged call trace varies, but the functions most commonly reported in it are:

    • srp_recv_completion()
    • srp_queuecommand()
    • srp_send_completion()
  • The same issue has also surfaced as a NULL pointer dereference in __list_add(), with the following in the logs:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000002
    IP: [<ffffffff8130c2c3>] __list_add+0x33/0xc0
    Oops: 0002 [#1] SMP
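
To check whether a saved kernel log shows any of the signatures listed above, a small script along the following lines may help. This is a minimal sketch, not a Red Hat tool: the function names come from this article's title and call-trace list, while the helper names find_signatures() and has_null_deref() are hypothetical, and the sketch assumes the log uses the usual "symbol+0xoffset/0xsize" form seen in the excerpt above.

```python
import re

# Function names taken from this article's title and call-trace list.
TRACE_FUNCTIONS = (
    "srp_recv_completion",
    "srp_queuecommand",
    "srp_send_completion",
    "mlx5_ib_poll_cq",
    "__list_add",
)

# Matches a kernel symbol reference such as "__list_add+0x33/0xc0".
SYMBOL_RE = re.compile(
    r"\b(" + "|".join(re.escape(name) for name in TRACE_FUNCTIONS) + r")"
    r"\+0x[0-9a-f]+/0x[0-9a-f]+"
)

# Message text copied from the log excerpt in this article.
NULL_DEREF_MSG = "unable to handle kernel NULL pointer dereference"


def find_signatures(log_text):
    """Return the set of listed function names found as symbol+offset in log_text."""
    return {match.group(1) for match in SYMBOL_RE.finditer(log_text)}


def has_null_deref(log_text):
    """Return True if the NULL pointer dereference message appears in log_text."""
    return NULL_DEREF_MSG in log_text


# Demonstration on the excerpt quoted in this article.
excerpt = (
    "BUG: unable to handle kernel NULL pointer dereference at 0000000000000002\n"
    "IP: [<ffffffff8130c2c3>] __list_add+0x33/0xc0\n"
    "Oops: 0002 [#1] SMP\n"
)
print(sorted(find_signatures(excerpt)), has_null_deref(excerpt))
```

A match only indicates that the log resembles the traces described here; it does not by itself confirm the same root cause.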
    

Environment

  • Red Hat Enterprise Linux (RHEL) 7.2
  • Red Hat Enterprise Linux (RHEL) 6.8
  • mlx5 driver
