Red Hat Enterprise Linux 7 crashed in the rdma_cm kernel module

Environment

Red Hat Enterprise Linux (RHEL) 7
- Kernel versions earlier than kernel-3.10.0-1160.46.1.el7
InfiniBand/RDMA

Issue

The system crashed in the rdma_cm kernel module, within the cma_comp_exch function, while attempting to acquire a spin lock.

Resolution

Update the kernel to kernel-3.10.0-1160.46.1.el7 or later, then monitor for additional kernel panics.
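
As a rough way to check whether a host is still below the fixed kernel named above, the release string from `uname -r` can be compared against it. This is only a sketch: the function name `kernel_is_affected` is hypothetical, and it assumes that GNU `sort -V` orders these RHEL 7 release strings the same way RPM does once the `.el7` suffix is stripped. The comparison done by `rpm` or `yum` remains authoritative.

```shell
# Fixed kernel release taken from this article.
FIXED_KERNEL="3.10.0-1160.46.1.el7"

# kernel_is_affected RELEASE
#   returns 0 if RELEASE is older than the fixed kernel,
#           1 if it is the fixed kernel or newer,
#           2 if RELEASE does not look like a RHEL 7 kernel.
kernel_is_affected() {
    local release running fixed oldest
    release="$1"
    case "$release" in
        *.el7*) ;;
        *) return 2 ;;
    esac
    # Strip ".el7" and anything after it (for example ".x86_64"), so that
    # only the numeric version and release fields are compared.
    running="${release%%.el7*}"
    fixed="${FIXED_KERNEL%%.el7*}"
    [ "$running" = "$fixed" ] && return 1
    # Version-sort both strings; the first line is the older of the two.
    oldest="$(printf '%s\n%s\n' "$running" "$fixed" | sort -V | head -n 1)"
    [ "$oldest" = "$running" ]
}
```

For example, `kernel_is_affected "$(uname -r)" && echo "update required"` prints the message only on a RHEL 7 kernel older than the fixed release.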

Root Cause

The RDMA Connection Manager (rdma_cm) layer is exposed to userspace through ucma. Concurrent calls from userspace could operate on several structures internal to the kernel modules without holding any lock to protect them. Such access could leave those structures in an inconsistent state and ultimately cause a kernel panic.

The patch below wraps every call into the RDMA CM layer in a mutex so that concurrent multi-threaded access from userspace is safe:

 git show 2efd16d4fba4
    RDMA/ucma: Put a lock around every call to the rdma_cm layer

    Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1978075
    CVE: CVE-2020-36385
[...]
        RDMA/ucma: Put a lock around every call to the rdma_cm layer

        The rdma_cm must be used single threaded.

        This appears to be a bug in the design, as it does have lots of locking
        that seems like it should allow concurrency. However, when it is all said
        and done every single place that uses the cma_exch() scheme is broken, and
        all the unlocked reads from the ucma of the cm_id data are wrong too.

        syzkaller has been finding endless bugs related to this.

        Fixing this in any elegant way is some enormous amount of work. Take a
        very big hammer and put a mutex around everything to do with the
        ucma_context at the top of every syscall.
[...]

Diagnostic Steps

If not already done, set up kdump to generate a vmcore when the system crashes, and install the crash utility to analyze the resulting vmcore.

After loading the vmcore in crash, obtain the backtrace with bt:
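
The kdump and crash setup described above could be checked and used roughly as follows. This is a sketch under assumptions: the helper names `debug_vmlinux_path` and `crash_command` are hypothetical, the vmlinux location assumes the matching kernel-debuginfo package is installed, and the vmcore path in the usage comment is a placeholder.

```shell
# Checks that kdump is ready (run on the affected host):
#   systemctl is-active kdump             # expected: active
#   cat /sys/kernel/kexec_crash_loaded    # 1 when a crash kernel is loaded

# Path where kernel-debuginfo installs the vmlinux for a given release.
debug_vmlinux_path() {
    printf '/usr/lib/debug/lib/modules/%s/vmlinux\n' "$1"
}

# Print the crash invocation for a kernel release ($1) and a vmcore ($2).
crash_command() {
    printf 'crash %s %s\n' "$(debug_vmlinux_path "$1")" "$2"
}

# Usage (the vmcore directory is a placeholder, not a real path):
#   crash_command "$(uname -r)" /var/crash/<directory>/vmcore
```

The helpers only print the command, so it can be reviewed before it is run against a real vmcore. The release passed in must be that of the kernel that crashed, which may differ from the running kernel after an update.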

RIP: 0010:[<ffffffff98d17a90>]  [<ffffffff98d17a90>] native_queued_spin_lock_slowpath+0x110/0x200
Call Trace:
[<ffffffff9937dcf3>] queued_spin_lock_slowpath+0xb/0xf
[<ffffffff9938bb27>] _raw_spin_lock_irqsave+0x37/0x40
[<ffffffffc0c79218>] cma_comp_exch+0x28/0x60 [rdma_cm]   <-------------- HERE
[<ffffffffc0c7da93>] cma_work_handler+0x33/0xa0 [rdma_cm]
[<ffffffff98cbde8f>] process_one_work+0x17f/0x440
[<ffffffff98cbefa6>] worker_thread+0x126/0x3c0
[<ffffffff98cbee80>] ? manage_workers.isra.26+0x2a0/0x2a0
[<ffffffff98cc5e61>] kthread+0xd1/0xe0
[<ffffffff98cc5d90>] ? insert_kthread_work+0x40/0x40
[<ffffffff99395df7>] ret_from_fork_nospec_begin+0x21/0x21
[<ffffffff98cc5d90>] ? insert_kthread_work+0x40/0x40
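
To compare a backtrace from another vmcore with the one above, a simple text match on the two distinguishing frames may be enough as a first pass. The function name `matches_cma_comp_exch_signature` is hypothetical, and the check is only a heuristic: it looks for a `cma_comp_exch` frame in `[rdma_cm]` together with a `_raw_spin_lock_irqsave` frame, and does not confirm the root cause.

```shell
# Reads a backtrace on stdin; returns 0 when it contains both a
# cma_comp_exch frame from the rdma_cm module and a
# _raw_spin_lock_irqsave frame, as in the trace shown in this article.
matches_cma_comp_exch_signature() {
    local trace
    trace="$(cat)"
    printf '%s\n' "$trace" | grep -q 'cma_comp_exch+.*\[rdma_cm]' &&
        printf '%s\n' "$trace" | grep -q '_raw_spin_lock_irqsave+'
}
```

With a backtrace saved as text (the file name `bt.txt` is hypothetical), `matches_cma_comp_exch_signature < bt.txt && echo "matches"` reports whether both frames are present.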

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
