rhel6: softlockup when doing I/O to NVMe with multiple processes
Issue
The system freezes under a heavy disk I/O workload on an HGST SN150 NVMe card. RHEL itself is installed on a separate SAS hard disk. The behavior is reproducible with multiple processes reading from the NVMe card (reads only, no writes).
Under the heavy disk I/O workload, all disks, including the system disk, suddenly become inaccessible after the message:
> kernel:BUG: soft lockup - CPU#0 stuck for 67s! [t_gen2:20075]
The kernel is still alive, but every attempt to access a disk freezes the corresponding process. The system becomes inaccessible (new SSH connections fail) and must be restarted with a hardware reset. No messages appear in the kernel logs, since the system disk is frozen.
When running 14 instances of
dd if=/dev/nvme0n1 iflag=direct of=/dev/null count=1G
we see a soft lockup. Running 12 instances, the system is fine. Running 14 instances with explicit NUMA binding, we also do not see the issue:
numactl --membind=1 --cpunodebind=1 dd ...
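As a convenience, here is a minimal shell sketch of the reproduction described above. It assumes the device path /dev/nvme0n1 and NUMA node 1 from the commands shown; the instance count N and the commented-out binding prefix are illustrative.

```bash
#!/bin/bash
# Sketch: launch N parallel direct-I/O readers against the NVMe
# namespace. On the affected system, N=14 without binding triggered
# the soft lockup; N=12, or N=14 with NUMA binding, did not.
N=${1:-14}          # number of parallel dd instances (assumption: default 14)
DEV=/dev/nvme0n1    # NVMe namespace under test

for i in $(seq "$N"); do
    # Uncomment the numactl prefix to pin memory and CPUs to node 1,
    # the binding that avoided the lockup in this report.
    # numactl --membind=1 --cpunodebind=1 \
    dd if="$DEV" iflag=direct of=/dev/null count=1G &
done
wait
```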
The task implicated in the soft lockup appears to be the kernel thread "kblockd/18".
Environment
- Red Hat Enterprise Linux (RHEL) 6, minor release earlier than 6.7
- NVMe device (HGST SN150)