After an NVMe disk is unplugged and plugged back in for the first time, the nvme driver does not load the NVMe device normally
Issue
- While testing the hot-plug function of an NVMe device, it was found that after the NVMe disk is unplugged and plugged back in for the first time, the nvme driver does not load the device normally. When the disk is unplugged and plugged back in a second, third, or later time, it is identified correctly. (The test was done after rebooting the server.)
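The outcome of each unplug or re-plug action can also be checked directly in sysfs. The following is a minimal sketch, assuming the standard Linux sysfs layout under /sys/bus/pci/devices. The function name pci_bound_driver and its optional second argument (an alternate sysfs root for dry runs) are hypothetical, and 0000:b6:00.0 is the device address from this report.

```shell
# Report whether a PCI function is present in sysfs and which driver, if any,
# is bound to it.
#   $1: PCI address including domain, e.g. 0000:b6:00.0
#   $2: sysfs root (optional, defaults to /sys; hypothetical hook for dry runs)
pci_bound_driver() (
    dev="${2:-/sys}/bus/pci/devices/$1"
    if [ ! -d "$dev" ]; then
        echo "not present in sysfs"
    elif [ -L "$dev/driver" ]; then
        # The "driver" symlink points at the bound driver's directory.
        echo "bound to $(basename "$(readlink "$dev/driver")")"
    else
        echo "present in sysfs, no driver bound"
    fi
)
```

Usage: run "pci_bound_driver 0000:b6:00.0" after each action. A device that lspci lists without a "Kernel driver in use" line would be expected to report "present in sysfs, no driver bound".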
1. System boot
# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
nvme0n1 259:0 0 7T 0 disk
# lspci
b6:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller PM9A1/PM9A3/980PRO [144d:a80a] (prog-if 02 [NVM Express])
Subsystem: Samsung Electronics Co Ltd General DC NVMe PM9A3 [144d:a813]
Kernel driver in use: nvme
Kernel modules: nvme
00: 4d 14 0a a8 06 04 10 00 00 02 08 01 10 00 00 00
10: 04 00 01 e1 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 4d 14 13 a8
30: 00 00 00 e1 40 00 00 00 00 00 00 00 ff 01 00 00
2. Unplug the NVMe disk for the first time.
# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
# lspci
b6:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller PM9A1/PM9A3/980PRO [144d:a80a] (rev ff) (prog-if ff)
!!! Unknown header type 7f
Kernel driver in use: nvme
Kernel modules: nvme
00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
10: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
20: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
30: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
# dmesg -w
Jan 27 19:47:14 localhost kernel: pcieport 0000:b2:03.0: Slot(7): Link Down
Jan 27 19:47:14 localhost kernel: pcieport 0000:b2:03.0: Slot(7): Card not present
Jan 27 19:47:14 localhost kernel: pcieport 0000:b2:03.0: Slot(7): Already disabled
...
Jan 27 19:48:41 localhost kernel: nvme nvme0: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0xffff
Jan 27 19:48:41 localhost kernel: nvme 0000:b6:00.0: can't change power state from D3hot to D0 (config space inaccessible)
Jan 27 19:48:41 localhost kernel: nvme nvme0: Removing after probe failure status: -19
Jan 27 19:48:41 localhost kernel: blk_update_request: I/O error, dev nvme0n1, sector 0 op 0x0:(READ) flags 0x0 phys_seg 11 prio class 0
Jan 27 19:48:41 localhost kernel: blk_update_request: I/O error, dev nvme0n1, sector 15002931712 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Jan 27 19:48:41 localhost kernel: blk_update_request: I/O error, dev nvme0n1, sector 0 op 0x0:(READ) flags 0x0 phys_seg 3 prio class 0
3. Plug the NVMe disk back in for the first time.
# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
# lspci
b6:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller PM9A1/PM9A3/980PRO [144d:a80a] (prog-if 02 [NVM Express])
Subsystem: Samsung Electronics Co Ltd General DC NVMe PM9A3 [144d:a813]
Kernel modules: nvme
00: 4d 14 0a a8 00 00 10 00 00 02 08 01 00 00 00 00
10: 04 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 4d 14 13 a8
30: 00 00 00 00 40 00 00 00 00 00 00 00 ff 01 00 00
# dmesg -w
Jan 27 19:52:14 localhost kernel: pcieport 0000:b2:03.0: Slot(7): Card present
Jan 27 19:52:14 localhost kernel: pcieport 0000:b2:03.0: Slot(7): Link Up
Reloading the driver with "modprobe -r nvme" followed by "modprobe nvme" was also tried, but it did not help.
4. Unplug the NVMe disk for the second time.
# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
# lspci
# dmesg -w
Jan 27 19:56:01 localhost kernel: pcieport 0000:b2:03.0: Slot(7): Link Down
Jan 27 19:56:01 localhost kernel: pcieport 0000:b2:03.0: Slot(7): Card not present
5. Plug the NVMe disk back in for the second time.
# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
nvme0n1 259:0 0 7T 0 disk
# lspci
b6:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller PM9A1/PM9A3/980PRO [144d:a80a] (prog-if 02 [NVM Express])
Subsystem: Samsung Electronics Co Ltd General DC NVMe PM9A3 [144d:a813]
Kernel driver in use: nvme
Kernel modules: nvme
00: 4d 14 0a a8 06 04 10 00 00 02 08 01 00 00 00 00
10: 04 00 01 e1 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 4d 14 13 a8
30: 00 00 00 00 40 00 00 00 00 00 00 00 ff 01 00 00
# dmesg -w
Jan 27 19:59:01 localhost kernel: pcieport 0000:b2:03.0: Slot(7): Card present
Jan 27 19:59:01 localhost kernel: pcieport 0000:b2:03.0: Slot(7): Link Up
Jan 27 19:59:01 localhost kernel: pci 0000:b6:00.0: [144d:a80a] type 00 class 0x010802
Jan 27 19:59:01 localhost kernel: pci 0000:b6:00.0: reg 0x10: [mem 0x00000000-0x00003fff 64bit]
Jan 27 19:59:01 localhost kernel: pci 0000:b6:00.0: reg 0x30: [mem 0x00000000-0x0000ffff pref]
Jan 27 19:59:01 localhost kernel: pci 0000:b6:00.0: 31.504 Gb/s available PCIe bandwidth, limited by 8.0 GT/s PCIe x4 link at 0000:b2:03.0 (capable of 63.012 Gb/s with 16.0 GT/s PCIe x4 link)
Jan 27 19:59:01 localhost kernel: pci 0000:b6:00.0: BAR 6: assigned [mem 0xe1000000-0xe100ffff pref]
Jan 27 19:59:01 localhost kernel: pci 0000:b6:00.0: BAR 0: assigned [mem 0xe1010000-0xe1013fff 64bit]
Jan 27 19:59:01 localhost kernel: pcieport 0000:b2:03.0: PCI bridge to [bus b6]
Jan 27 19:59:01 localhost kernel: pcieport 0000:b2:03.0: bridge window [io 0x7000-0x7fff]
Jan 27 19:59:01 localhost kernel: pcieport 0000:b2:03.0: bridge window [mem 0xe1000000-0xe10fffff]
Jan 27 19:59:01 localhost kernel: pcieport 0000:b2:03.0: bridge window [mem 0x38c000200000-0x38c0003fffff 64bit pref]
Jan 27 19:59:01 localhost kernel: nvme nvme0: pci function 0000:b6:00.0
Jan 27 19:59:01 localhost kernel: nvme 0000:b6:00.0: enabling device (0000 -> 0002)
Jan 27 19:59:01 localhost fwupd[3540]: 11:59:01:0723 FuEngine failed to add device /sys/devices/pci0000:b2/0000:b2:03.0/0000:b6:00.0/nvme/nvme0: failed to open /dev/nvme0: Resource temporarily unavailable
Jan 27 19:59:13 localhost kernel: nvme nvme0: Shutdown timeout set to 16 seconds
Jan 27 19:59:13 localhost kernel: nvme nvme0: 24/0/0 default/read/poll queues
Jan 27 19:59:13 localhost kernel: nvme0n1: detected capacity change from 0 to 7681501126656
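The hex dumps in the steps above differ mainly in the first row, which holds the vendor ID (offsets 0x00-0x01) and the PCI command register (offsets 0x04-0x05). The following is a minimal sketch for reading that row, assuming input in the "lspci -x" dump layout shown in this report. The function name pci_cfg_state is hypothetical, and the three labels it prints are illustrative wording, not kernel or lspci output.

```shell
# Classify a PCI function from an "lspci -x" style hex dump read on stdin.
pci_cfg_state() (
    # The row starting with "00: " holds config-space offsets 0x00-0x0f.
    row=$(sed -n '/^[[:space:]]*00: /{p;q;}')
    # Split the row into fields: $1 is "00:", $2..$17 are the 16 bytes.
    set -- $row
    [ "$#" -ge 7 ] || { echo "no header row found"; exit 1; }
    vendor="$3$2"   # offsets 0x00-0x01, little endian
    cmd="$7$6"      # offsets 0x04-0x05, little endian (command register)
    if [ "$vendor" = "ffff" ]; then
        # All-ones reads mean the function is not answering config cycles.
        echo "absent (vendor ID reads ffff)"
    elif [ $(( 0x$cmd & 0x2 )) -eq 0 ]; then
        # Bit 1 of the command register is Memory Space Enable.
        echo "present, memory decoding disabled (command=0x$cmd)"
    else
        echo "present, memory decoding enabled (command=0x$cmd)"
    fi
)
```

Usage: "lspci -s b6:00.0 -x | pci_cfg_state". Applied to the dumps in this report, steps 1 and 5 decode to command=0x0406 (memory decoding enabled), step 2 reads as absent, and step 3 decodes to command=0x0000, which matches the missing "Kernel driver in use" line after the first re-plug.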
Environment
- Red Hat Enterprise Linux (RHEL) 8