Request timeout issue with NVMe/TCP under high IO load

Solution In Progress - Updated -

Issue

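  • Request timeouts occur with NVMe/TCP under high I/O load. The kernel log shows repeated request timeouts, each followed by controller error recovery and a successful reconnect: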
[  121.459789] nvme nvme0: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 172.20.53.135:4420
[  121.460298] nvme nvme0: Removing ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery"
[  121.469848] nvme nvme0: creating 24 I/O queues.
[  121.708457] nvme nvme0: mapped 24/0/0 default/read/poll queues.
[  121.715034] nvme nvme0: new ctrl: NQN "nqn.2020-01.com.infinidat:20015-subsystem-19988", addr 172.XX.XX.XXX:4420
[  121.730466] nvme nvme1: creating 24 I/O queues.
[  121.968453] nvme nvme1: mapped 24/0/0 default/read/poll queues.
[  121.974964] nvme nvme1: new ctrl: NQN "nqn.2020-01.com.infinidat:20015-subsystem-19988", addr 172.XX.XX.XX:4420

[ 2609.736192] nvme nvme0: queue 22: timeout request 0x38 type 4
[ 2609.736201] nvme nvme0: starting error recovery
[ 2609.737241] nvme nvme0: Reconnecting in 10 seconds...
[ 2619.987225] nvme nvme0: creating 24 I/O queues.
[ 2620.258490] nvme nvme0: mapped 24/0/0 default/read/poll queues.
[ 2620.263160] nvme nvme0: Successfully reconnected (1 attempt)
[ 2994.751516] nvme nvme1: queue 8: timeout request 0x6 type 4
[ 2994.751525] nvme nvme1: starting error recovery
[ 2994.752560] nvme nvme1: Reconnecting in 10 seconds...
[ 3005.004597] nvme nvme1: creating 24 I/O queues.
[ 3005.274160] nvme nvme1: mapped 24/0/0 default/read/poll queues.
[ 3005.274290] debugfs: Directory 'hctx0' with parent '/' already present!
[ 3005.274298] debugfs: Directory 'hctx1' with parent '/' already present!
[ 3005.274302] debugfs: Directory 'hctx2' with parent '/' already present!
..
[ 3005.274362] debugfs: Directory 'hctx23' with parent '/' already present!
[ 3005.277796] nvme nvme1: Successfully reconnected (1 attempt)
[ 3064.893442] nvme nvme0: queue 18: timeout request 0x37 type 4
[ 3064.893451] nvme nvme0: starting error recovery
[ 3064.894448] nvme nvme0: Reconnecting in 10 seconds...
[ 3075.144327] nvme nvme0: creating 24 I/O queues.
[ 3075.402443] nvme nvme0: mapped 24/0/0 default/read/poll queues.
[ 3075.407243] nvme nvme0: Successfully reconnected (1 attempt)
..
[16392.473564] nvme nvme0: starting error recovery
[16392.474660] nvme nvme0: Reconnecting in 10 seconds...
[16402.722441] nvme nvme0: creating 24 I/O queues.
[16402.987520] nvme nvme0: mapped 24/0/0 default/read/poll queues.
[16402.993473] nvme nvme0: Successfully reconnected (1 attempt)
[17737.466878] nvme nvme1: queue 23: timeout request 0x8 type 4
[17737.466886] nvme nvme1: starting error recovery
[17737.467932] nvme nvme1: Reconnecting in 10 seconds...
[17747.714022] nvme nvme1: creating 24 I/O queues.
[17747.981289] nvme nvme1: mapped 24/0/0 default/read/poll queues.
..
[44207.566992] debugfs: Directory 'hctx22' with parent '/' already present!
[44207.566995] debugfs: Directory 'hctx23' with parent '/' already present!
[44207.571479] nvme nvme1: Successfully reconnected (1 attempt)
[44910.247638] nvme nvme0: queue 13: timeout request 0x5b type 4
[44910.247646] nvme nvme0: starting error recovery
[44910.248687] nvme nvme0: Reconnecting in 10 seconds...
[44920.496424] nvme nvme0: creating 24 I/O queues.
[44920.754444] nvme nvme0: mapped 24/0/0 default/read/poll queues.
[44920.760197] nvme nvme0: Successfully reconnected (1 attempt)
[49347.144379] nvme nvme1: queue 15: timeout request 0x52 type 4
[49347.144386] nvme nvme1: starting error recovery
[49347.145416] nvme nvme1: Reconnecting in 10 seconds...
[49357.402221] nvme nvme1: creating 24 I/O queues.
[49357.671423] nvme nvme1: mapped 24/0/0 default/read/poll queues.
[49357.671527] debugfs: Directory 'hctx0' with parent '/' already present!
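
To gauge how often the timeouts occur and how long each recovery takes, the kernel log can be summarized with a short script. The following is a minimal sketch, not a supported tool: the function name `summarize` and the regular expressions are illustrative assumptions, based only on the message formats visible in the excerpt above, and may need adjusting for other kernel versions.

```python
import re
from collections import Counter

# Patterns assume the message formats seen in the log excerpt above.
TIMEOUT_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+nvme (?P<ctrl>nvme\d+): "
    r"queue (?P<queue>\d+): timeout request (?P<req>0x[0-9a-fA-F]+) type (?P<type>\d+)"
)
RECOVERY_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+nvme (?P<ctrl>nvme\d+): starting error recovery"
)
RECONNECT_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+nvme (?P<ctrl>nvme\d+): Successfully reconnected"
)


def summarize(lines):
    """Return (timeouts, recoveries) for an iterable of kernel log lines.

    timeouts   -- Counter keyed by (controller, queue number)
    recoveries -- list of (controller, start timestamp, duration in seconds)
    """
    timeouts = Counter()
    recovery_start = {}  # controller -> timestamp of "starting error recovery"
    recoveries = []
    for line in lines:
        m = TIMEOUT_RE.search(line)
        if m:
            timeouts[(m["ctrl"], int(m["queue"]))] += 1
            continue
        m = RECOVERY_RE.search(line)
        if m:
            recovery_start[m["ctrl"]] = float(m["ts"])
            continue
        m = RECONNECT_RE.search(line)
        if m and m["ctrl"] in recovery_start:
            start = recovery_start.pop(m["ctrl"])
            recoveries.append((m["ctrl"], start, float(m["ts"]) - start))
    return timeouts, recoveries


# Sample input copied from the first timeout in the excerpt above.
SAMPLE = """\
[ 2609.736192] nvme nvme0: queue 22: timeout request 0x38 type 4
[ 2609.736201] nvme nvme0: starting error recovery
[ 2609.737241] nvme nvme0: Reconnecting in 10 seconds...
[ 2619.987225] nvme nvme0: creating 24 I/O queues.
[ 2620.258490] nvme nvme0: mapped 24/0/0 default/read/poll queues.
[ 2620.263160] nvme nvme0: Successfully reconnected (1 attempt)
""".splitlines()

if __name__ == "__main__":
    counts, recovered = summarize(SAMPLE)
    for (ctrl, queue), n in sorted(counts.items()):
        print(f"{ctrl} queue {queue}: {n} timeout(s)")
    for ctrl, start, duration in recovered:
        print(f"{ctrl}: recovery started at {start:.6f}s, took {duration:.3f}s")
```

To analyze a full log, the saved output of `dmesg` can be passed to `summarize` line by line in place of `SAMPLE`. Recoveries that never reach "Successfully reconnected" are left out of the result, so a mismatch between timeout and recovery counts is worth a closer look.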

Environment

  • Red Hat Enterprise Linux 9
    • NVMe/TCP
