System crash in qla2x00_status_entry() or blk_{requeue,start,finish}_request() after qla2xxx connection issues or aborts

Solution Unverified

Issue

  • System crash in qla2x00_status_entry() or blk_{requeue,start,finish}_request() after qla2xxx connection issues or aborts.

Example 1

[   67.904651] qla2xxx [0000:19:00.0]-801c:2: Abort command issued nexus=2:1:0 -- 2002.
....
[  341.677920] BUG: unable to handle kernel NULL pointer dereference at 0000000000000084
[  341.678018] IP: [<ffffffffc06b9fe7>] qla2x00_status_entry+0x4f7/0x1900 [qla2xxx]
[  341.678136] PGD 0 
[  341.678166] Oops: 0000 [#1] SMP 
....
[  341.678667] CPU: 0 PID: 0 Comm: swapper/0 Kdump: loaded Tainted: P           OE  ------------   3.10.0-1160.2.2.el7.x86_64 #1
[  341.678698] Hardware name: Cisco Systems Inc UCSC-C240-M5L/UCSC-C240-M5L, BIOS C240M5.4.1.3e.0.1210201753 12/10/2020
[  341.678727] task: ffffffffa8618480 ti: ffffffffa8600000 task.ti: ffffffffa8600000
[  341.678749] RIP: 0010:[<ffffffffc06b9fe7>]  [<ffffffffc06b9fe7>] qla2x00_status_entry+0x4f7/0x1900 [qla2xxx]
[  341.679691] RSP: 0018:ffff8c4bbee03cb0  EFLAGS: 00010093
[  341.680564] RAX: 0000000000000000 RBX: 00000000000e0000 RCX: ffffffffc071a368
[  341.681414] RDX: 0000000000000017 RSI: ffff8c4cae35d740 RDI: 0000000008000000
[  341.682246] RBP: ffff8c4bbee03dd0 R08: 0000000000000029 R09: 0000000000000000
[  341.683060] R10: ffff8c2dbdfb0808 R11: ffff8c2dbdfb0680 R12: 0000000000000000
[  341.683872] R13: 0000000000000000 R14: ffff8c4cae35d740 R15: 0000000000000029
[  341.684682] FS:  0000000000000000(0000) GS:ffff8c4bbee00000(0000) knlGS:0000000000000000
[  341.685499] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  341.686316] CR2: 0000000000000084 CR3: 0000001f51a10000 CR4: 00000000007607f0
[  341.687138] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  341.687946] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  341.688736] PKRU: 00000000
[  341.689522] Call Trace:
[  341.690306]  <IRQ> 
[  341.690318]  [<ffffffffa7ad7229>] ? ttwu_do_wakeup+0x19/0xe0
[  341.691887]  [<ffffffffc06bc2f6>] qla24xx_process_response_queue+0x4b6/0x8d0 [qla2xxx]
[  341.692691]  [<ffffffffa7ad3101>] ? __wake_up_common_lock+0x91/0xc0
[  341.693475]  [<ffffffffa7ad2ba0>] ? task_rq_unlock+0x20/0x20
[  341.694264]  [<ffffffffc06be96b>] qla24xx_msix_rsp_q+0x4b/0xc0 [qla2xxx]
[  341.695046]  [<ffffffffa7b4fe54>] __handle_irq_event_percpu+0x44/0x1c0
[  341.695887]  [<ffffffffa7b50002>] handle_irq_event_percpu+0x32/0x80
[  341.696696]  [<ffffffffa7b5008c>] handle_irq_event+0x3c/0x60
[  341.697444]  [<ffffffffa7b52e7f>] handle_edge_irq+0x7f/0x150
[  341.698187]  [<ffffffffa7a2f5f4>] handle_irq+0xe4/0x1a0
[  341.698939]  [<ffffffffa7b10e4c>] ? tick_check_idle+0x8c/0xd0
[  341.699674]  [<ffffffffa819892d>] do_IRQ+0x4d/0xf0
....
[  341.711037] RIP  [<ffffffffc06b9fe7>] qla2x00_status_entry+0x4f7/0x1900 [qla2xxx]

Example 2

[11559.256216] qla2xxx [0000:81:00.0]-801c:13: Abort command issued nexus=13:0:3 -- 2002.
[11559.256541] qla2xxx [0000:81:00.0]-801c:13: Abort command issued nexus=13:0:3 -- 2002.
....
[11559.257229] sd 13:0:0:3: [sdw] tag#7 FAILED Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK cmd_age=20s
[11559.257230] sd 13:0:0:3: [sdw] tag#7 CDB: Test Unit Ready 00 00 00 00 00 00
[11559.257237] ------------[ cut here ]------------
[11559.257258] kernel BUG at block/blk-core.c:1559!
[11559.257272] invalid opcode: 0000 [#1] SMP 
....
[11559.257527] CPU: 4 PID: 0 Comm: swapper/4 Kdump: loaded Tainted: G        W      ------------   3.10.0-1160.11.1.el7.x86_64 #1
[11559.257556] Hardware name: Dell Inc. PowerEdge R630/0CNCJW, BIOS 2.10.5 07/25/2019
[11559.257587] task: ffff8e83213fa100 ti: ffff8e83217ec000 task.ti: ffff8e83217ec000
[11559.257605] RIP: 0010:[<ffffffff96753190>]  [<ffffffff96753190>] blk_requeue_request+0x90/0xa0
[11559.257631] RSP: 0018:ffff8e8e4fa83e58  EFLAGS: 00010087
[11559.257645] RAX: ffff8e83313dc5d0 RBX: ffff8e834e39a700 RCX: dead000000000200
[11559.257663] RDX: ffff8e83313dc5d0 RSI: ffff8e83313dc480 RDI: ffff8e83313dc5d0
[11559.257682] RBP: ffff8e8e4fa83e70 R08: ffff8e83313dc5d0 R09: 0000000000000018
[11559.257700] R10: 0000000000018297 R11: 7fffffffffffffff R12: ffff8e83313dc480
[11559.257717] R13: ffff8e8ec4b989c0 R14: 0000000000000246 R15: 0000000000001057
[11559.257736] FS:  0000000000000000(0000) GS:ffff8e8e4fa80000(0000) knlGS:0000000000000000
[11559.257756] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[11559.257771] CR2: 00007fb2e8008000 CR3: 0000001031810000 CR4: 00000000001607e0
[11559.257789] Call Trace:
[11559.257798]  <IRQ> 
[11559.257808]  [<ffffffff968ecc46>] __scsi_queue_insert+0xb6/0x100
[11559.257826]  [<ffffffff968ecd8a>] scsi_softirq_done+0xda/0x160
[11559.258554]  [<ffffffff9675d426>] blk_done_softirq+0x96/0xc0
[11559.259238]  [<ffffffff964a4b95>] __do_softirq+0xf5/0x280
[11559.259910]  [<ffffffff96b974ec>] call_softirq+0x1c/0x30
[11559.260574]  [<ffffffff9642f715>] do_softirq+0x65/0xa0
....
[11559.268980] RIP  [<ffffffff96753190>] blk_requeue_request+0x90/0xa0
[11559.269620]  RSP <ffff8e8e4fa83e58>

Example 3

[7698525.714021] qla2xxx [0000:05:00.1]-801c:14: Abort command issued nexus=14:0:1 -- 2002.
[7698525.714039] qla2xxx [0000:05:00.1]-801c:14: Abort command issued nexus=14:0:26 -- 2002.
....
[7698525.714763] sd 14:0:0:26: [sdbb] tag#30 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK cmd_age=30s
[7698525.714764] sd 14:0:0:26: [sdbb] tag#30 CDB: Write(16) 8a 00 00 00 00 00 89 02 99 30 00 00 01 00 00 00
[7698525.714765] blk_update_request: I/O error, dev sdbb, sector 2298648880
[7698525.721041] ------------[ cut here ]------------
[7698525.721422] kernel BUG at block/blk-core.c:2781!
[7698525.721722] invalid opcode: 0000 [#1] SMP 
....
[7698525.725573] CPU: 11 PID: 15316 Comm: kworker/11:1H Kdump: loaded Tainted: P        W  OE  ------------   3.10.0-1160.15.2.el7.x86_64 #1
[7698525.726534] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.7.1 001/22/2018
[7698525.727048] Workqueue: kblockd scsi_requeue_run_queue
[7698525.727562] task: ffff9169bc74b180 ti: ffff916a33ba0000 task.ti: ffff916a33ba0000
[7698525.728096] RIP: 0010:[<ffffffffb8b56b75>]  [<ffffffffb8b56b75>] blk_start_request+0x45/0x50
....
[7698525.734001] Call Trace:
[7698525.734618]  [<ffffffffb8b5792e>] blk_queue_start_tag+0x11e/0x1d0
[7698525.735302]  [<ffffffffb8cecf42>] ? __scsi_queue_insert+0xd2/0x100
[7698525.735944]  [<ffffffffb8ced32b>] scsi_request_fn+0x23b/0x680
[7698525.736591]  [<ffffffffb8b53869>] __blk_run_queue+0x39/0x50
[7698525.737243]  [<ffffffffb8b538e6>] blk_run_queue+0x26/0x40
[7698525.737913]  [<ffffffffb8cebaf8>] scsi_run_queue+0x258/0x2f0
[7698525.738587]  [<ffffffffb8ced785>] scsi_requeue_run_queue+0x15/0x20
[7698525.739256]  [<ffffffffb88bde3f>] process_one_work+0x17f/0x440
[7698525.739934]  [<ffffffffb88bef56>] worker_thread+0x126/0x3c0
....
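
All three examples share one pattern: one or more qla2xxx "Abort command issued" messages, followed later by a crash whose IP/RIP line names qla2x00_status_entry() or one of the blk_{requeue,start,finish}_request() functions. The following is a minimal sketch of how a saved kernel log could be checked for that pattern. The function name scan_log and both regular expressions are illustrative assumptions derived only from the excerpts above, not part of any Red Hat tool, and a match only suggests that a log resembles these examples.

```python
import re

# Assumption: abort messages look like the "-801c:" lines in the excerpts above.
ABORT_RE = re.compile(
    r"qla2xxx \[[0-9a-f:.]+\]-801c:\d+: Abort command issued nexus=(\d+:\d+:\d+)"
)

# Assumption: the faulting function is reported on an "IP:" or "RIP" line,
# as in the excerpts above. Call Trace lines are deliberately not matched.
CRASH_RE = re.compile(
    r"\bR?IP\b.*\[<[0-9a-f]+>\]\s+"
    r"(qla2x00_status_entry|blk_(?:requeue|start|finish)_request)\+0x"
)


def scan_log(lines):
    """Return (abort_nexuses, crash_function) for an iterable of log lines.

    abort_nexuses lists the nexus of every abort seen before the crash.
    crash_function is None if no matching IP/RIP line follows an abort.
    """
    aborts = []
    for line in lines:
        abort = ABORT_RE.search(line)
        if abort:
            aborts.append(abort.group(1))
            continue
        crash = CRASH_RE.search(line)
        if crash and aborts:
            return aborts, crash.group(1)
    return aborts, None
```

For instance, scan_log(open("vmcore-dmesg.txt")) would scan a saved log line by line; the file name here is only a placeholder.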

Environment

  • Red Hat Enterprise Linux 7
  • kernel-3.10.0-1127.el7 or newer
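
As a rough aid, the sketch below tests whether a kernel release string falls within the environment listed above (a RHEL 7 kernel at 3.10.0-1127.el7 or newer). The function name in_listed_environment and the parsing rule are assumptions made for illustration. A True result means only that the kernel matches the listed environment, not that the system will hit this crash.

```python
import re


def in_listed_environment(release, minimum=1127):
    """Return True if `release` is a 3.10.0-<N>...el7 kernel with N >= minimum.

    `release` is a string such as "3.10.0-1160.2.2.el7.x86_64",
    for example the output of `uname -r`.
    """
    # Capture the main build number; minor z-stream parts (".2.2") are optional.
    match = re.match(r"3\.10\.0-(\d+)(?:\.[\d.]+)?\.el7", release)
    return bool(match) and int(match.group(1)) >= minimum
```

On a running Linux system the release string can be obtained with platform.release() from the Python standard library.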
