System crashed during SAN storage update
Issue
- Several systems crashed, or at least lost paths to the SAN storage (HPE 3PAR), during a 3PAR firmware update.
- This issue was seen only on systems running RHEL 7.6; systems with the 7.4 kernel were not affected.
- The following errors and call traces were logged during the crash:
sd 4:0:1:3: [sdbo] FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
sd 4:0:1:3: [sdbo] CDB: Write(10) 2a 00 00 38 11 1b 00 00 73 00
blk_update_request: I/O error, dev sdbo, sector 3674395
device-mapper: multipath: Failing path 68:16.
device-mapper: multipath: Failing path 68:32.
sd 4:0:1:4: rejecting I/O to offline device
device-mapper: multipath: Failing path 71:112.
qla2xxx [0000:0b:00.0]-8030:3: TM IOCB failed (9).
qla2xxx [0000:0b:00.0]-800c:3: do_reset failed for cmd=ffff93c6953e2300.
qla2xxx [0000:0b:00.0]-800f:3: DEVICE RESET FAILED: Task management failed nexus=3:0:1 cmd=ffff93c6953e2300.
qla2xxx [0000:0b:00.0]-8009:3: DEVICE RESET ISSUED nexus=3:0:3 cmd=ffff93f4c63cda40.
qla2xxx [0000:0b:00.0]-8030:3: TM IOCB failed (5).
qla2xxx [0000:0b:00.0]-800c:3: do_reset failed for cmd=ffff93f4c63cda40.
qla2xxx [0000:0b:00.0]-800f:3: DEVICE RESET FAILED: Task management failed nexus=3:0:3 cmd=ffff93f4c63cda40.
qla2xxx [0000:0b:00.0]-8009:3: TARGET RESET ISSUED nexus=3:0:0 cmd=ffff93f4c67b3100.
qla2xxx [0000:0b:00.0]-8030:3: TM IOCB failed (9).
qla2xxx [0000:0b:00.0]-800c:3: do_reset failed for cmd=ffff93f4c67b3100.
qla2xxx [0000:0b:00.0]-800f:3: TARGET RESET FAILED: Task management failed nexus=3:0:0 cmd=ffff93f4c67b3100.
qla2xxx [0000:0b:00.0]-8012:3: BUS RESET ISSUED nexus=3:0:1.
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffffc02a974a>] qla2x00_eh_wait_on_command+0x1a/0xa0 [qla2xxx]
PGD 0
Oops: 0000 [#1] SMP
Modules linked in: nfsv3 nfs_acl rpcsec_gss_krb5 [...] qla2xxx nvme_fc nvme_fabrics
CPU: 22 PID: 4286 Comm: scsi_eh_3 Kdump: loaded Tainted: P IOE ------------ 3.10.0-957.5.1.el7.x86_64 #1
Hardware name: HP ProLiant DL380 G7, BIOS P67 08/16/2015
task: ffff93fe087b4100 ti: ffff93fd4e894000 task.ti: ffff93fd4e894000
RIP: 0010:[<ffffffffc02a974a>] [<ffffffffc02a974a>] qla2x00_eh_wait_on_command+0x1a/0xa0 [qla2xxx]
RSP: 0018:ffff93fd4e897ce8 EFLAGS: 00010286
RAX: 0000000000000000 RBX: ffff93d56e8db9c0 RCX: ffff93cddf4a1740
RDX: 0000000000000140 RSI: ffff93d56e8da680 RDI: ffff93d56e8db9c0
RBP: ffff93fd4e897cf0 R08: ffff93d56e8d9c00 R09: 0000000000004000
R10: ffff9400b1433ec0 R11: 0000000000000800 R12: 0000000000000000
R13: ffff93cddf4a1740 R14: 0000000000000000 R15: ffff93d56e8db9c0
FS: 0000000000000000(0000) GS:ffff93d6e7cc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000005931210000 CR4: 00000000000207e0
Call Trace:
[<ffffffffc02ad50a>] qla2x00_eh_wait_for_pending_commands+0xda/0x140 [qla2xxx]
[<ffffffffc02b5a4a>] qla2xxx_eh_bus_reset+0x17a/0x1c0 [qla2xxx]
[<ffffffffa0ad33c6>] scsi_try_bus_reset+0x46/0x100
[<ffffffffa0ad52b1>] scsi_eh_ready_devs+0x771/0xc60
[<ffffffffa0ad6a8c>] scsi_error_handler+0x56c/0x8b0
[<ffffffffa0ad6520>] ? scsi_eh_get_sense+0x250/0x250
[<ffffffffa06c1c71>] kthread+0xd1/0xe0
[<ffffffffa06c1ba0>] ? insert_kthread_work+0x40/0x40
[<ffffffffa0d74c37>] ret_from_fork_nospec_begin+0x21/0x21
[<ffffffffa06c1ba0>] ? insert_kthread_work+0x40/0x40
Code: e8 2c e2 69 e0 eb 97 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 89 e5 53 48 8b 07 48 89 fb 48 8b 30 48 8b 86 f0 0b 00 00 <48> 8b 10 83 ba 90 00 00 00 01 75 4a 48 8b 40 10 a9 00 00 04 00
RIP [<ffffffffc02a974a>] qla2x00_eh_wait_on_command+0x1a/0xa0 [qla2xxx]
RSP <ffff93fd4e897ce8>
CR2: 0000000000000000
- Below is another crash observed during the 3PAR storage array firmware upgrade:
CPU: 0 PID: 421 Comm: scsi_eh_1 Kdump: loaded Tainted: G ------------ T 3.10.0-957.el7.debugscsi.x86_64 #1
Hardware name: HP ProLiant BL460c Gen9, BIOS I36 02/17/2017
task: ffffa0626cf2a080 ti: ffffa0626ad18000 task.ti: ffffa0626ad18000
RIP: 0010:[<0000000000000000>] [< (null)>] (null)
RSP: 0018:ffffa0626ad1bbf8 EFLAGS: 00010082
qla2xxx [0000:09:00.1]-801c:2: Abort command issued nexus=2:1:2 -- 0 2003.
RAX: 0000000000000000 RBX: ffffa05f02bdc540 RCX: 000000010024000d
RDX: 000000010024000e RSI: ffffe63c240af700 RDI: ffffa05f02bdda40
RBP: ffffa0626ad1bc10 R08: ffffa05f02bdc540 R09: 000000010024000d
R10: 0000000002bdd801 R11: ffffe63c240af700 R12: ffffa05f02bdda40
R13: ffffa05e58fc8740 R14: ffffa05e5d84b380 R15: 00000000000005e1
FS: 0000000000000000(0000) GS:ffffa05e5fc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 00000001df610000 CR4: 00000000001607f0
Call Trace:
[<ffffffffc0451da4>] ? qla2x00_sp_compl+0x54/0xb0 [qla2xxx]
[<ffffffffc0451bb2>] __qla2x00_abort_all_cmds+0xc2/0x260 [qla2xxx]
[<ffffffffc0455bd7>] qla2x00_abort_all_cmds+0x27/0x70 [qla2xxx]
[<ffffffffc046be13>] qla2x00_abort_isp_cleanup+0x2a3/0x330 [qla2xxx]
[<ffffffffc046bf9d>] qla2x00_abort_isp+0xfd/0x6d0 [qla2xxx]
[<ffffffffc04557f5>] qla2xxx_eh_host_reset+0x285/0x2c0 [qla2xxx]
[<ffffffffc045da1a>] ? qla2xxx_eh_bus_reset+0x14a/0x1c0 [qla2xxx]
[<ffffffffbecd2e46>] scsi_try_host_reset+0x46/0x100
[<ffffffffbecd4cf6>] scsi_eh_ready_devs+0x876/0xc60
[<ffffffffbecd63cc>] scsi_error_handler+0x56c/0x8b0
[<ffffffffbecd5e60>] ? scsi_eh_get_sense+0x250/0x250
[<ffffffffbe8c1c31>] kthread+0xd1/0xe0
[<ffffffffbe8c1b60>] ? insert_kthread_work+0x40/0x40
[<ffffffffbef74c37>] ret_from_fork_nospec_begin+0x21/0x21
[<ffffffffbe8c1b60>] ? insert_kthread_work+0x40/0x40
Code: Bad RIP value.
RIP [< (null)>] (null)
RSP <ffffa0626ad1bbf8>
CR2: 0000000000000000
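When checking other hosts for the same symptoms, the messages quoted in this section can be matched mechanically. The following is a minimal sketch, not a Red Hat tool: the function name `count_signatures` and the labels in `SIGNATURES` are hypothetical, and the patterns are derived only from the log excerpts shown in this article, so other driver versions may word these messages differently.

```python
import re

# Patterns derived from the log excerpts in this article.
# The dictionary keys are illustrative labels, not official message names.
SIGNATURES = {
    "path_failed": re.compile(r"device-mapper: multipath: Failing path \d+:\d+"),
    "reset_failed": re.compile(
        r"qla2xxx \[[0-9a-f:.]+\]-800f:\d+: (?:DEVICE|TARGET) RESET FAILED"
    ),
    "null_deref": re.compile(
        r"BUG: unable to handle kernel NULL pointer dereference"
    ),
}


def count_signatures(lines):
    """Count how many lines match each signature pattern."""
    counts = {name: 0 for name in SIGNATURES}
    for line in lines:
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


if __name__ == "__main__":
    # Sample lines copied from the first trace in this article.
    sample = [
        "device-mapper: multipath: Failing path 68:16.",
        "qla2xxx [0000:0b:00.0]-800f:3: DEVICE RESET FAILED: "
        "Task management failed nexus=3:0:1 cmd=ffff93c6953e2300.",
        "BUG: unable to handle kernel NULL pointer dereference at (null)",
    ]
    print(count_signatures(sample))
```

A non-zero `reset_failed` count followed by a `null_deref` hit would resemble the first trace above, but a match alone does not confirm the same root cause.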
Environment
- Red Hat Enterprise Linux 7.6
- kernel version 3.10.0-957.el7 and later, but earlier than 3.10.0-957.10.1.el7
- 3PAR Storage array configured with persistent port mode
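
To check whether a host's kernel falls in the affected range, the release string (as printed by `uname -r`) can be compared numerically. The sketch below reads the kernel entry above as a range starting at 3.10.0-957.el7 and ending just before 3.10.0-957.10.1.el7; that reading is an interpretation of this article, and the helper names `parse_kernel_release` and `is_in_affected_range` are hypothetical. It uses a simplified numeric comparison rather than RPM's full version-comparison rules, which is adequate only for el7 release strings of the form shown in the traces.

```python
import re


def parse_kernel_release(release):
    """Split e.g. '3.10.0-957.5.1.el7.x86_64' into ((3, 10, 0), (957, 5, 1))."""
    match = re.match(r"^(\d+(?:\.\d+)*)-(\d+(?:\.\d+)*)\.el7", release)
    if match is None:
        raise ValueError(f"not an el7 kernel release string: {release!r}")
    version = tuple(int(part) for part in match.group(1).split("."))
    build = tuple(int(part) for part in match.group(2).split("."))
    return version, build


def is_in_affected_range(release):
    """Return True for 3.10.0-957.el7 <= release < 3.10.0-957.10.1.el7.

    Assumption: the range boundaries are taken from the Environment
    section of this article; tuples are compared element by element.
    """
    version, build = parse_kernel_release(release)
    return version == (3, 10, 0) and (957,) <= build < (957, 10, 1)


if __name__ == "__main__":
    # Release strings taken from the two traces in this article.
    for release in ("3.10.0-957.5.1.el7.x86_64",
                    "3.10.0-957.el7.debugscsi.x86_64"):
        print(release, is_in_affected_range(release))
```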