Server intermittently crashing with RIP 'lpfc_sli_validate_fcp_iocb+0x74/0xc0 [lpfc]'

Solution Verified

Issue

  • During a database restore, the system crashed with a list corruption warning followed by a general protection fault:

    lpfc 0000:06:00.2: 0:(0):0266 Issue NameServer Req x21f err 0 Data: x810114 x0
    lpfc 0000:06:00.2: 0:(0):0708 Allocation request of 32 command buffers did not succeed.  Allocated 0 buffers.
    sd 1:0:0:20: timing out command, waited 180s
    scsi_io_completion: 12 callbacks suppressed
    [...]
    ------------[ cut here ]------------
    WARNING: CPU: 0 PID: 42990 at lib/list_debug.c:59 __list_del_entry+0xa1/0xd0
    list_del corruption. prev->next should be ffff8bf7b6638c00, but was ffff8bf7bad6d470
    ...
    CPU: 0 PID: 42990 Comm: kworker/0:50 Kdump: loaded Tainted: P           OE  ------------ T 3.10.0-862.3.2.el7.x86_64 #1
    Hardware name: HP ProLiant BL460c Gen9, BIOS I36 09/12/2016
    Workqueue: lpfc_wq lpfc_sli4_hba_process_cq [lpfc]
    Call Trace:
     [<ffffffffb090e78e>] dump_stack+0x19/0x1b
     [<ffffffffb0291998>] __warn+0xd8/0x100
     [<ffffffffb0291a1f>] warn_slowpath_fmt+0x5f/0x80
     [<ffffffffc045dbb9>] ? lpfc_sli_release_iocbq+0x49/0x60 [lpfc]
     [<ffffffffb05687f1>] __list_del_entry+0xa1/0xd0
     [<ffffffffc0459df2>] lpfc_sli_iocbq_lookup_by_tag.isra.20+0x42/0xb0 [lpfc]
     [<ffffffffc0459ed1>] lpfc_sli4_fp_handle_fcp_wcqe.isra.24+0x71/0x2f0 [lpfc]
     [<ffffffffb02d3b68>] ? __enqueue_entity+0x78/0x80
     [<ffffffffb02da62c>] ? enqueue_entity+0x26c/0xb60
     [<ffffffffb02db128>] ? enqueue_task_fair+0x208/0x6c0
     [<ffffffffc04b4eb0>] ? lpfc_calc_bg_err+0x4a0/0x4a0 [lpfc]
     [<ffffffffc045ab52>] lpfc_sli4_fp_handle_cqe+0x242/0x4b0 [lpfc]
     [<ffffffffb022959e>] ? __switch_to+0xce/0x580
     [<ffffffffc045c359>] lpfc_sli4_hba_process_cq+0x99/0x1a0 [lpfc]
     [<ffffffffb02b312f>] process_one_work+0x17f/0x440
     [<ffffffffb02b3df6>] worker_thread+0x126/0x3c0
     [<ffffffffb02b3cd0>] ? manage_workers.isra.24+0x2a0/0x2a0
     [<ffffffffb02bb161>] kthread+0xd1/0xe0
     [<ffffffffb02bb090>] ? insert_kthread_work+0x40/0x40
     [<ffffffffb0920677>] ret_from_fork_nospec_begin+0x21/0x21
     [<ffffffffb02bb090>] ? insert_kthread_work+0x40/0x40
    ---[ end trace 2e00367530814c1d ]---
    lpfc 0000:06:00.2: 0:0372 iotag xc1c lookup error: max iotag (xf38) iocb_flag x4
    [...]
    general protection fault: 0000 [#1] SMP 
    [...]
    scsi_tgt scsi_transport_sas crct10dif_common dm_mirror dm_region_hash dm_log dm_mod [last unloaded: oracleasm]
    CPU: 8 PID: 892 Comm: scsi_eh_2 Kdump: loaded Tainted: P        W  OE  ------------ T 3.10.0-862.3.2.el7.x86_64 #1
    Hardware name: HP ProLiant BL460c Gen9, BIOS I36 09/12/2016
    task: ffff8bf7b61b0fd0 ti: ffff8be935f78000 task.ti: ffff8be935f78000
    RIP: 0010:[<ffffffffc045a6d4>]  [<ffffffffc045a6d4>] lpfc_sli_validate_fcp_iocb+0x74/0xc0 [lpfc]
    RSP: 0018:ffff8be935f7bc68  EFLAGS: 00010097
    RAX: 0000000000000001 RBX: 0000000000000001 RCX: 0000000000000014
    RDX: 0000000000000001 RSI: 6572662f67726f2f RDI: ffff8bbe99ec0470
    RBP: ffff8be935f7bc70 R08: 0000000000000000 R09: ffff8bf7b7a9ef00
    R10: 000000000000189a R11: 0000000000000001 R12: ffff8bf7b69d3740
    R13: ffff8bf7b6030000 R14: 0000000000000000 R15: 0000000000000f4f
    FS:  0000000000000000(0000) GS:ffff8bf7bf600000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007fe0f5002000 CR3: 00000030f600e000 CR4: 00000000003607e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
     [<ffffffffc04642c8>] lpfc_sli_sum_iocb+0x78/0xc0 [lpfc]
     [<ffffffffc04ae86e>] lpfc_reset_flush_io_context+0x2e/0x190 [lpfc]
     [<ffffffffc04afedc>] lpfc_device_reset_handler+0x16c/0x220 [lpfc]
     [<ffffffffb069d30d>] scsi_try_bus_device_reset+0x2d/0x60
     [<ffffffffb069f50f>] scsi_eh_ready_devs+0x4ef/0xc60
     [<ffffffffb06a0f6c>] scsi_error_handler+0x56c/0x8b0
     [<ffffffffb06a0a00>] ? scsi_eh_get_sense+0x250/0x250
     [<ffffffffb02bb161>] kthread+0xd1/0xe0
     [<ffffffffb02bb090>] ? insert_kthread_work+0x40/0x40
     [<ffffffffb0920677>] ret_from_fork_nospec_begin+0x21/0x21
     [<ffffffffb02bb090>] ? insert_kthread_work+0x40/0x40
    [1896190.457188] Code: c2 48 c7 c6 30 21 4e c0 48 c7 c7 f8 84 4e c0 31 c0 e8 08 e5 4a f0 b8 01 00 00 00 c9 c3 66 2e 0f 1f 84 00 00 00 00 00 48 8b 77 a8 <48> 8b 36 48 85 f6 74 a8 66 3b 56 38 75 a2 48 8b 7f e0 48 89 4d 
    [1896190.459160] RIP  [<ffffffffc045a6d4>] lpfc_sli_validate_fcp_iocb+0x74/0xc0 [lpfc]
    [1896190.460140]  RSP <ffff8be935f7bc68>
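
One detail in the register dump above may help when reading the vmcore. The `Code:` line marks the faulting bytes as `<48> 8b 36`, a load through RSI, and RSI holds 6572662f67726f2f, which is not a canonical kernel address. The following is a minimal sketch, not part of the original report, that decodes that value as little-endian bytes and checks canonicality. The helper names are hypothetical and 48-bit (4-level) x86_64 addressing is assumed. It is an observation about the trace, not a confirmed root cause.

```python
import struct


def decode_register_ascii(value: int) -> str:
    """Render a 64-bit register value as little-endian ASCII (hypothetical helper)."""
    raw = struct.pack("<Q", value)  # x86_64 stores the low byte first
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)


def is_canonical(addr: int) -> bool:
    """True if bits 63..47 are all equal (assumes 48-bit virtual addresses)."""
    top = addr >> 47
    return top == 0 or top == 0x1FFFF


RSI = 0x6572662F67726F2F  # RSI from the general protection fault
RDI = 0xFFFF8BBE99EC0470  # RDI from the same dump, for comparison

print(decode_register_ascii(RSI))  # /org/fre
print(is_canonical(RSI))           # False
print(is_canonical(RDI))           # True
```

The bytes in RSI read as printable text rather than a pointer, which is consistent with the memory behind the iocb entry having been reused or overwritten, and with the `list_del corruption` warning logged earlier.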
    

Environment

  • Red Hat Enterprise Linux 7.5
    • kernel-3.10.0-862.3.2.el7
  • Emulex FC/FCoE HBAs (lpfc driver)
  • Oracle DB
