System panics with "Kernel panic - not syncing: (link != head)".

Solution In Progress - Updated -

Environment

  • Red Hat Enterprise Linux.
  • Third-party module [oracleafd] (Oracle ASM Filter Driver, ASMFD).

Issue

  • The kernel panicked with the following messages and call trace:
[10873483.068796] F 15193744.810/240122104420 oracle_47126_op[47126] oracleafd:12:0648:Process registrations exhausted 
[10873483.068801] [Oracle ASMFD] ASSERTION FAILURE: (link != head) File: /scratch/builds/aime/aime_usm_296225/el7u5_x86_64/usm/src/afd/driver/./afduh.c Line: 386 
[10873483.068803] Kernel panic - not syncing: (link != head)
[10873483.069023] CPU: 18 PID: 47126 Comm: oracle_47126_op Kdump: loaded Tainted: P           OE  ------------   3.10.0-1160.81.1.el7.case03446837.1.x86_64 #1
[10873483.069637] Hardware name: HPE ProLiant DL580 Gen10/ProLiant DL580 Gen10, BIOS U34 07/20/2023
[10873483.070025] Call Trace:
[10873483.070148]  [<ffffffffac1b1bec>] dump_stack+0x19/0x1f
[10873483.070374]  [<ffffffffac1ab708>] panic+0xe8/0x21f
[10873483.070597]  [<ffffffffc0a4d0f0>] AfdgDoAssertion+0x60/0x60 [oracleafd]
[10873483.070897]  [<ffffffffc0a4f3cd>] afduhRemove+0xdd/0x110 [oracleafd]
[10873483.071193]  [<ffffffffc0a501bd>] afdt_remove_pidhn+0x2d/0x70 [oracleafd]
[10873483.071504]  [<ffffffffc0a60b72>] afdr_portal_close+0x192/0x2d0 [oracleafd]
[10873483.071820]  [<ffffffffc0a6a6b8>] afd_close+0x18/0x20 [oracleafd]
[10873483.082447]  [<ffffffffabc9ec4f>] __blkdev_put+0x17f/0x1b0
[10873483.093131]  [<ffffffffabc9f5fc>] blkdev_put+0x4c/0x140
[10873483.103555]  [<ffffffffabc9f7c5>] blkdev_close+0x25/0x30
[10873483.113837]  [<ffffffffabc5db4c>] __fput+0xec/0x230
[10873483.124668]  [<ffffffffabc5dd7e>] ____fput+0xe/0x20
[10873483.134881]  [<ffffffffabac7e1b>] task_work_run+0xbb/0xe0
[10873483.144969]  [<ffffffffabaa6124>] do_exit+0x2d4/0xa30
[10873483.155097]  [<ffffffffabab5c51>] ? __set_task_blocked+0x41/0xa0
[10873483.165025]  [<ffffffffabab8712>] ? __set_current_blocked+0x42/0x70
[10873483.174985]  [<ffffffffabaa68ff>] do_group_exit+0x3f/0xa0
[10873483.184688]  [<ffffffffabaa6974>] SyS_exit_group+0x14/0x20
[10873483.194536]  [<ffffffffac1c539a>] system_call_fastpath+0x25/0x2a

Resolution

Root Cause

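  • The system panicked on an assertion failure inside the third-party kernel module [oracleafd]. While process oracle_47126_op (PID 47126) was exiting, closing an ASMFD block device it held open led to afd_close() -> afdr_portal_close() -> afdt_remove_pidhn() -> afduhRemove(). There the module's own check (link != head) at afduh.c line 386 failed, and AfdgDoAssertion() called panic(). Immediately before the assertion, the same process logged "Process registrations exhausted".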
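The source of afduh.c is not public, so the following Python sketch does not reproduce oracleafd's logic. It only illustrates, assuming that (link != head) guards removal from a circular doubly linked list with a sentinel head (a common kernel idiom), how such a check fails: the lookup walks the whole list back to the head without finding the entry, and unlinking the head itself would corrupt the list. The names Link, insert_tail, lookup, and remove are hypothetical.

```python
class Link:
    """Hypothetical list link; the real oracleafd data structures are not public."""

    def __init__(self, key=None):
        self.key = key
        self.prev = self  # a lone link (or an empty head) points at itself
        self.next = self


def insert_tail(head, link):
    """Append link just before the sentinel head of a circular list."""
    link.prev = head.prev
    link.next = head
    head.prev.next = link
    head.prev = link


def lookup(head, key):
    """Walk the list; returns head itself when key is not present."""
    link = head.next
    while link is not head and link.key != key:
        link = link.next
    return link


def remove(head, key):
    """Unlink the entry for key, refusing to unlink the sentinel head."""
    link = lookup(head, key)
    if link is head:
        # Analogue of the "(link != head)" assertion: the entry was not
        # found, and unlinking the sentinel would corrupt the list.
        raise AssertionError("(link != head)")
    link.prev.next = link.next
    link.next.prev = link.prev
    link.prev = link.next = link
    return link
```

An explicit raise is used instead of Python's assert statement so the check cannot be disabled with -O, which loosely mirrors a kernel assertion that always fires.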

Diagnostic Steps

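  • The kernel ring buffer contains the call trace of the panicking task. Every frame between panic() and the block-device close path (AfdgDoAssertion, afduhRemove, afdt_remove_pidhn, afdr_portal_close, afd_close) belongs to [oracleafd], which shows that the panic was raised from that module's code.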

Kernel ring buffer:

crash> log
(Output identical to the ring-buffer messages and call trace shown under Issue above.)
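The module check on the ring-buffer call trace can be automated. A minimal Python sketch, assuming the log text has been saved from crash: it collects every call-trace frame that carries a [module] suffix and skips frames prefixed with "?" (stack entries the unwinder could not confirm as part of the call chain). The regular expression and the module_frames name are illustrative assumptions, not part of crash.

```python
import re

# Matches x86 call-trace entries such as
#   [<ffffffffc0a4f3cd>] afduhRemove+0xdd/0x110 [oracleafd]
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<func>[\w.]+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:[ \t]+\[(?P<module>[\w-]+)\])?"
)


def module_frames(log_text):
    """Return (function, module) for each reliable frame that lies in a loadable module."""
    frames = []
    for m in FRAME_RE.finditer(log_text):
        if m.group("module") and not m.group("unreliable"):
            frames.append((m.group("func"), m.group("module")))
    return frames
```

For the trace in this article, the result is the five oracleafd frames from AfdgDoAssertion down to afd_close.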

  • Backtrace of the panicking task (the arrow marks the [oracleafd] frame that led to panic()):
crash> bt
PID: 47126    TASK: ffff8a869a3c5280  CPU: 18   COMMAND: "oracle_47126_op"
 #0 [ffff8b069e66fb00] machine_kexec at ffffffffaba69504
 #1 [ffff8b069e66fb60] __crash_kexec at ffffffffabb29d32
 #2 [ffff8b069e66fc30] panic at ffffffffac1ab713
 #3 [ffff8b069e66fcd0] afduhRemove at ffffffffc0a4f3cd [oracleafd]    <---------
 #4 [ffff8b069e66fd08] afdt_remove_pidhn at ffffffffc0a501bd [oracleafd]
 #5 [ffff8b069e66fd28] afdr_portal_close at ffffffffc0a60b72 [oracleafd]
 #6 [ffff8b069e66fd58] afd_close at ffffffffc0a6a6b8 [oracleafd]
 #7 [ffff8b069e66fd68] __blkdev_put at ffffffffabc9ec4f
 #8 [ffff8b069e66fda8] blkdev_put at ffffffffabc9f5fc
 #9 [ffff8b069e66fdd0] blkdev_close at ffffffffabc9f7c5
#10 [ffff8b069e66fde0] __fput at ffffffffabc5db4c
#11 [ffff8b069e66fe28] ____fput at ffffffffabc5dd7e
#12 [ffff8b069e66fe38] task_work_run at ffffffffabac7e1b
#13 [ffff8b069e66fe78] do_exit at ffffffffabaa6124
#14 [ffff8b069e66ff10] do_group_exit at ffffffffabaa68ff
#15 [ffff8b069e66ff40] sys_exit_group at ffffffffabaa6974
#16 [ffff8b069e66ff50] system_call_fastpath at ffffffffac1c539a
    RIP: 00007fd9c9831da9  RSP: 00007fff2ee34488  RFLAGS: 00000246
    RAX: 00000000000000e7  RBX: 0000000000000000  RCX: ffffffffffffffff
    RDX: 0000000000000000  RSI: 0000000000000000  RDI: 0000000000000000
    RBP: 00007fff2ee40b50   R8: 000000000000003c   R9: 00000000000000e7
    R10: fffffffffffae250  R11: 0000000000000206  R12: 00007fd9ce0b5440
    R13: 0000000000000000  R14: 0000000000000000  R15: 0000000000000000
    ORIG_RAX: 00000000000000e7  CS: 0033  SS: 002b
crash> 
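The marked frame can also be located mechanically. A minimal sketch, assuming the bt output is available as text: it parses each numbered frame and returns the one listed right after panic(). This is a heuristic that fits this panic, not a general rule; BT_RE, parse_bt, and panic_caller are hypothetical names.

```python
import re

# Matches crash "bt" frame lines such as
#  #3 [ffff8b069e66fcd0] afduhRemove at ffffffffc0a4f3cd [oracleafd]
BT_RE = re.compile(
    r"#\s*(?P<num>\d+)\s+\[(?P<sp>[0-9a-f]+)\]\s+(?P<func>\S+)\s+at\s+"
    r"(?P<addr>[0-9a-f]+)(?:[ \t]+\[(?P<module>[^\]\s]+)\])?"
)


def parse_bt(bt_text):
    """Return one dict per numbered frame (num, sp, func, addr, module)."""
    return [m.groupdict() for m in BT_RE.finditer(bt_text)]


def panic_caller(bt_text):
    """Heuristic: return the frame listed right after panic(), or None."""
    frames = parse_bt(bt_text)
    for i, frame in enumerate(frames):
        if frame["func"] == "panic" and i + 1 < len(frames):
            return frames[i + 1]
    return None
```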

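Disassembly around the return addresses from the backtrace (the instruction just before each return address is the call into the frame listed above it):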

crash> dis -rl ffffffffc0a60b72 | tail -5
0xffffffffc0a60b63 <afdr_portal_close+387>: xor    %esi,%esi
0xffffffffc0a60b65 <afdr_portal_close+389>: call   0xffffffffc0a5f210 <afdp_assist_death>
0xffffffffc0a60b6a <afdr_portal_close+394>: mov    %rbx,%rdi
0xffffffffc0a60b6d <afdr_portal_close+397>: call   0xffffffffc0a50190 <afdt_remove_pidhn>
0xffffffffc0a60b72 <afdr_portal_close+402>: jmp    0xffffffffc0a60a99 <afdr_portal_close+185>
crash>


crash> dis -rl ffffffffc0a501bd | tail -5
0xffffffffc0a501b0 <afdt_remove_pidhn+32>:  je     0xffffffffc0a501e0 <afdt_remove_pidhn+80>
0xffffffffc0a501b2 <afdt_remove_pidhn+34>:  mov    %rbx,%rsi
0xffffffffc0a501b5 <afdt_remove_pidhn+37>:  mov    %r12,%rdi
0xffffffffc0a501b8 <afdt_remove_pidhn+40>:  call   0xffffffffc0a4f2f0 <afduhRemove>
0xffffffffc0a501bd <afdt_remove_pidhn+45>:  mov    %rbx,%rdi
crash>


crash> dis -rl ffffffffc0a4f3cd | tail -5
0xffffffffc0a4f3b5 <afduhRemove+197>:   mov    $0xffffffffc0a74658,%rdx
0xffffffffc0a4f3bc <afduhRemove+204>:   mov    $0x1,%esi
0xffffffffc0a4f3c1 <afduhRemove+209>:   mov    $0xffffffffc0a725b1,%rdi
0xffffffffc0a4f3c8 <afduhRemove+216>:   call   0xffffffffc0a4d090 <AfdgDoAssertion>
0xffffffffc0a4f3cd <afduhRemove+221>:   jmp    0xffffffffc0a4f363 <afduhRemove+115>
crash>
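Reading the instruction that precedes each return address can be sketched in Python as well, assuming the dis output has been captured as text. insn_before returns the instruction just before a given address, and call_target extracts the symbol from a direct call; both names and the regular expressions are hypothetical.

```python
import re

# Matches crash "dis" lines such as
# 0xffffffffc0a501b8 <afdt_remove_pidhn+40>:  call   0xffffffffc0a4f2f0 <afduhRemove>
DIS_RE = re.compile(
    r"0x(?P<addr>[0-9a-f]+) <(?P<func>[\w.]+)(?:\+(?P<off>\d+))?>:[ \t]+(?P<insn>[^\n]*)"
)
CALL_RE = re.compile(r"call\s+0x[0-9a-f]+\s+<(?P<target>[\w.]+)>")


def insn_before(dis_text, return_addr):
    """Return the instruction text just before return_addr, or None if absent."""
    prev = None
    for m in DIS_RE.finditer(dis_text):
        if int(m.group("addr"), 16) == return_addr:
            return prev
        prev = m.group("insn").strip()
    return None


def call_target(insn):
    """Return the symbol named by a direct call instruction, else None."""
    m = CALL_RE.search(insn or "")
    return m.group("target") if m else None
```

Applied to the three listings above, the recovered call targets are afdt_remove_pidhn, afduhRemove, and AfdgDoAssertion, matching frames #4 and #3 of the backtrace and the AfdgDoAssertion entry in the ring buffer.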


crash> sym afduhRemove
ffffffffc0a4f2f0 (t) afduhRemove [oracleafd]  <<------
                          ^             ^
                          |             |
[ Function within the module code ]  [ Module Name ]
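The symbol+offset notation in the trace is plain address arithmetic: afduhRemove starts at 0xffffffffc0a4f2f0, and 0xffffffffc0a4f2f0 + 0xdd = 0xffffffffc0a4f3cd, the return address in frame #3. Below is a simplified nearest-preceding-symbol lookup, assuming a hand-built table of the start addresses shown by sym and dis above; it is not crash's implementation, and symbolize and SYMTAB are hypothetical names.

```python
import bisect

# Start addresses taken from the sym and dis output in this article.
SYMTAB = {
    "AfdgDoAssertion": 0xffffffffc0a4d090,
    "afduhRemove": 0xffffffffc0a4f2f0,
    "afdt_remove_pidhn": 0xffffffffc0a50190,
}


def symbolize(addr, symtab=SYMTAB):
    """Return 'name+0xoff' for the closest symbol at or below addr, or None."""
    entries = sorted((start, name) for name, start in symtab.items())
    starts = [start for start, _ in entries]
    i = bisect.bisect_right(starts, addr) - 1
    if i < 0:
        return None
    start, name = entries[i]
    return f"{name}+{addr - start:#x}"
```

Because this sketch ignores symbol sizes, it cannot tell whether an address lies past the end of a function, as with the AfdgDoAssertion+0x60/0x60 entry in the ring buffer.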


Third-party modules:

  • The crash utility's mod -t command lists the modules with a non-zero taint field; [oracleafd] is among them:
crash> mod -t | grep -i oracleafd
oracleafd   cwn   <------
crash> 
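The "Tainted: P OE" field in the panic header records the same condition kernel-wide. A small decoder, assuming only a subset of the standard upstream taint letters; the dashed field that follows the letters on this RHEL kernel is not interpreted, and decode_taint is a hypothetical name.

```python
import re

# Subset of the upstream kernel taint letters.
TAINT_FLAGS = {
    "P": "proprietary module was loaded",
    "F": "module was force-loaded",
    "R": "module was force-unloaded",
    "M": "processor reported a machine check exception",
    "B": "bad page referenced or unexpected page flags",
    "D": "kernel died recently (OOPS or BUG)",
    "W": "kernel issued a warning",
    "C": "staging driver was loaded",
    "O": "externally built (out-of-tree) module was loaded",
    "E": "unsigned module was loaded",
}


def decode_taint(line):
    """Return (letter, meaning) pairs from the letters after 'Tainted:'."""
    m = re.search(r"Tainted: ([A-Z ]*)", line)
    if not m:
        return []
    return [(c, TAINT_FLAGS.get(c, "not in this table")) for c in m.group(1) if c != " "]
```

For the panic header in this article, the decoded letters are P, O, and E: a proprietary, out-of-tree, unsigned module was loaded.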

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
