Server crash at get_partial_node with a kernel panic (general protection fault).
Environment
- Red Hat Enterprise Linux 8.
Issue
- Server crash at get_partial_node with a kernel panic (general protection fault).
PID: 0 TASK: ffff96555a454040 CPU: 37 COMMAND: "swapper/37"
#0 [ffffb2c5993ac668] machine_kexec at ffffffffb0a650ce
#1 [ffffb2c5993ac6c0] __crash_kexec at ffffffffb0ba53dd
#2 [ffffb2c5993ac788] crash_kexec at ffffffffb0ba62cd
#3 [ffffb2c5993ac7a0] oops_end at ffffffffb0a264cd
#4 [ffffb2c5993ac7c0] general_protection at ffffffffb14010ee
[exception RIP: get_partial_node+134] <<===
RIP: ffffffffb0d0aaf6 RSP: ffffb2c5993ac870 RFLAGS: 00010002
RAX: 2255532048424102 RBX: ffff96554125b6c0 RCX: 2255532048424103
RDX: 0000000000000001 RSI: 0000000000980000 RDI: ffff96554125b6c0
RBP: ffffb2c5993ac920 R8: 22555320484240fb R9: 0000000000000000
R10: ffff965560a84840 R11: fffff5b3f906d308 R12: fffff5b3f906d300
R13: ffff965560a84840 R14: 0000000000000002 R15: 22555320484240fb
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#5 [ffffb2c5993ac928] ___slab_alloc at ffffffffb0d0b1e8
#6 [ffffb2c5993ac9d8] __alloc_skb at ffffffffb11c3847
#7 [ffffb2c5993aca40] __alloc_skb at ffffffffb11c3847
#8 [ffffb2c5993aca88] __tcp_send_ack at ffffffffb1299d2f
#9 [ffffb2c5993acaa0] tcp_rcv_established at ffffffffb1294cee
#10 [ffffb2c5993acae0] tcp_v4_do_rcv at ffffffffb12a13ba
#11 [ffffb2c5993acb00] tcp_v4_rcv at ffffffffb12a3723
#12 [ffffb2c5993acb70] ip_protocol_deliver_rcu at ffffffffb1276c3c
#13 [ffffb2c5993acb90] ip_local_deliver_finish at ffffffffb1276e2d
#14 [ffffb2c5993acb98] ip_local_deliver at ffffffffb1276f20
#15 [ffffb2c5993acbe8] ip_rcv at ffffffffb12771ab
#16 [ffffb2c5993acc48] __netif_receive_skb_core at ffffffffb11e0610
#17 [ffffb2c5993acce0] netif_receive_skb_internal at ffffffffb11e07bd
#18 [ffffb2c5993acd08] napi_gro_receive at ffffffffb11e1268
#19 [ffffb2c5993acd28] bnxt_rx_pkt at ffffffffc0229918 [bnxt_en]
#20 [ffffb2c5993ace08] __bnxt_poll_work at ffffffffc022a799 [bnxt_en]
#21 [ffffb2c5993ace50] bnxt_poll at ffffffffc022a9a2 [bnxt_en]
#22 [ffffb2c5993acea8] __napi_poll at ffffffffb11e1b5d
#23 [ffffb2c5993aced8] net_rx_action at ffffffffb11e2033
#24 [ffffb2c5993acf58] __softirqentry_text_start at ffffffffb16000d7
#25 [ffffb2c5993acfa8] irq_exit_rcu at ffffffffb0af3f3b
#26 [ffffb2c5993acfb8] irq_exit at ffffffffb0af3f4a
#27 [ffffb2c5993acfc0] do_IRQ at ffffffffb1401e7f
--- <IRQ stack> ---
#28 [ffffb2c598acfdf8] ret_from_intr at ffffffffb1400a8f
[exception RIP: mwait_idle+97]
RIP: ffffffffb13a5d61 RSP: ffffb2c598acfea0 RFLAGS: 00000246
RAX: 0000000000000000 RBX: 0000000000000025 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000025 RDI: ffff97ce7915d5c0
RBP: 0000000000000025 R8: ffff97ce7915cf80 R9: 000000000002a680
R10: 00197cc210ca11a6 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: ffffffffffffffff R15: ffff96555a454040
ORIG_RAX: ffffffffffffffd9 CS: 0010 SS: 0018
#29 [ffffb2c598acfea0] default_idle_call at ffffffffb13a5f90
#30 [ffffb2c598acfec0] do_idle at ffffffffb0b2583f
#31 [ffffb2c598acff10] cpu_startup_entry at ffffffffb0b25abf
#32 [ffffb2c598acff30] start_secondary at ffffffffb0a59a46
#33 [ffffb2c598acff50] secondary_startup_64_no_verify at ffffffffb0a00107
Resolution
- Engage your application vendor to investigate the vmcore and determine why the sockets became unconnected while connected to vertica.
Root Cause
- The connection to the application vertica is having issues (sockets became unconnected while connected to vertica).
Diagnostic Steps
Step: 1
- System Information.
CPUS: 80
DATE: Sat Sep 10 02:27:28 EDT 2022
UPTIME: 26 days, 20:18:31
LOAD AVERAGE: 12.58, 17.59, 20.17 <--- low load for an 80-CPU system
TASKS: 2366
RELEASE: 4.18.0-372.13.1.el8_6.x86_64
VERSION: #1 SMP Mon Jun 6 15:05:22 EDT 2022
MACHINE: x86_64 (3100 Mhz)
MEMORY: 1535.6 GB
PANIC: "general protection fault: 0000 [#1] SMP NOPTI"
PID: 0
COMMAND: "swapper/37"
TASK: ffff96555a454040 (1 of 80) [THREAD_INFO: ffff96555a454040]
CPU: 37
STATE: TASK_RUNNING (PANIC)
- System Hardware Information.
crash> sys -i
DMI_BIOS_VENDOR: HPE
DMI_BIOS_VERSION: U30
DMI_BIOS_DATE: 03/08/2022
DMI_SYS_VENDOR: HPE
DMI_PRODUCT_NAME: ProLiant DL380 Gen10
- Listing tainted (out-of-tree / unsigned) modules.
crash> mod -t
NAME TAINTS
traps OE
sisevt PE
sisap POE
gc_enforcement OE
- Kernel Ring Buffer.
crash> log
[1815371.377394] ------------[ cut here ]------------
[1815371.377400] cfs_rq->avg.load_avg || cfs_rq->avg.util_avg || cfs_rq->avg.runnable_avg
[1815371.377407] WARNING: CPU: 0 PID: 0 at kernel/sched/fair.c:3350 update_blocked_averages+0x6af/0x6e0
[1815371.377416] Modules linked in: sisap(POE) sisevt(PE) binfmt_misc nfs_layout_nfsv41_files rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfsv3 nfs_acl nfs lockd grace fscache gc_enforc>
[1815371.377488] i2c_algo_bit dm_mirror dm_region_hash dm_log dm_mod ipmi_devintf ipmi_msghandler fuse
---trimmed---
[2319476.560538] general protection fault: 0000 [#1] SMP NOPTI
[2319476.560586] CPU: 37 PID: 0 Comm: swapper/37 Kdump: loaded Tainted: P W OE --------- - - 4.18.0-372.13.1.el8_6.x86_64 #1
[2319476.560635] Hardware name: HPE ProLiant DL380 Gen10/ProLiant DL380 Gen10, BIOS U30 03/08/2022
[2319476.560664] RIP: 0010:get_partial_node.part.87+0x86/0x280
[2319476.560695] Code: 8d 41 f8 4c 8b 54 24 28 4d 8d 63 f8 c7 44 24 34 00 00 00 00 4d 89 f9 4d 89 c7 49 8b 54 24 08 48 8d 42 ff 83 e2 01 49 0f 44 c4 <48> 8b 00 a8 40 0f 85 ab 01 00 00 4d 8>
[2319476.562846] RSP: 0018:ffffb2c5993ac870 EFLAGS: 00010002
[2319476.563790] RAX: 2255532048424102 RBX: ffff96554125b6c0 RCX: 2255532048424103
[2319476.564706] RDX: 0000000000000001 RSI: 0000000000980000 RDI: ffff96554125b6c0
[2319476.565651] RBP: ffffb2c5993ac920 R08: 22555320484240fb R09: 0000000000000000
[2319476.566548] R10: ffff965560a84840 R11: fffff5b3f906d308 R12: fffff5b3f906d300
[2319476.567416] R13: ffff965560a84840 R14: 0000000000000002 R15: 22555320484240fb
[2319476.568272] FS: 0000000000000000(0000) GS:ffff97ce79140000(0000) knlGS:0000000000000000
[2319476.569154] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[2319476.570007] CR2: 00007ea78d19b000 CR3: 000000728b410006 CR4: 00000000007706e0
[2319476.570836] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[2319476.571665] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[2319476.572473] PKRU: 55555554
[2319476.573261] Call Trace:
[2319476.574036] <IRQ>
[2319476.574806] ? __kfree_skb+0xe/0x20
[2319476.575598] ___slab_alloc.part.89+0x4f8/0x740
[2319476.576347] ? __alloc_skb+0x57/0x1c0
[2319476.577088] ? __alloc_skb+0x57/0x1c0
[2319476.577814] kmem_cache_alloc_node+0x25d/0x2a0
[2319476.578530] __alloc_skb+0x57/0x1c0
[2319476.579233] __tcp_send_ack.part.52+0x2f/0x100
[2319476.579931] tcp_rcv_established+0x4fe/0x5c0
[2319476.580618] tcp_v4_do_rcv+0x12a/0x1e0
[2319476.581327] tcp_v4_rcv+0xb43/0xc50
[2319476.581992] ? nft_do_chain_ipv4+0x66/0x80 [nf_tables]
[2319476.582675] ? endr_ioctl_enf+0xab1/0x1260 [gc_enforcement]
[2319476.583330] ip_protocol_deliver_rcu+0x2c/0x1d0
[2319476.583975] ip_local_deliver_finish+0x4d/0x60
[2319476.584604] ip_local_deliver+0xe0/0xf0
[2319476.585219] ? ip_protocol_deliver_rcu+0x1d0/0x1d0
[2319476.585845] ip_rcv+0x27b/0x36f
[2319476.586446] __netif_receive_skb_core+0xba0/0xcb0
[2319476.587045] ? inet_gro_receive+0x21f/0x2c0
[2319476.587663] netif_receive_skb_internal+0x3d/0xb0
[2319476.588241] napi_gro_receive+0x108/0x150
[2319476.588811] bnxt_rx_pkt+0xe88/0x12a0 [bnxt_en]
[2319476.589392] __bnxt_poll_work+0x179/0x2c0 [bnxt_en]
[2319476.589977] bnxt_poll+0xc2/0x1a0 [bnxt_en]
[2319476.590521] __napi_poll+0x2d/0x130
[2319476.591053] net_rx_action+0x253/0x320
[2319476.591594] __do_softirq+0xd7/0x2c4
[2319476.592127] irq_exit_rcu+0xcb/0xd0
[2319476.592621] irq_exit+0xa/0x10
[2319476.593093] do_IRQ+0x7f/0xd0
[2319476.593552] common_interrupt+0xf/0xf
[2319476.594024] </IRQ>
[2319476.594471] RIP: 0010:mwait_idle+0x61/0x80
[2319476.594896] Code: 48 8b 04 25 40 5c 01 00 48 89 d1 0f 01 c8 48 8b 00 a8 08 75 17 e9 07 00 00 00 0f 00 2d ea d5 45 00 31 c0 48 89 c1 fb 0f 01 c9 <eb> 07 fb 66 0f 1f 44 00 00 65 48 8b 0>
[2319476.595777] RSP: 0018:ffffb2c598acfea0 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffd9
[2319476.596243] RAX: 0000000000000000 RBX: 0000000000000025 RCX: 0000000000000000
[2319476.596693] RDX: 0000000000000000 RSI: 0000000000000025 RDI: ffff97ce7915d5c0
[2319476.597131] RBP: 0000000000000025 R08: ffff97ce7915cf80 R09: 000000000002a680
[2319476.597581] R10: 00197cc210ca11a6 R11: 0000000000000000 R12: 0000000000000000
[2319476.598013] R13: 0000000000000000 R14: ffffffffffffffff R15: ffff96555a454040
[2319476.598445] default_idle_call+0x40/0xf0
[2319476.598887] do_idle+0x20f/0x2c0
[2319476.599309] cpu_startup_entry+0x6f/0x80
[2319476.599749] start_secondary+0x1a6/0x1e0
[2319476.600170] secondary_startup_64_no_verify+0xc2/0xcb
[2319476.600596] Modules linked in: traps(OE) sisap(POE) sisevt(PE) binfmt_misc nfs_layout_nfsv41_files rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfsv3 nfs_acl nfs lockd grace fscache>
[2319476.600666] i2c_algo_bit dm_mirror dm_region_hash dm_log dm_mod ipmi_devintf ipmi_msghandler fuse [last unloaded: traps]
- Most of the load is coming from runnable tasks.
crash> ps -S
RU: 103
IN: 1777
ID: 485
ZO: 1
485 idle
1 zombie
1777 interruptible
103 running
So there aren't any stuck (uninterruptible) processes; the only unusual entry is a single zombie.
Step: 2
- The panic occurred on the CPU 37 IRQ stack.
crash> set -p
PID: 0
COMMAND: "swapper/37"
TASK: ffff96555a454040 (1 of 80) [THREAD_INFO: ffff96555a454040]
CPU: 37
STATE: TASK_RUNNING (PANIC)
- Backtrace of the panic task. The backtrace lands on this exception:
--> [exception RIP: get_partial_node+0x86]
crash> bt
PID: 0 TASK: ffff96555a454040 CPU: 37 COMMAND: "swapper/37"
#0 [ffffb2c5993ac668] machine_kexec at ffffffffb0a650ce
#1 [ffffb2c5993ac6c0] __crash_kexec at ffffffffb0ba53dd
#2 [ffffb2c5993ac788] crash_kexec at ffffffffb0ba62cd
#3 [ffffb2c5993ac7a0] oops_end at ffffffffb0a264cd
#4 [ffffb2c5993ac7c0] general_protection at ffffffffb14010ee
[exception RIP: get_partial_node+134] <<===
RIP: ffffffffb0d0aaf6 RSP: ffffb2c5993ac870 RFLAGS: 00010002
RAX: 2255532048424102 RBX: ffff96554125b6c0 RCX: 2255532048424103
RDX: 0000000000000001 RSI: 0000000000980000 RDI: ffff96554125b6c0
RBP: ffffb2c5993ac920 R8: 22555320484240fb R9: 0000000000000000
R10: ffff965560a84840 R11: fffff5b3f906d308 R12: fffff5b3f906d300
R13: ffff965560a84840 R14: 0000000000000002 R15: 22555320484240fb
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#5 [ffffb2c5993ac928] ___slab_alloc at ffffffffb0d0b1e8
#6 [ffffb2c5993ac9d8] __alloc_skb at ffffffffb11c3847 <<----
#7 [ffffb2c5993aca40] __alloc_skb at ffffffffb11c3847
#8 [ffffb2c5993aca88] __tcp_send_ack at ffffffffb1299d2f
#9 [ffffb2c5993acaa0] tcp_rcv_established at ffffffffb1294cee
#10 [ffffb2c5993acae0] tcp_v4_do_rcv at ffffffffb12a13ba
#11 [ffffb2c5993acb00] tcp_v4_rcv at ffffffffb12a3723
#12 [ffffb2c5993acb70] ip_protocol_deliver_rcu at ffffffffb1276c3c
#13 [ffffb2c5993acb90] ip_local_deliver_finish at ffffffffb1276e2d
#14 [ffffb2c5993acb98] ip_local_deliver at ffffffffb1276f20
#15 [ffffb2c5993acbe8] ip_rcv at ffffffffb12771ab
#16 [ffffb2c5993acc48] __netif_receive_skb_core at ffffffffb11e0610
#17 [ffffb2c5993acce0] netif_receive_skb_internal at ffffffffb11e07bd
#18 [ffffb2c5993acd08] napi_gro_receive at ffffffffb11e1268
#19 [ffffb2c5993acd28] bnxt_rx_pkt at ffffffffc0229918 [bnxt_en]
#20 [ffffb2c5993ace08] __bnxt_poll_work at ffffffffc022a799 [bnxt_en]
#21 [ffffb2c5993ace50] bnxt_poll at ffffffffc022a9a2 [bnxt_en]
#22 [ffffb2c5993acea8] __napi_poll at ffffffffb11e1b5d
#23 [ffffb2c5993aced8] net_rx_action at ffffffffb11e2033
#24 [ffffb2c5993acf58] __softirqentry_text_start at ffffffffb16000d7
#25 [ffffb2c5993acfa8] irq_exit_rcu at ffffffffb0af3f3b
#26 [ffffb2c5993acfb8] irq_exit at ffffffffb0af3f4a
#27 [ffffb2c5993acfc0] do_IRQ at ffffffffb1401e7f
--- <IRQ stack> ---
#28 [ffffb2c598acfdf8] ret_from_intr at ffffffffb1400a8f
[exception RIP: mwait_idle+97]
RIP: ffffffffb13a5d61 RSP: ffffb2c598acfea0 RFLAGS: 00000246
RAX: 0000000000000000 RBX: 0000000000000025 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000025 RDI: ffff97ce7915d5c0
RBP: 0000000000000025 R8: ffff97ce7915cf80 R9: 000000000002a680
R10: 00197cc210ca11a6 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: ffffffffffffffff R15: ffff96555a454040
ORIG_RAX: ffffffffffffffd9 CS: 0010 SS: 0018
#29 [ffffb2c598acfea0] default_idle_call at ffffffffb13a5f90
#30 [ffffb2c598acfec0] do_idle at ffffffffb0b2583f
#31 [ffffb2c598acff10] cpu_startup_entry at ffffffffb0b25abf
#32 [ffffb2c598acff30] start_secondary at ffffffffb0a59a46
#33 [ffffb2c598acff50] secondary_startup_64_no_verify at ffffffffb0a00107
- We crashed dereferencing %rax, which contains a garbage (non-canonical) value; dereferencing a non-canonical address raises a general protection fault rather than a page fault (a quick canonicality check follows the disassembly below).
crash> dis get_partial_node+0x86 -r 3
0xffffffffb0d0aaef <get_partial_node+0x7f>: and $0x1,%edx
0xffffffffb0d0aaf2 <get_partial_node+0x82>: cmove %r12,%rax
0xffffffffb0d0aaf6 <get_partial_node+0x86>: mov (%rax),%rax
RAX: 2255532048424102
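- A minimal sketch of that canonicality check, assuming the usual 48-bit virtual addresses (bits 63:47 must be all zeros or all ones; anything else faults with #GP on dereference):
#include <stdio.h>
#include <stdbool.h>

/* On x86_64 with 48-bit virtual addresses, an address is canonical when
 * bits 63:47 are all equal.  Dereferencing a non-canonical address raises
 * a general protection fault instead of a page fault. */
static bool is_canonical(unsigned long long addr)
{
        unsigned long long top = addr >> 47;          /* bits 63:47 */
        return top == 0 || top == 0x1ffff;
}

int main(void)
{
        printf("%d\n", is_canonical(0x2255532048424102ULL));  /* 0 - the bad RAX value  */
        printf("%d\n", is_canonical(0xffff96554125b6c0ULL));  /* 1 - a sane kernel addr */
        return 0;
}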
Step: 3
- In the source, we panicked while iterating over a list (a simplified expansion of the iterator follows the listing below).
2044 static void *get_partial_node(struct kmem_cache *s, struct kmem_cache_node *n,
2045 struct page **ret_page, gfp_t gfpflags)
2046 {
2047 struct page *page, *page2;
2048 void *object = NULL;
2049 unsigned int available = 0;
2050 unsigned long flags;
2051 int objects;
2052
2053 /*
2054 * Racy check. If we mistakenly see no partial slabs then we
2055 * just allocate an empty slab. If we mistakenly try to get a
2056 * partial slab and there is none available then get_partials()
2057 * will return NULL.
2058 */
2059 if (!n || !n->nr_partial)
2060 return NULL;
2061
2062 spin_lock_irqsave(&n->list_lock, flags);
2063 list_for_each_entry_safe(page, page2, &n->partial, slab_list) {    <<--- panic'd here
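- For context, list_for_each_entry_safe() expands to roughly the following (simplified from include/linux/list.h). Each pass converts a raw list pointer back to the containing struct page and pre-fetches that entry's next pointer, so as soon as one node on n->partial holds garbage, a value derived from it gets dereferenced:
/* Simplified expansion, for illustration only - the real macros live in
 * include/linux/list.h.  The "safe" variant pre-fetches the next entry so
 * the current one can be removed while walking the list. */
#define list_entry(ptr, type, member)  container_of(ptr, type, member)

#define list_for_each_entry_safe(pos, n, head, member)                    \
        for (pos = list_entry((head)->next, typeof(*pos), member),        \
             n = list_entry(pos->member.next, typeof(*pos), member);      \
             &pos->member != (head);                                      \
             pos = n, n = list_entry(n->member.next, typeof(*n), member))
- In this vmcore the first entry's embedded pointers already contain garbage (its next is 2255532048424103), and the fault hits on a value derived from that garbage (%rax = 2255532048424102, per the registers above).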
- The list head is n->partial, where n (a struct kmem_cache_node *) is the 2nd argument to get_partial_node. We'll need to find this argument.
crash> whatis get_partial_node
void *get_partial_node(struct kmem_cache *, struct kmem_cache_node *, struct page **, gfp_t);
                       %rdi                  %rsi                   %rdx        %rcx
(x86_64 calling convention: the first four integer/pointer arguments are passed in %rdi, %rsi, %rdx, %rcx)
crash> dis -r get_partial_node+0x86
0xffffffffb0d0aa70 <get_partial_node>: nopl 0x0(%rax,%rax,1) [FTRACE NOP]
0xffffffffb0d0aa75 <get_partial_node+0x5>: push %rbp
0xffffffffb0d0aa76 <get_partial_node+0x6>: mov %rsp,%rbp
0xffffffffb0d0aa79 <get_partial_node+0x9>: push %r15
0xffffffffb0d0aa7b <get_partial_node+0xb>: push %r14
0xffffffffb0d0aa7d <get_partial_node+0xd>: push %r13
0xffffffffb0d0aa7f <get_partial_node+0xf>: push %r12
0xffffffffb0d0aa81 <get_partial_node+0x11>: push %rbx
0xffffffffb0d0aa82 <get_partial_node+0x12>: mov %rsi,%rbx /* %rsi -> %rbx
0xffffffffb0d0aa85 <get_partial_node+0x15>: and $0xfffffffffffffff0,%rsp
0xffffffffb0d0aa89 <get_partial_node+0x19>: add $0xffffffffffffff80,%rsp
0xffffffffb0d0aa8d <get_partial_node+0x1d>: mov %rdi,0x28(%rsp)
0xffffffffb0d0aa92 <get_partial_node+0x22>: mov %rsi,%rdi /* %rsi -> %rdi
0xffffffffb0d0aa95 <get_partial_node+0x25>: mov %ecx,0x30(%rsp)
0xffffffffb0d0aa99 <get_partial_node+0x29>: mov %rsi,0x10(%rsp)
0xffffffffb0d0aa9e <get_partial_node+0x2e>: mov %rdx,0x8(%rsp)
0xffffffffb0d0aaa3 <get_partial_node+0x33>: call 0xffffffffb13a62b0 <_raw_spin_lock_irqsave>
0xffffffffb0d0aaa8 <get_partial_node+0x38>: mov 0x10(%rbx),%r11
0xffffffffb0d0aaac <get_partial_node+0x3c>: mov %rax,(%rsp)
0xffffffffb0d0aab0 <get_partial_node+0x40>: mov %rbx,%rax
0xffffffffb0d0aab3 <get_partial_node+0x43>: add $0x10,%rax
0xffffffffb0d0aab7 <get_partial_node+0x47>: mov (%r11),%rcx
0xffffffffb0d0aaba <get_partial_node+0x4a>: mov %rax,0x38(%rsp)
0xffffffffb0d0aabf <get_partial_node+0x4f>: cmp %r11,%rax
0xffffffffb0d0aac2 <get_partial_node+0x52>: je 0xffffffffb0d0ace6 <get_partial_node+0x276>
0xffffffffb0d0aac8 <get_partial_node+0x58>: xor %r15d,%r15d
0xffffffffb0d0aacb <get_partial_node+0x5b>: lea -0x8(%rcx),%r8
0xffffffffb0d0aacf <get_partial_node+0x5f>: mov 0x28(%rsp),%r10
0xffffffffb0d0aad4 <get_partial_node+0x64>: lea -0x8(%r11),%r12
0xffffffffb0d0aad8 <get_partial_node+0x68>: movl $0x0,0x34(%rsp)
0xffffffffb0d0aae0 <get_partial_node+0x70>: mov %r15,%r9
0xffffffffb0d0aae3 <get_partial_node+0x73>: mov %r8,%r15
0xffffffffb0d0aae6 <get_partial_node+0x76>: mov 0x8(%r12),%rdx
0xffffffffb0d0aaeb <get_partial_node+0x7b>: lea -0x1(%rdx),%rax
0xffffffffb0d0aaef <get_partial_node+0x7f>: and $0x1,%edx
0xffffffffb0d0aaf2 <get_partial_node+0x82>: cmove %r12,%rax
0xffffffffb0d0aaf6 <get_partial_node+0x86>: mov (%rax),%rax
- These two registers haven't been overwritten and still contain the kmem_cache_node pointer.
RBX: ffff96554125b6c0
RDI: ffff96554125b6c0
- Here's the struct kmem_cache_node.
crash> struct kmem_cache_node ffff96554125b6c0
struct kmem_cache_node {
list_lock = {
{
rlock = {
raw_lock = {
{
val = {
counter = 0x580001
},
{
locked = 0x1,
pending = 0x0
},
{
locked_pending = 0x1,
tail = 0x58
}
}
}
}
}
},
nr_partial = 0x8,
partial = {
next = 0xfffff5b3f906d308,
prev = 0xfffff5b63dfb1408
},
nr_slabs = {
counter = 0xd
},
total_objects = {
counter = 0x222
},
full = {
next = 0xffff96554125b6f0,
prev = 0xffff96554125b6f0
}
}
Step: 4
- We are iterating over the embedded list (the list head is n->partial, shown above). Each entry is linked through its page.slab_list member.
2063 list_for_each_entry_safe(page, page2, &n->partial, slab_list) {
crash> struct kmem_cache_node ffff96554125b6c0 -o
struct kmem_cache_node {
[ffff96554125b6c0] spinlock_t list_lock;
[ffff96554125b6c8] unsigned long nr_partial;
[ffff96554125b6d0] struct list_head partial;
[ffff96554125b6e0] atomic_long_t nr_slabs;
[ffff96554125b6e8] atomic_long_t total_objects;
[ffff96554125b6f0] struct list_head full;
}
SIZE: 0x40
- We can see the next entry is bad.
crash> list -H ffff96554125b6d0
fffff5b3f906d308
2255532048424103
list: invalid kernel virtual address: 2255532048424103 type: "list entry"
- Walking the list in reverse order hits the same problem.
crash> list -H ffff96554125b6d0 -r
fffff5b63dfb1408
fffff5b37a877608
fffff5b1a860eb08
fffff5b68a86dd08
fffff5b14ac48008
fffff5b144f3eb08
fffff5b49a42e008
fffff5b3f906d308
61e5583ce4242c38
list: invalid kernel virtual address: 61e5583ce4242c40 type: "list entry"
- This list is embedded in struct page at page.slab_list (offset 0x8). The problem entry is fffff5b3f906d308 (struct page fffff5b3f906d300): its next (forward) and prev (reverse) pointers are both bad. kmem also corroborates this (the sketch after the kmem output shows the offset arithmetic).
crash> kmem -s >/dev/null
kmem: skbuff_head_cache: partial list slab: fffff5b3f906d300 invalid page.lru.next: 2255532048424103
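- For reference, the relationship between the list pointer and the struct page base (fffff5b3f906d308 -> fffff5b3f906d300) is just the container_of()/list_entry() arithmetic over the 0x8 offset of page.slab_list. A minimal user-space sketch of that calculation (the struct here is a trimmed stub with just enough fields to reproduce the offset, not the full struct page definition):
#include <stdio.h>
#include <stddef.h>

/* Stand-in for the kernel's container_of(): step back from the address of
 * an embedded member to the start of the structure containing it. */
#define container_of(ptr, type, member) \
        ((type *)((char *)(ptr) - offsetof(type, member)))

struct list_head { struct list_head *next, *prev; };

/* Trimmed stub of struct page: just enough to put slab_list at offset 0x8,
 * as noted in the analysis above. */
struct page_stub {
        unsigned long    flags;       /* offset 0x00 */
        struct list_head slab_list;   /* offset 0x08 */
};

int main(void)
{
        /* The list node address reported by "list -H ffff96554125b6d0". */
        struct list_head *node = (struct list_head *)0xfffff5b3f906d308UL;

        struct page_stub *page = container_of(node, struct page_stub, slab_list);
        printf("struct page is at %p\n", (void *)page);  /* 0xfffff5b3f906d300 */
        return 0;
}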
- The page struct is toast.
crash> struct page fffff5b3f906d300
struct page {
flags = 0x87701336b5200d85,
- - - - - - 8< - - - - - - -
{
{
slab_list = {
next = 0x2255532048424103,
prev = 0x61e5583ce4242c38
},
{
next = 0x2255532048424103,
pages = 0xe4242c38,
pobjects = 0x61e5583c
}
},
slab_cache = 0x84097c1795955408,
freelist = 0x5547494e075fb057,
{
s_mem = 0x6f22551352414b44,
counters = 0x6f22551352414b44,
{
inuse = 0x4b44,
objects = 0x5241,
frozen = 0x0
}
}
},
crash> struct page | grep SIZE
SIZE: 0x40
crash> p (0xfffff5b3f906d300 - 0x40)
$6 = 0xfffff5b3f906d2c0
- The page structs immediately before and after it look OK.
crash> rd 0xfffff5b3f906d2c0 24
fffff5b3f906d2c0: 0017ffffc0000000 dead000000000100 ................
fffff5b3f906d2d0: dead000000000200 0000000000000000 ................
fffff5b3f906d2e0: 0000000000000000 0000000000000000 ................
fffff5b3f906d2f0: 00000001fffffdff ffff97a5e4caa00a ................
fffff5b3f906d300: 87701336b5200d85 2255532048424103 .. .6.p..ABH SU"
fffff5b3f906d310: 61e5583ce4242c38 84097c1795955408 8,$.<X.a.T...|..
fffff5b3f906d320: 5547494e075fb057 6f22551352414b44 W._.NIGUDKAR.U"o
fffff5b3f906d330: 22536e8d20414e5d b022fefc165445fa ]NA .nS".ET...".
fffff5b3f906d340: 0017ffffc0000000 fffff5b3f906d301 ................
fffff5b3f906d350: fffffffff9060201 dead000000000400 ................
fffff5b3f906d360: 0000000000000000 0000000000000000 ................
fffff5b3f906d370: 00000000ffffffff 0000000000000000 ................
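- As an aside, the qwords that overwrote this page struct decode to printable ASCII rather than to kernel pointers, which is what the ASCII column of the rd output above is showing. A small stand-alone sketch (the value is the one that was in %rax at the time of the fault, also visible in the dump):
#include <stdio.h>
#include <ctype.h>

int main(void)
{
        /* One of the qwords found on top of the corrupted struct page. */
        unsigned long long bad = 0x2255532048424102ULL;

        /* x86_64 is little-endian, so print the lowest-addressed byte first,
         * the same way the ASCII column of "rd" does. */
        for (int i = 0; i < 8; i++) {
                unsigned char c = (unsigned char)(bad >> (8 * i));
                putchar(isprint(c) ? c : '.');
        }
        putchar('\n');  /* prints: .ABH SU" */
        return 0;
}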
Step: 5
- Searching for the page struct address shows that 7 active tasks have it on their stacks, and quite a few inactive tasks do as well.
crash> search -T fffff5b3f906d300 | grep -c COMM
7
crash> search -t fffff5b3f906d300 | grep -c COMM
169
- Looking at the active CPUs, at least 20 are spinning in spin_lock called from the slab free/alloc paths. All are networking-related stacks, many coming through skb_clone, alloc_skb, etc. Here's an example with the page on its stack:
crash> bt ffff9655bcd90040
PID: 3176112 TASK: ffff9655bcd90040 CPU: 20 COMMAND: "vertica"
#0 [fffffe0000405e48] crash_nmi_callback at ffffffffb0a57853
#1 [fffffe0000405e50] nmi_handle at ffffffffb0a26e03
#2 [fffffe0000405ea8] default_do_nmi at ffffffffb1393619
#3 [fffffe0000405ec8] do_nmi at ffffffffb0a2733f
#4 [fffffe0000405ef0] end_repeat_nmi at ffffffffb14015a4
[exception RIP: native_queued_spin_lock_slowpath+0x13c]
RIP: ffffffffb0b4e54c RSP: ffffb2c59afc7ae8 RFLAGS: 00000046
RAX: 0000000000000000 RBX: 0000000000000246 RCX: 0000000000000029
RDX: ffff97ce78d2bb80 RSI: 0000000000540000 RDI: ffff96554125b6c0
RBP: ffffb2c59afc7be8 R8: 0000000000000001 R9: ffffffffb1289585
R10: ffff972ad0b81840 R11: 0000000000000001 R12: ffff972ad0b81840
R13: ffff96554125b6c0 R14: ffff972ad0b81840 R15: ffff965560a84840
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
--- <NMI exception stack> ---
#5 [ffffb2c59afc7ae8] native_queued_spin_lock_slowpath at ffffffffb0b4e54c
#6 [ffffb2c59afc7ae8] _raw_spin_lock_irqsave at ffffffffb13a62e0
#7 [ffffb2c59afc7af8] free_debug_processing at ffffffffb0d0975f
#8 [ffffb2c59afc7b48] __slab_free at ffffffffb0d09b43 <<==
#9 [ffffb2c59afc7bd8] tcp_recvmsg at ffffffffb1289585
#10 [ffffb2c59afc7ce8] inet_recvmsg at ffffffffb12bd5db
#11 [ffffb2c59afc7d08] sock_read_iter at ffffffffb11b79b2
#12 [ffffb2c59afc7d78] do_iter_readv_writev at ffffffffb0d3d468
#13 [ffffb2c59afc7dc0] do_iter_read at ffffffffb0d3e26f
#14 [ffffb2c59afc7df8] vfs_readv at ffffffffb0d40a5d
#15 [ffffb2c59afc7ef8] do_readv at ffffffffb0d40afc
#16 [ffffb2c59afc7f38] do_syscall_64 at ffffffffb0a0430b
#17 [ffffb2c59afc7f50] entry_SYSCALL_64_after_hwframe at ffffffffb14000ad
RIP: 00007f1e7690565f RSP: 00007f15417f8040 RFLAGS: 00000293
RAX: ffffffffffffffda RBX: 000000000000007c RCX: 00007f1e7690565f
RDX: 0000000000000003 RSI: 00007ef23c5719f0 RDI: 000000000000007c
RBP: 00007ef23c5719f0 R8: 0000000000000000 R9: 0000000011ac899e
R10: 00007ef23c571b48 R11: 0000000000000293 R12: 0000000000000003
R13: 0000000000000003 R14: 00007ef23c5719f0 R15: 000000000000000b
ORIG_RAX: 0000000000000013 CS: 0033 SS: 002b
- I searched instead for one of the bad values that overwrote the struct page.
- There are only four results. The last two are the page struct itself and an alias of it.
crash> search 87701336b5200d85
ffff97ce7980bd70: 87701336b5200d85 \ addresses alias to CPU 64 NMI stack
ffff97ce7980bf48: 87701336b5200d85 /
ffff97d13186d300: 87701336b5200d85 alias to struct page
fffff5b3f906d300: 87701336b5200d85 struct page
- The first two addresses are aliases of the CPU 64 NMI stack.
- vtop of the first search result:
crash> vtop ffff97ce7980bd70 | head -2
VIRTUAL            PHYSICAL
ffff97ce7980bd70   17a3980bd70    <<--- note the physical address
- Here are the CPU 64 exception stacks from help -m.
crash> help -m | awk '/ebase/,/ibase/' | grep '\['64 -A1
[64]: fffffe0000cc7000 fffffe0000cc9000 fffffe0000ccb000 fffffe0000ccd000
fffffe0000ccf000 0000000000000000
'- - - -> the second entry, fffffe0000cc9000, is the NMI stack for CPU 64 (0x1000 bytes)
- The NMI stack base plus 0xd70 (the last three hex digits of the search hit's address) maps to the same physical address (a small sketch of this arithmetic follows the vtop output below).
crash> vtop fffffe0000cc9d70 | head -2
VIRTUAL            PHYSICAL
fffffe0000cc9d70   17a3980bd70
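- A small sketch of that aliasing arithmetic (both virtual addresses share the same page offset and resolve to the same physical address, so the search hit is just another mapping of the CPU 64 NMI stack):
#include <stdio.h>

int main(void)
{
        /* Values taken from the "search" and "vtop" output above. */
        unsigned long hit      = 0xffff97ce7980bd70UL;   /* first search hit        */
        unsigned long nmi_base = 0xfffffe0000cc9000UL;   /* CPU 64 NMI stack base   */
        unsigned long phys     = 0x17a3980bd70UL;        /* physical address (vtop) */

        /* Same offset within the 4 KiB page ... */
        printf("page offset: %#lx vs %#lx\n", hit & 0xfff, phys & 0xfff);

        /* ... so the NMI-stack view of the same byte is base + that offset. */
        printf("NMI stack alias: %#lx\n", nmi_base + (hit & 0xfff));  /* 0xfffffe0000cc9d70 */
        return 0;
}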
- The backtrace below shows that a few of the bad values were saved to the NMI stack because they were sitting in registers when the NMI fired (note that RAX, RDX and R13 in the exception frame match the garbage seen in the rd dump of the page struct, and R12 is the corrupted page struct itself). This is the only task that isn't spinning on a lock from slab_[alloc/free]. Either this process is hitting the same corruption, or it may be the cause (a race between alloc and free).
crash> bt -c64
PID: 3831542 TASK: ffff96b093fe4040 CPU: 64 COMMAND: "vertica"
#0 [fffffe0000cc9e48] crash_nmi_callback at ffffffffb0a57853
#1 [fffffe0000cc9e50] nmi_handle at ffffffffb0a26e03
#2 [fffffe0000cc9ea8] default_do_nmi at ffffffffb1393619
#3 [fffffe0000cc9ec8] do_nmi at ffffffffb0a2733f
#4 [fffffe0000cc9ef0] end_repeat_nmi at ffffffffb14015a4
[exception RIP: __slab_free+0x209]
RIP: ffffffffb0d09b59 RSP: ffffb2c59c1376c0 RFLAGS: 00000002
RAX: 87701336b5200d85 RBX: ffff970281b4c340 RCX: 6f22551352414b43
RDX: 6f22551352414b44 RSI: 0000000052414b43 RDI: 0000000000000246
RBP: ffffb2c59c137758 R8: 0000000000000000 R9: ffff970281b4c400
R10: ffff970281b4c340 R11: 0000000000000001 R12: fffff5b3f906d300
R13: 5547494e075fb057 R14: ffff965560a84840 R15: 0000000000000000
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
--- <NMI exception stack> ---
#5 [ffffb2c59c1376c0] __slab_free at ffffffffb0d09b59
#6 [ffffb2c59c137748] tpacket_rcv at ffffffffb1356600 /* small note, the stacks are not accurate
#7 [ffffb2c59c1377e8] kmem_cache_alloc at ffffffffb0d0c52f /* kmem_cache_alloc does not call tpacket_rcv....
#8 [ffffb2c59c137848] dev_queue_xmit_nit at ffffffffb11da106
#9 [ffffb2c59c137888] dev_hard_start_xmit at ffffffffb11de6ea
#10 [ffffb2c59c1378d8] __dev_queue_xmit at ffffffffb11df26b
#11 [ffffb2c59c137958] ip_finish_output2 at ffffffffb127ae29
#12 [ffffb2c59c1379b0] ip_output at ffffffffb127c7f0
#13 [ffffb2c59c137a08] __ip_queue_xmit at ffffffffb127c1fd
#14 [ffffb2c59c137a60] __tcp_transmit_skb at ffffffffb1298dbd
#15 [ffffb2c59c137b70] tcp_write_xmit at ffffffffb129a2ad
#16 [ffffb2c59c137bd8] __tcp_push_pending_frames at ffffffffb129b162
#17 [ffffb2c59c137bf0] tcp_sendmsg_locked at ffffffffb1286eea
#18 [ffffb2c59c137c98] tcp_sendmsg at ffffffffb1287057
#19 [ffffb2c59c137cb8] sock_sendmsg at ffffffffb11b752e
#20 [ffffb2c59c137cd0] sock_write_iter at ffffffffb11b75d7
#21 [ffffb2c59c137d40] do_iter_readv_writev at ffffffffb0d3d436
#22 [ffffb2c59c137d88] do_iter_write at ffffffffb0d3e7d0
#23 [ffffb2c59c137db8] vfs_writev at ffffffffb0d3e9a3
#24 [ffffb2c59c137eb0] do_writev at ffffffffb0d3ea5c
#25 [ffffb2c59c137ef0] sisevt_hook_sys_writev at ffffffffc0670779 [sisevt]
#26 [ffffb2c59c137f30] sisevt64_sys_writev at ffffffffc0675d8b [sisevt]
#27 [ffffb2c59c137f38] do_syscall_64 at ffffffffb0a0430b
#28 [ffffb2c59c137f50] entry_SYSCALL_64_after_hwframe at ffffffffb14000ad
RIP: 00007f1e7690570f RSP: 00007e9e3faeabd0 RFLAGS: 00000293
RAX: ffffffffffffffda RBX: 0000000000000083 RCX: 00007f1e7690570f
RDX: 0000000000000002 RSI: 00007ee148051b50 RDI: 0000000000000083
RBP: 00007ee148051b50 R8: 0000000000000000 R9: 00007ee148051b50
R10: 0000000000000083 R11: 0000000000000293 R12: 0000000000000002
R13: 00007ef23c344240 R14: 00007e9e3faeb140 R15: 00007ef23c344240
ORIG_RAX: 0000000000000014 CS: 0033 SS: 002b
Step: 6
- The server has only a bond interface (bond0) with an active IP address.
crash> net
NET_DEVICE NAME IP ADDRESS(ES)
ffff96562047c000 lo 127.0.0.1
ffff96568247c000 ens1f0np0
ffff9657a1164000 ens1f1np1
ffff965741db0000 ens4f0np0
ffff965800170000 ens4f1np1
ffff9657a08a4000 eno5
ffff9657416f0000 eno6
ffff9656e2004000 eno7
ffff96568425c000 eno8
ffff96556c50a000 bond0 10.52.25.77
- Investigating the PID mentioned in the analysis above, 3176112:
crash> net -s 3176112
PID: 3176112 TASK: ffff9655bcd90040 CPU: 20 COMMAND: "vertica"
FD      SOCKET            SOCK             FAMILY:TYPE    SOURCE-PORT    DESTINATION-PORT
3 ffff97150e10b100 ffff96bf302d3240 INET:STREAM 0.0.0.0-6440 0.0.0.0-0
5 ffff973a4d0be500 ffff97cd2cba9940 UNIX:STREAM
9 ffff965e60e0de80 ffff972295605540 UNIX:STREAM
10 ffff97150e1093c0 ffff96bf302d6440 INET:STREAM 0.0.0.0-6439 0.0.0.0-0
11 ffff97150e10f200 ffff96882c8ccb80 INET6:STREAM
16 ffff97150e10c7c0 ffff977a4ff42840 INET:STREAM 10.52.25.77-39620 10.50.72.109-9092
20 ffff97150e10a740 ffff9655c0f3a6c0 INET:DGRAM 10.52.25.77-6439 0.0.0.0-0
23 ffff965563e52400 ffff977a4ff43240 INET:STREAM 10.52.25.77-39624 10.50.72.109-9092
35 ffff965563e51a40 ffff977a4ff46440 INET:STREAM 10.52.25.77-39626 10.50.72.109-9092
--- trimmed---
2358 ffff970cc0759700 ffff96571083c640 INET:STREAM 10.52.25.77-6439 10.52.107.30-55018
3712 ffff970cc075eec0 ffff96571083a840 INET:STREAM 10.52.25.77-6439 10.50.101.139-47508
- It seems that there are a lot of connections from bond0 to 10.50.72.109.
- The socket appears to be unconnected; perhaps the sockets got disconnected around the time of the vmcore (the socket_state values are listed after the net -S output below).
crash> net -S 3176112
PID: 3176112 TASK: ffff9655bcd90040 CPU: 20 COMMAND: "vertica" <<===
FD      SOCKET            SOCK
3 ffff97150e10b100 ffff96bf302d3240
struct socket {
state = SS_UNCONNECTED,
type = 0x1,
flags = 0x0,
wq = 0xffff96abcd7f7540,
file = 0xffff96fbf2a50ac0,
sk = 0xffff96bf302d3240,
ops = 0xffffffffb18fd3a0 <inet_stream_ops>
}
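- For reference, SS_UNCONNECTED is one of the generic socket states defined in the kernel's include/uapi/linux/net.h (reproduced here for convenience); it describes the struct socket layer, independent of the TCP state:
/* Socket states, as reported in the "state" field of struct socket above. */
typedef enum {
        SS_FREE = 0,        /* not allocated                */
        SS_UNCONNECTED,     /* unconnected to any socket    */
        SS_CONNECTING,      /* in process of connecting     */
        SS_CONNECTED,       /* connected to socket          */
        SS_DISCONNECTING    /* in process of disconnecting  */
} socket_state;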