Server crash at get_partial_node() with a kernel panic: general protection fault.

Solution In Progress

Environment

  • Red Hat Enterprise Linux 8.

Issue

  • Server crash at get_partial_node() with a kernel panic: general protection fault.
PID: 0        TASK: ffff96555a454040  CPU: 37   COMMAND: "swapper/37"
#0 [ffffb2c5993ac668] machine_kexec at ffffffffb0a650ce
#1 [ffffb2c5993ac6c0] __crash_kexec at ffffffffb0ba53dd
#2 [ffffb2c5993ac788] crash_kexec at ffffffffb0ba62cd
#3 [ffffb2c5993ac7a0] oops_end at ffffffffb0a264cd
#4 [ffffb2c5993ac7c0] general_protection at ffffffffb14010ee
   [exception RIP: get_partial_node+134]  <<===
   RIP: ffffffffb0d0aaf6  RSP: ffffb2c5993ac870  RFLAGS: 00010002
   RAX: 2255532048424102  RBX: ffff96554125b6c0  RCX: 2255532048424103
   RDX: 0000000000000001  RSI: 0000000000980000  RDI: ffff96554125b6c0
   RBP: ffffb2c5993ac920   R8: 22555320484240fb   R9: 0000000000000000
   R10: ffff965560a84840  R11: fffff5b3f906d308  R12: fffff5b3f906d300
   R13: ffff965560a84840  R14: 0000000000000002  R15: 22555320484240fb
   ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #5 [ffffb2c5993ac928] ___slab_alloc at ffffffffb0d0b1e8
 #6 [ffffb2c5993ac9d8] __alloc_skb at ffffffffb11c3847  
 #7 [ffffb2c5993aca40] __alloc_skb at ffffffffb11c3847
 #8 [ffffb2c5993aca88] __tcp_send_ack at ffffffffb1299d2f
 #9 [ffffb2c5993acaa0] tcp_rcv_established at ffffffffb1294cee
#10 [ffffb2c5993acae0] tcp_v4_do_rcv at ffffffffb12a13ba
#11 [ffffb2c5993acb00] tcp_v4_rcv at ffffffffb12a3723
#12 [ffffb2c5993acb70] ip_protocol_deliver_rcu at ffffffffb1276c3c
#13 [ffffb2c5993acb90] ip_local_deliver_finish at ffffffffb1276e2d
#14 [ffffb2c5993acb98] ip_local_deliver at ffffffffb1276f20
#15 [ffffb2c5993acbe8] ip_rcv at ffffffffb12771ab
#16 [ffffb2c5993acc48] __netif_receive_skb_core at ffffffffb11e0610
#17 [ffffb2c5993acce0] netif_receive_skb_internal at ffffffffb11e07bd
#18 [ffffb2c5993acd08] napi_gro_receive at ffffffffb11e1268
#19 [ffffb2c5993acd28] bnxt_rx_pkt at ffffffffc0229918 [bnxt_en]
#20 [ffffb2c5993ace08] __bnxt_poll_work at ffffffffc022a799 [bnxt_en]
#21 [ffffb2c5993ace50] bnxt_poll at ffffffffc022a9a2 [bnxt_en]
#22 [ffffb2c5993acea8] __napi_poll at ffffffffb11e1b5d
#23 [ffffb2c5993aced8] net_rx_action at ffffffffb11e2033
#24 [ffffb2c5993acf58] __softirqentry_text_start at ffffffffb16000d7
#25 [ffffb2c5993acfa8] irq_exit_rcu at ffffffffb0af3f3b
#26 [ffffb2c5993acfb8] irq_exit at ffffffffb0af3f4a
#27 [ffffb2c5993acfc0] do_IRQ at ffffffffb1401e7f
--- <IRQ stack> ---
#28 [ffffb2c598acfdf8] ret_from_intr at ffffffffb1400a8f
    [exception RIP: mwait_idle+97]
    RIP: ffffffffb13a5d61  RSP: ffffb2c598acfea0  RFLAGS: 00000246
    RAX: 0000000000000000  RBX: 0000000000000025  RCX: 0000000000000000
    RDX: 0000000000000000  RSI: 0000000000000025  RDI: ffff97ce7915d5c0
    RBP: 0000000000000025   R8: ffff97ce7915cf80   R9: 000000000002a680
    R10: 00197cc210ca11a6  R11: 0000000000000000  R12: 0000000000000000
    R13: 0000000000000000  R14: ffffffffffffffff  R15: ffff96555a454040
    ORIG_RAX: ffffffffffffffd9  CS: 0010  SS: 0018
#29 [ffffb2c598acfea0] default_idle_call at ffffffffb13a5f90
#30 [ffffb2c598acfec0] do_idle at ffffffffb0b2583f
#31 [ffffb2c598acff10] cpu_startup_entry at ffffffffb0b25abf
#32 [ffffb2c598acff30] start_secondary at ffffffffb0a59a46
#33 [ffffb2c598acff50] secondary_startup_64_no_verify at ffffffffb0a00107

Resolution

  • Engage your application vendor to investigate the vmcore findings and determine why the sockets became unconnected while the application was connected to Vertica.

Root Cause

  • The connections to the Vertica application are having issues (the sockets became unconnected while connected to Vertica).

Diagnostic Steps

Step: 1

  • System Information.
        CPUS: 80
        DATE: Sat Sep 10 02:27:28 EDT 2022
      UPTIME: 26 days, 20:18:31
LOAD AVERAGE: 12.58, 17.59, 20.17 <--- light load for an 80-CPU system
       TASKS: 2366
     RELEASE: 4.18.0-372.13.1.el8_6.x86_64
     VERSION: #1 SMP Mon Jun 6 15:05:22 EDT 2022
     MACHINE: x86_64  (3100 Mhz)
      MEMORY: 1535.6 GB
       PANIC: "general protection fault: 0000 [#1] SMP NOPTI"
         PID: 0
     COMMAND: "swapper/37"
        TASK: ffff96555a454040  (1 of 80)  [THREAD_INFO: ffff96555a454040]
         CPU: 37
       STATE: TASK_RUNNING (PANIC)
  • System Hardware Information.
crash> sys -i
DMI_BIOS_VENDOR: HPE
DMI_BIOS_VERSION: U30
DMI_BIOS_DATE: 03/08/2022
DMI_SYS_VENDOR: HPE
DMI_PRODUCT_NAME: ProLiant DL380 Gen10
  • Listing tainted modules (P = proprietary, O = out-of-tree, E = unsigned).
crash> mod -t
NAME     TAINTS
traps    OE
sisevt   PE
sisap    POE
gc_enforcement  OE  
  • Kernel Ring Buffer.
 crash> log 
 [1815371.377394] ------------[ cut here ]------------
 [1815371.377400] cfs_rq->avg.load_avg || cfs_rq->avg.util_avg || cfs_rq->avg.runnable_avg
 [1815371.377407] WARNING: CPU: 0 PID: 0 at kernel/sched/fair.c:3350 update_blocked_averages+0x6af/0x6e0   
 [1815371.377416] Modules linked in: sisap(POE) sisevt(PE) binfmt_misc nfs_layout_nfsv41_files rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfsv3 nfs_acl nfs lockd grace fscache gc_enforc>
 [1815371.377488]  i2c_algo_bit dm_mirror dm_region_hash dm_log dm_mod ipmi_devintf ipmi_msghandler fuse
 ---trimmed---
 [2319476.560538] general protection fault: 0000 [#1] SMP NOPTI
 [2319476.560586] CPU: 37 PID: 0 Comm: swapper/37 Kdump: loaded Tainted: P W  OE    --------- -  - 4.18.0-372.13.1.el8_6.x86_64 #1
 [2319476.560635] Hardware name: HPE ProLiant DL380 Gen10/ProLiant DL380 Gen10, BIOS U30 03/08/2022
 [2319476.560664] RIP: 0010:get_partial_node.part.87+0x86/0x280
 [2319476.560695] Code: 8d 41 f8 4c 8b 54 24 28 4d 8d 63 f8 c7 44 24 34 00 00 00 00 4d 89 f9 4d 89 c7 49 8b 54 24 08 48 8d 42 ff 83 e2 01 49 0f 44 c4 <48> 8b 00 a8 40 0f 85 ab 01 00 00 4d 8>
 [2319476.562846] RSP: 0018:ffffb2c5993ac870 EFLAGS: 00010002
 [2319476.563790] RAX: 2255532048424102 RBX: ffff96554125b6c0 RCX: 2255532048424103
 [2319476.564706] RDX: 0000000000000001 RSI: 0000000000980000 RDI: ffff96554125b6c0
 [2319476.565651] RBP: ffffb2c5993ac920 R08: 22555320484240fb R09: 0000000000000000
 [2319476.566548] R10: ffff965560a84840 R11: fffff5b3f906d308 R12: fffff5b3f906d300
 [2319476.567416] R13: ffff965560a84840 R14: 0000000000000002 R15: 22555320484240fb
 [2319476.568272] FS:  0000000000000000(0000) GS:ffff97ce79140000(0000) knlGS:0000000000000000
 [2319476.569154] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 [2319476.570007] CR2: 00007ea78d19b000 CR3: 000000728b410006 CR4: 00000000007706e0
 [2319476.570836] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 [2319476.571665] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 [2319476.572473] PKRU: 55555554
 [2319476.573261] Call Trace:
 [2319476.574036]  <IRQ>
 [2319476.574806]  ? __kfree_skb+0xe/0x20
 [2319476.575598]  ___slab_alloc.part.89+0x4f8/0x740
 [2319476.576347]  ? __alloc_skb+0x57/0x1c0
 [2319476.577088]  ? __alloc_skb+0x57/0x1c0
 [2319476.577814]  kmem_cache_alloc_node+0x25d/0x2a0
 [2319476.578530]  __alloc_skb+0x57/0x1c0
 [2319476.579233]  __tcp_send_ack.part.52+0x2f/0x100
 [2319476.579931]  tcp_rcv_established+0x4fe/0x5c0
 [2319476.580618]  tcp_v4_do_rcv+0x12a/0x1e0
 [2319476.581327]  tcp_v4_rcv+0xb43/0xc50
 [2319476.581992]  ? nft_do_chain_ipv4+0x66/0x80 [nf_tables]
 [2319476.582675]  ? endr_ioctl_enf+0xab1/0x1260 [gc_enforcement]
 [2319476.583330]  ip_protocol_deliver_rcu+0x2c/0x1d0
 [2319476.583975]  ip_local_deliver_finish+0x4d/0x60
 [2319476.584604]  ip_local_deliver+0xe0/0xf0
 [2319476.585219]  ? ip_protocol_deliver_rcu+0x1d0/0x1d0
 [2319476.585845]  ip_rcv+0x27b/0x36f
 [2319476.586446]  __netif_receive_skb_core+0xba0/0xcb0
 [2319476.587045]  ? inet_gro_receive+0x21f/0x2c0
 [2319476.587663]  netif_receive_skb_internal+0x3d/0xb0
 [2319476.588241]  napi_gro_receive+0x108/0x150
 [2319476.588811]  bnxt_rx_pkt+0xe88/0x12a0 [bnxt_en]
 [2319476.589392]  __bnxt_poll_work+0x179/0x2c0 [bnxt_en]
 [2319476.589977]  bnxt_poll+0xc2/0x1a0 [bnxt_en]
 [2319476.590521]  __napi_poll+0x2d/0x130
 [2319476.591053]  net_rx_action+0x253/0x320
 [2319476.591594]  __do_softirq+0xd7/0x2c4
 [2319476.592127]  irq_exit_rcu+0xcb/0xd0
 [2319476.592621]  irq_exit+0xa/0x10
 [2319476.593093]  do_IRQ+0x7f/0xd0
 [2319476.593552]  common_interrupt+0xf/0xf
 [2319476.594024]  </IRQ>
 [2319476.594471] RIP: 0010:mwait_idle+0x61/0x80
 [2319476.594896] Code: 48 8b 04 25 40 5c 01 00 48 89 d1 0f 01 c8 48 8b 00 a8 08 75 17 e9 07 00 00 00 0f 00 2d ea d5 45 00 31 c0 48 89 c1 fb 0f 01 c9 <eb> 07 fb 66 0f 1f 44 00 00 65 48 8b 0>
 [2319476.595777] RSP: 0018:ffffb2c598acfea0 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffd9
 [2319476.596243] RAX: 0000000000000000 RBX: 0000000000000025 RCX: 0000000000000000
 [2319476.596693] RDX: 0000000000000000 RSI: 0000000000000025 RDI: ffff97ce7915d5c0
 [2319476.597131] RBP: 0000000000000025 R08: ffff97ce7915cf80 R09: 000000000002a680
 [2319476.597581] R10: 00197cc210ca11a6 R11: 0000000000000000 R12: 0000000000000000
 [2319476.598013] R13: 0000000000000000 R14: ffffffffffffffff R15: ffff96555a454040
 [2319476.598445]  default_idle_call+0x40/0xf0
 [2319476.598887]  do_idle+0x20f/0x2c0
 [2319476.599309]  cpu_startup_entry+0x6f/0x80
 [2319476.599749]  start_secondary+0x1a6/0x1e0
 [2319476.600170]  secondary_startup_64_no_verify+0xc2/0xcb
 [2319476.600596] Modules linked in: traps(OE) sisap(POE) sisevt(PE) binfmt_misc nfs_layout_nfsv41_files rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfsv3 nfs_acl nfs lockd grace fscache>
 [2319476.600666]  i2c_algo_bit dm_mirror dm_region_hash dm_log dm_mod ipmi_devintf ipmi_msghandler fuse [last unloaded: traps]
  • Most of the load is coming from runnable tasks.
  crash> ps -S
   RU: 103
   IN: 1777
   ID: 485
   ZO: 1

    ID: 485 idle, IN: 1777 interruptible, RU: 103 running, ZO: 1 zombie

    So there are no stuck (uninterruptible) processes, only a single zombie.

Step: 2

  • The panic occurred on CPU 37, on the IRQ stack.
crash> set -p
    PID: 0
COMMAND: "swapper/37"
   TASK: ffff96555a454040  (1 of 80)  [THREAD_INFO: ffff96555a454040]
    CPU: 37
  STATE: TASK_RUNNING (PANIC)
  • Backtrace of the panic task.
The backtrace lands on this exception:
-->     [exception RIP: get_partial_node+0x86]

crash> bt
PID: 0 TASK: ffff96555a454040  CPU: 37   COMMAND: "swapper/37"
 #0 [ffffb2c5993ac668] machine_kexec at ffffffffb0a650ce
#1 [ffffb2c5993ac6c0] __crash_kexec at ffffffffb0ba53dd
#2 [ffffb2c5993ac788] crash_kexec at ffffffffb0ba62cd
#3 [ffffb2c5993ac7a0] oops_end at ffffffffb0a264cd
#4 [ffffb2c5993ac7c0] general_protection at ffffffffb14010ee
    [exception RIP: get_partial_node+134]  <<===
    RIP: ffffffffb0d0aaf6  RSP: ffffb2c5993ac870  RFLAGS: 00010002
    RAX: 2255532048424102  RBX: ffff96554125b6c0  RCX: 2255532048424103
    RDX: 0000000000000001  RSI: 0000000000980000  RDI: ffff96554125b6c0
    RBP: ffffb2c5993ac920   R8: 22555320484240fb   R9: 0000000000000000
    R10: ffff965560a84840  R11: fffff5b3f906d308  R12: fffff5b3f906d300
    R13: ffff965560a84840  R14: 0000000000000002  R15: 22555320484240fb
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #5 [ffffb2c5993ac928] ___slab_alloc at ffffffffb0d0b1e8
 #6 [ffffb2c5993ac9d8] __alloc_skb at ffffffffb11c3847   <<----
 #7 [ffffb2c5993aca40] __alloc_skb at ffffffffb11c3847
 #8 [ffffb2c5993aca88] __tcp_send_ack at ffffffffb1299d2f
 #9 [ffffb2c5993acaa0] tcp_rcv_established at ffffffffb1294cee
#10 [ffffb2c5993acae0] tcp_v4_do_rcv at ffffffffb12a13ba
#11 [ffffb2c5993acb00] tcp_v4_rcv at ffffffffb12a3723
#12 [ffffb2c5993acb70] ip_protocol_deliver_rcu at ffffffffb1276c3c
#13 [ffffb2c5993acb90] ip_local_deliver_finish at ffffffffb1276e2d
#14 [ffffb2c5993acb98] ip_local_deliver at ffffffffb1276f20
#15 [ffffb2c5993acbe8] ip_rcv at ffffffffb12771ab
#16 [ffffb2c5993acc48] __netif_receive_skb_core at ffffffffb11e0610
#17 [ffffb2c5993acce0] netif_receive_skb_internal at ffffffffb11e07bd
#18 [ffffb2c5993acd08] napi_gro_receive at ffffffffb11e1268
#19 [ffffb2c5993acd28] bnxt_rx_pkt at ffffffffc0229918 [bnxt_en]
#20 [ffffb2c5993ace08] __bnxt_poll_work at ffffffffc022a799 [bnxt_en]
#21 [ffffb2c5993ace50] bnxt_poll at ffffffffc022a9a2 [bnxt_en]
#22 [ffffb2c5993acea8] __napi_poll at ffffffffb11e1b5d
#23 [ffffb2c5993aced8] net_rx_action at ffffffffb11e2033
#24 [ffffb2c5993acf58] __softirqentry_text_start at ffffffffb16000d7
#25 [ffffb2c5993acfa8] irq_exit_rcu at ffffffffb0af3f3b
#26 [ffffb2c5993acfb8] irq_exit at ffffffffb0af3f4a
#27 [ffffb2c5993acfc0] do_IRQ at ffffffffb1401e7f
--- <IRQ stack> ---
#28 [ffffb2c598acfdf8] ret_from_intr at ffffffffb1400a8f
    [exception RIP: mwait_idle+97]
    RIP: ffffffffb13a5d61  RSP: ffffb2c598acfea0  RFLAGS: 00000246
    RAX: 0000000000000000  RBX: 0000000000000025  RCX: 0000000000000000
    RDX: 0000000000000000  RSI: 0000000000000025  RDI: ffff97ce7915d5c0
    RBP: 0000000000000025   R8: ffff97ce7915cf80   R9: 000000000002a680
    R10: 00197cc210ca11a6  R11: 0000000000000000  R12: 0000000000000000
    R13: 0000000000000000  R14: ffffffffffffffff  R15: ffff96555a454040
    ORIG_RAX: ffffffffffffffd9  CS: 0010  SS: 0018
#29 [ffffb2c598acfea0] default_idle_call at ffffffffb13a5f90
#30 [ffffb2c598acfec0] do_idle at ffffffffb0b2583f
#31 [ffffb2c598acff10] cpu_startup_entry at ffffffffb0b25abf
#32 [ffffb2c598acff30] start_secondary at ffffffffb0a59a46
#33 [ffffb2c598acff50] secondary_startup_64_no_verify at ffffffffb0a00107
  • We crashed dereferencing %rax, which contains a garbage value (decoded in the sketch below).
 crash> dis get_partial_node+0x86 -r 3
 0xffffffffb0d0aaef <get_partial_node+0x7f>:     and    $0x1,%edx
 0xffffffffb0d0aaf2 <get_partial_node+0x82>:     cmove  %r12,%rax
 0xffffffffb0d0aaf6 <get_partial_node+0x86>:     mov    (%rax),%rax

 RAX: 2255532048424102
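
  • A quick decode of this value (a standalone sketch, not crash output) shows that it is mostly printable ASCII, the first hint that the slab list pointer was overwritten with text-like data rather than hit by a random bit flip. This is consistent with the ASCII column of the rd output in Step 4.

 /* Standalone sketch: decode the faulting RAX value into bytes.
  * x86_64 is little-endian, so the low-order byte comes first in memory. */
 #include <stdio.h>

 int main(void)
 {
         unsigned long long rax = 0x2255532048424102ULL;  /* RAX at the time of the fault */

         for (int i = 0; i < 8; i++) {
                 unsigned char c = (rax >> (8 * i)) & 0xff;
                 printf("%02x '%c'\n", c, (c >= 0x20 && c < 0x7f) ? c : '.');
         }
         /* prints: 02 '.'  41 'A'  42 'B'  48 'H'  20 ' '  53 'S'  55 'U'  22 '"' */
         return 0;
 }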

Step: 3

  • In the source, the panic occurred while iterating over the n->partial list (a hand-expanded sketch of this loop follows at the end of this step).
 2044 static void *get_partial_node(struct kmem_cache *s, struct kmem_cache_node *n,
 2045   struct page **ret_page, gfp_t gfpflags)
 2046 {
 2047  struct page *page, *page2;
 2048  void *object = NULL;
 2049  unsigned int available = 0;
 2050  unsigned long flags;
 2051  int objects;
 2052 
 2053  /*
 2054   * Racy check. If we mistakenly see no partial slabs then we
 2055   * just allocate an empty slab. If we mistakenly try to get a
 2056   * partial slab and there is none available then get_partials()
 2057   * will return NULL.
 2058   */
 2059  if (!n || !n->nr_partial)
 2060   return NULL;
 2061 
 2062  spin_lock_irqsave(&n->list_lock, flags);
 2063  list_for_each_entry_safe(page, page2, &n->partial, slab_list) {   <- - - - panicked here

  • The list head is n->partial, where n is the 2nd argument to get_partial_node(). We need to recover this argument from the registers.
 crash> whatis get_partial_node
 void *get_partial_node(struct kmem_cache *, struct kmem_cache_node *, struct page **, gfp_t);
                        /* args: %rdi, %rsi, %rdx, %rcx (x86_64 SysV calling convention) */

 crash> dis -r get_partial_node+0x86
 0xffffffffb0d0aa70 <get_partial_node>:  nopl   0x0(%rax,%rax,1) [FTRACE NOP]
 0xffffffffb0d0aa75 <get_partial_node+0x5>:      push   %rbp
 0xffffffffb0d0aa76 <get_partial_node+0x6>:      mov    %rsp,%rbp
 0xffffffffb0d0aa79 <get_partial_node+0x9>:      push   %r15
 0xffffffffb0d0aa7b <get_partial_node+0xb>:      push   %r14
 0xffffffffb0d0aa7d <get_partial_node+0xd>:      push   %r13
 0xffffffffb0d0aa7f <get_partial_node+0xf>:      push   %r12
 0xffffffffb0d0aa81 <get_partial_node+0x11>:     push   %rbx
 0xffffffffb0d0aa82 <get_partial_node+0x12>:     mov    %rsi,%rbx    /* %rsi -> %rbx
 0xffffffffb0d0aa85 <get_partial_node+0x15>:     and    $0xfffffffffffffff0,%rsp
 0xffffffffb0d0aa89 <get_partial_node+0x19>:     add    $0xffffffffffffff80,%rsp
 0xffffffffb0d0aa8d <get_partial_node+0x1d>:     mov    %rdi,0x28(%rsp)
 0xffffffffb0d0aa92 <get_partial_node+0x22>:     mov    %rsi,%rdi    /* %rsi -> %rdi
 0xffffffffb0d0aa95 <get_partial_node+0x25>:     mov    %ecx,0x30(%rsp)
 0xffffffffb0d0aa99 <get_partial_node+0x29>:     mov    %rsi,0x10(%rsp)
 0xffffffffb0d0aa9e <get_partial_node+0x2e>:     mov    %rdx,0x8(%rsp)
 0xffffffffb0d0aaa3 <get_partial_node+0x33>:     call   0xffffffffb13a62b0 <_raw_spin_lock_irqsave>
 0xffffffffb0d0aaa8 <get_partial_node+0x38>:     mov    0x10(%rbx),%r11
 0xffffffffb0d0aaac <get_partial_node+0x3c>:     mov    %rax,(%rsp)
 0xffffffffb0d0aab0 <get_partial_node+0x40>:     mov    %rbx,%rax
 0xffffffffb0d0aab3 <get_partial_node+0x43>:     add    $0x10,%rax
 0xffffffffb0d0aab7 <get_partial_node+0x47>:     mov    (%r11),%rcx
 0xffffffffb0d0aaba <get_partial_node+0x4a>:     mov    %rax,0x38(%rsp)
 0xffffffffb0d0aabf <get_partial_node+0x4f>:     cmp    %r11,%rax
 0xffffffffb0d0aac2 <get_partial_node+0x52>:     je     0xffffffffb0d0ace6 <get_partial_node+0x276>
 0xffffffffb0d0aac8 <get_partial_node+0x58>:     xor    %r15d,%r15d
 0xffffffffb0d0aacb <get_partial_node+0x5b>:     lea    -0x8(%rcx),%r8
 0xffffffffb0d0aacf <get_partial_node+0x5f>:     mov    0x28(%rsp),%r10
 0xffffffffb0d0aad4 <get_partial_node+0x64>:     lea    -0x8(%r11),%r12
 0xffffffffb0d0aad8 <get_partial_node+0x68>:     movl   $0x0,0x34(%rsp)
 0xffffffffb0d0aae0 <get_partial_node+0x70>:     mov    %r15,%r9
 0xffffffffb0d0aae3 <get_partial_node+0x73>:     mov    %r8,%r15
 0xffffffffb0d0aae6 <get_partial_node+0x76>:     mov    0x8(%r12),%rdx
 0xffffffffb0d0aaeb <get_partial_node+0x7b>:     lea    -0x1(%rdx),%rax
 0xffffffffb0d0aaef <get_partial_node+0x7f>:     and    $0x1,%edx
 0xffffffffb0d0aaf2 <get_partial_node+0x82>:     cmove  %r12,%rax
 0xffffffffb0d0aaf6 <get_partial_node+0x86>:     mov    (%rax),%rax
  • These two registers have not been clobbered and still hold the kmem_cache_node pointer (n).
 RBX: ffff96554125b6c0
 RDI: ffff96554125b6c0
  • Here's struct kmem_cache_node.
 crash> struct kmem_cache_node ffff96554125b6c0
 struct kmem_cache_node {
   list_lock = {
     {
       rlock = {
         raw_lock = {
           {
             val = {
               counter = 0x580001
             },
             {
               locked = 0x1,
               pending = 0x0
             },
             {
               locked_pending = 0x1,
               tail = 0x58
             }
           }
         }
       }
     }
   },
   nr_partial = 0x8,
   partial = {
     next = 0xfffff5b3f906d308,
     prev = 0xfffff5b63dfb1408
   },
   nr_slabs = {
     counter = 0xd
   },
   total_objects = {
     counter = 0x222
   },
   full = {
     next = 0xffff96554125b6f0,
     prev = 0xffff96554125b6f0
   }
 }
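
  • For reference, below is a hand-expanded, self-contained sketch of the list_for_each_entry_safe() pattern used at line 2063 (simplified stand-in types, not the kernel's real definitions). Each iteration follows the embedded slab_list pointers back into the containing struct page, so a single corrupted ->next pointer is followed almost immediately, which corresponds to the mov (%rax),%rax that faulted above.

 /* Simplified, self-contained sketch; NOT the kernel's real definitions. */
 #include <stddef.h>
 #include <stdio.h>

 struct list_head { struct list_head *next, *prev; };

 struct page_like {                  /* stand-in for struct page          */
         unsigned long flags;
         struct list_head slab_list; /* at offset 0x8, as in the vmcore   */
 };

 #define container_of(ptr, type, member) \
         ((type *)((char *)(ptr) - offsetof(type, member)))

 int main(void)
 {
         struct list_head partial = { &partial, &partial };     /* n->partial    */
         struct page_like a = { 0x1, { &partial, &partial } };  /* one slab page */

         /* link the single "slab page" onto the partial list */
         partial.next = partial.prev = &a.slab_list;

         /* hand expansion of:
          *   list_for_each_entry_safe(page, page2, &partial, slab_list)         */
         struct page_like *page  = container_of(partial.next, struct page_like, slab_list);
         struct page_like *page2 = container_of(page->slab_list.next, struct page_like, slab_list);

         for (; &page->slab_list != &partial;
              page = page2,
              page2 = container_of(page->slab_list.next, struct page_like, slab_list)) {
                 /* loop body; 'page2' is pre-fetched from page->slab_list.next,
                  * so a corrupted ->next pointer is followed right away         */
                 printf("page %p, slab_list %p\n", (void *)page, (void *)&page->slab_list);
         }
         return 0;
 }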

Step: 4

  • The loop iterates over the embedded list whose head is n->partial (shown above). Each entry is the slab_list member embedded in a struct page.
 2063  list_for_each_entry_safe(page, page2, &n->partial, slab_list) {

 crash> struct kmem_cache_node ffff96554125b6c0 -o
 struct kmem_cache_node {
   [ffff96554125b6c0] spinlock_t list_lock;
   [ffff96554125b6c8] unsigned long nr_partial;
   [ffff96554125b6d0] struct list_head partial;
   [ffff96554125b6e0] atomic_long_t nr_slabs;
   [ffff96554125b6e8] atomic_long_t total_objects;
   [ffff96554125b6f0] struct list_head full;
 }
 SIZE: 0x40
  • We can see the next entry is bad.
 crash> list -H ffff96554125b6d0
 fffff5b3f906d308
 2255532048424103
 list: invalid kernel virtual address: 2255532048424103  type: "list entry"
  • Walking the list in reverse order hits the same problem.
 crash> list -H ffff96554125b6d0 -r
 fffff5b63dfb1408
 fffff5b37a877608
 fffff5b1a860eb08
 fffff5b68a86dd08
 fffff5b14ac48008
 fffff5b144f3eb08
 fffff5b49a42e008
 fffff5b3f906d308
 61e5583ce4242c38
 list: invalid kernel virtual address: 61e5583ce4242c40  type: "list entry"
  • The list is embedded in struct page at offset 0x8 (page.slab_list), so list entry fffff5b3f906d308 corresponds to struct page fffff5b3f906d300. That entry is the problem: its next (forward walk) and prev (reverse walk) pointers are both bad. kmem corroborates this, and the address arithmetic is sketched at the end of this step.
 crash> kmem -s >/dev/null
 kmem: skbuff_head_cache: partial list slab: fffff5b3f906d300 invalid page.lru.next: 2255532048424103
  • The page struct is toast.
 crash> struct page fffff5b3f906d300
 struct page {
   flags = 0x87701336b5200d85,
 - - - - - - 8< - - - - - - -  
     {
       {
         slab_list = {
           next = 0x2255532048424103,
           prev = 0x61e5583ce4242c38
         },
         {
           next = 0x2255532048424103,
           pages = 0xe4242c38,
           pobjects = 0x61e5583c
         }
       },
       slab_cache = 0x84097c1795955408,
       freelist = 0x5547494e075fb057,
       {
         s_mem = 0x6f22551352414b44,
         counters = 0x6f22551352414b44,
         {
           inuse = 0x4b44,
           objects = 0x5241,
           frozen = 0x0
         }
       }
     },


 crash> struct page | grep SIZE
 SIZE: 0x40

 crash> p (0xfffff5b3f906d300 - 0x40)
 $6 = 0xfffff5b3f906d2c0
  • The page structs immediately before and after it look OK.
 crash> rd 0xfffff5b3f906d2c0 24
 fffff5b3f906d2c0:  0017ffffc0000000 dead000000000100   ................
 fffff5b3f906d2d0:  dead000000000200 0000000000000000   ................
 fffff5b3f906d2e0:  0000000000000000 0000000000000000   ................
 fffff5b3f906d2f0:  00000001fffffdff ffff97a5e4caa00a   ................

 fffff5b3f906d300:  87701336b5200d85 2255532048424103   .. .6.p..ABH SU"
 fffff5b3f906d310:  61e5583ce4242c38 84097c1795955408   8,$.<X.a.T...|..
 fffff5b3f906d320:  5547494e075fb057 6f22551352414b44   W._.NIGUDKAR.U"o
 fffff5b3f906d330:  22536e8d20414e5d b022fefc165445fa   ]NA .nS".ET...".

 fffff5b3f906d340:  0017ffffc0000000 fffff5b3f906d301   ................
 fffff5b3f906d350:  fffffffff9060201 dead000000000400   ................
 fffff5b3f906d360:  0000000000000000 0000000000000000   ................
 fffff5b3f906d370:  00000000ffffffff 0000000000000000   ................
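
  • To make the address arithmetic above explicit, here is a trivial standalone sketch using the values from this vmcore: a partial-list entry points at page.slab_list (offset 0x8), so subtracting 0x8 yields the struct page, and since struct page is 0x40 bytes its neighbours sit at +/- 0x40.

 /* Standalone sketch; values taken from this vmcore. */
 #include <stdio.h>

 int main(void)
 {
         unsigned long entry = 0xfffff5b3f906d308UL;  /* bad list entry      */
         unsigned long page  = entry - 0x8;           /* its struct page     */

         printf("struct page   : 0x%lx\n", page);         /* fffff5b3f906d300 */
         printf("previous page : 0x%lx\n", page - 0x40);  /* fffff5b3f906d2c0 */
         printf("next page     : 0x%lx\n", page + 0x40);  /* fffff5b3f906d340 */
         return 0;
 }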

Step: 5

  • Searching for the page struct address shows that 7 active tasks have it on their stacks, and quite a few inactive tasks do as well.
 crash> search -T fffff5b3f906d300 | grep -c COMM
 7

 crash> search -t fffff5b3f906d300 | grep -c COMM
 169
  • Looking at the active CPUs, at least 20 are spinning on a spin_lock taken from the slab free/alloc paths. All are networking-related stacks, and
    many are called from skb_clone, alloc_skb, etc. Here is an example with the page on its stack:
 crash> bt ffff9655bcd90040
 PID: 3176112  TASK: ffff9655bcd90040  CPU: 20   COMMAND: "vertica"
  #0 [fffffe0000405e48] crash_nmi_callback at ffffffffb0a57853
  #1 [fffffe0000405e50] nmi_handle at ffffffffb0a26e03
  #2 [fffffe0000405ea8] default_do_nmi at ffffffffb1393619
  #3 [fffffe0000405ec8] do_nmi at ffffffffb0a2733f
  #4 [fffffe0000405ef0] end_repeat_nmi at ffffffffb14015a4
     [exception RIP: native_queued_spin_lock_slowpath+0x13c]
     RIP: ffffffffb0b4e54c  RSP: ffffb2c59afc7ae8  RFLAGS: 00000046
     RAX: 0000000000000000  RBX: 0000000000000246  RCX: 0000000000000029
     RDX: ffff97ce78d2bb80  RSI: 0000000000540000  RDI: ffff96554125b6c0
     RBP: ffffb2c59afc7be8   R8: 0000000000000001   R9: ffffffffb1289585
     R10: ffff972ad0b81840  R11: 0000000000000001  R12: ffff972ad0b81840
     R13: ffff96554125b6c0  R14: ffff972ad0b81840  R15: ffff965560a84840
     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 --- <NMI exception stack> ---
  #5 [ffffb2c59afc7ae8] native_queued_spin_lock_slowpath at ffffffffb0b4e54c
  #6 [ffffb2c59afc7ae8] _raw_spin_lock_irqsave at ffffffffb13a62e0
  #7 [ffffb2c59afc7af8] free_debug_processing at ffffffffb0d0975f
  #8 [ffffb2c59afc7b48] __slab_free at ffffffffb0d09b43   <<==
  #9 [ffffb2c59afc7bd8] tcp_recvmsg at ffffffffb1289585
 #10 [ffffb2c59afc7ce8] inet_recvmsg at ffffffffb12bd5db
 #11 [ffffb2c59afc7d08] sock_read_iter at ffffffffb11b79b2
 #12 [ffffb2c59afc7d78] do_iter_readv_writev at ffffffffb0d3d468
 #13 [ffffb2c59afc7dc0] do_iter_read at ffffffffb0d3e26f
 #14 [ffffb2c59afc7df8] vfs_readv at ffffffffb0d40a5d
 #15 [ffffb2c59afc7ef8] do_readv at ffffffffb0d40afc
 #16 [ffffb2c59afc7f38] do_syscall_64 at ffffffffb0a0430b
 #17 [ffffb2c59afc7f50] entry_SYSCALL_64_after_hwframe at ffffffffb14000ad
     RIP: 00007f1e7690565f  RSP: 00007f15417f8040  RFLAGS: 00000293
     RAX: ffffffffffffffda  RBX: 000000000000007c  RCX: 00007f1e7690565f
     RDX: 0000000000000003  RSI: 00007ef23c5719f0  RDI: 000000000000007c
     RBP: 00007ef23c5719f0   R8: 0000000000000000   R9: 0000000011ac899e
     R10: 00007ef23c571b48  R11: 0000000000000293  R12: 0000000000000003
     R13: 0000000000000003  R14: 00007ef23c5719f0  R15: 000000000000000b
     ORIG_RAX: 0000000000000013  CS: 0033  SS: 002b

  • Searching instead for one of the bad values that overwrote the struct page returns only four results; the last two are the page struct itself and an alias of it.

 crash> search  87701336b5200d85
 ffff97ce7980bd70: 87701336b5200d85 \ addresses alias to CPU 64 NMI stack
 ffff97ce7980bf48: 87701336b5200d85 / 
 ffff97d13186d300: 87701336b5200d85  alias to struct page   
 fffff5b3f906d300: 87701336b5200d85  struct page
  • The first two hits are aliases of the CPU 64 NMI stack; the vtop of the first hit follows.

 crash> vtop ffff97ce7980bd70 | head -2
 VIRTUAL          PHYSICAL 
 ffff97ce7980bd70  17a3980bd70  
                    ^
                    '- - - -> note the physical address

  • Here are the CPU 64 exception stacks from help -m.
 crash> help -m | awk '/ebase/,/ibase/' | grep '\['64 -A1
   [64]: fffffe0000cc7000 fffffe0000cc9000 fffffe0000ccb000 fffffe0000ccd000 
         fffffe0000ccf000                  0000000000000000 
                          ^
                          '- - - -> fffffe0000cc9000 is the NMI stack for CPU 64 (0x1000 bytes)


  • The NMI stack base plus 0xd70 (the low three hex digits of the search hit) translates to the same physical address.
 crash> vtop fffffe0000cc9d70 | head -2
 VIRTUAL          PHYSICAL 
 fffffe0000cc9d70  17a3980bd70  
  • The backtrace below (bt -c 64) shows that a few of the bad values were saved to the NMI stack because they were live in registers (%rax, %rcx, etc.)
    when the NMI hit. This is the only active task that is not spinning on a spinlock from the slab alloc/free paths. Either this process is hitting the
    same corruption, or it may be the cause (a race between alloc and free).
 crash> bt -c64
 PID: 3831542  TASK: ffff96b093fe4040  CPU: 64   COMMAND: "vertica"
  #0 [fffffe0000cc9e48] crash_nmi_callback at ffffffffb0a57853
  #1 [fffffe0000cc9e50] nmi_handle at ffffffffb0a26e03
  #2 [fffffe0000cc9ea8] default_do_nmi at ffffffffb1393619
  #3 [fffffe0000cc9ec8] do_nmi at ffffffffb0a2733f
  #4 [fffffe0000cc9ef0] end_repeat_nmi at ffffffffb14015a4
     [exception RIP: __slab_free+0x209]
     RIP: ffffffffb0d09b59  RSP: ffffb2c59c1376c0  RFLAGS: 00000002
     RAX: 87701336b5200d85  RBX: ffff970281b4c340  RCX: 6f22551352414b43
     RDX: 6f22551352414b44  RSI: 0000000052414b43  RDI: 0000000000000246
     RBP: ffffb2c59c137758   R8: 0000000000000000   R9: ffff970281b4c400
     R10: ffff970281b4c340  R11: 0000000000000001  R12: fffff5b3f906d300
     R13: 5547494e075fb057  R14: ffff965560a84840  R15: 0000000000000000
     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 --- <NMI exception stack> ---
  #5 [ffffb2c59c1376c0] __slab_free at ffffffffb0d09b59
  #6 [ffffb2c59c137748] tpacket_rcv at ffffffffb1356600  /* small note, the stacks are not accurate 
  #7 [ffffb2c59c1377e8] kmem_cache_alloc at ffffffffb0d0c52f    /* kmem_cache_alloc does not call tpacket_rcv....
  #8 [ffffb2c59c137848] dev_queue_xmit_nit at ffffffffb11da106
  #9 [ffffb2c59c137888] dev_hard_start_xmit at ffffffffb11de6ea
 #10 [ffffb2c59c1378d8] __dev_queue_xmit at ffffffffb11df26b
 #11 [ffffb2c59c137958] ip_finish_output2 at ffffffffb127ae29
 #12 [ffffb2c59c1379b0] ip_output at ffffffffb127c7f0
 #13 [ffffb2c59c137a08] __ip_queue_xmit at ffffffffb127c1fd
 #14 [ffffb2c59c137a60] __tcp_transmit_skb at ffffffffb1298dbd
 #15 [ffffb2c59c137b70] tcp_write_xmit at ffffffffb129a2ad
 #16 [ffffb2c59c137bd8] __tcp_push_pending_frames at ffffffffb129b162
 #17 [ffffb2c59c137bf0] tcp_sendmsg_locked at ffffffffb1286eea
 #18 [ffffb2c59c137c98] tcp_sendmsg at ffffffffb1287057
 #19 [ffffb2c59c137cb8] sock_sendmsg at ffffffffb11b752e
 #20 [ffffb2c59c137cd0] sock_write_iter at ffffffffb11b75d7
 #21 [ffffb2c59c137d40] do_iter_readv_writev at ffffffffb0d3d436
 #22 [ffffb2c59c137d88] do_iter_write at ffffffffb0d3e7d0
 #23 [ffffb2c59c137db8] vfs_writev at ffffffffb0d3e9a3
 #24 [ffffb2c59c137eb0] do_writev at ffffffffb0d3ea5c
 #25 [ffffb2c59c137ef0] sisevt_hook_sys_writev at ffffffffc0670779 [sisevt]
 #26 [ffffb2c59c137f30] sisevt64_sys_writev at ffffffffc0675d8b [sisevt]
 #27 [ffffb2c59c137f38] do_syscall_64 at ffffffffb0a0430b
 #28 [ffffb2c59c137f50] entry_SYSCALL_64_after_hwframe at ffffffffb14000ad
     RIP: 00007f1e7690570f  RSP: 00007e9e3faeabd0  RFLAGS: 00000293
     RAX: ffffffffffffffda  RBX: 0000000000000083  RCX: 00007f1e7690570f
     RDX: 0000000000000002  RSI: 00007ee148051b50  RDI: 0000000000000083
     RBP: 00007ee148051b50   R8: 0000000000000000   R9: 00007ee148051b50
     R10: 0000000000000083  R11: 0000000000000293  R12: 0000000000000002
     R13: 00007ef23c344240  R14: 00007e9e3faeb140  R15: 00007ef23c344240
     ORIG_RAX: 0000000000000014  CS: 0033  SS: 002b

Step: 6

  • The server has only one interface with an active IP address: the bond (bond0).
crash> net
   NET_DEVICE     NAME   IP ADDRESS(ES)
ffff96562047c000  lo     127.0.0.1
ffff96568247c000  ens1f0np0 
ffff9657a1164000  ens1f1np1 
ffff965741db0000  ens4f0np0 
ffff965800170000  ens4f1np1 
ffff9657a08a4000  eno5   
ffff9657416f0000  eno6   
ffff9656e2004000  eno7   
ffff96568425c000  eno8   
ffff96556c50a000  bond0  10.52.25.77
  • Investigating the PID identified in the analysis above, 3176112:
crash> net -s 3176112
PID: 3176112  TASK: ffff9655bcd90040  CPU: 20   COMMAND: "vertica"
FD      SOCKET            SOCK             FAMILY:TYPE  SOURCE-PORT  DESTINATION-PORT
 3 ffff97150e10b100 ffff96bf302d3240 INET:STREAM  0.0.0.0-6440 0.0.0.0-0
 5 ffff973a4d0be500 ffff97cd2cba9940 UNIX:STREAM 
 9 ffff965e60e0de80 ffff972295605540 UNIX:STREAM 
10 ffff97150e1093c0 ffff96bf302d6440 INET:STREAM  0.0.0.0-6439 0.0.0.0-0
11 ffff97150e10f200 ffff96882c8ccb80 INET6:STREAM 
16 ffff97150e10c7c0 ffff977a4ff42840 INET:STREAM  10.52.25.77-39620 10.50.72.109-9092
20 ffff97150e10a740 ffff9655c0f3a6c0 INET:DGRAM   10.52.25.77-6439 0.0.0.0-0
23 ffff965563e52400 ffff977a4ff43240 INET:STREAM  10.52.25.77-39624 10.50.72.109-9092
35 ffff965563e51a40 ffff977a4ff46440 INET:STREAM  10.52.25.77-39626 10.50.72.109-9092
  --- trimmed---
2358 ffff970cc0759700 ffff96571083c640 INET:STREAM  10.52.25.77-6439 10.52.107.30-55018
3712 ffff970cc075eec0 ffff96571083a840 INET:STREAM  10.52.25.77-6439 10.50.101.139-47508
  • It seems that there are a lot of connections to 10.50.72.109 from bond0.

  • The socket appears to be unconnected; perhaps the sockets had become disconnected by the time of the vmcore.

crash> net -S 3176112
PID: 3176112  TASK: ffff9655bcd90040  CPU: 20   COMMAND: "vertica"   <<===
FD      SOCKET            SOCK
 3  ffff97150e10b100  ffff96bf302d3240

struct socket {
  state = SS_UNCONNECTED,
  type = 0x1,
  flags = 0x0,
  wq = 0xffff96abcd7f7540,
  file = 0xffff96fbf2a50ac0,
  sk = 0xffff96bf302d3240,
  ops = 0xffffffffb18fd3a0 <inet_stream_ops>
}
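
  • For reference, SS_UNCONNECTED is one of the socket states defined in include/linux/net.h (reproduced here from the upstream header, not taken from this vmcore):

 /* include/linux/net.h */
 typedef enum {
         SS_FREE = 0,        /* not allocated                */
         SS_UNCONNECTED,     /* unconnected to any socket    */
         SS_CONNECTING,      /* in process of connecting     */
         SS_CONNECTED,       /* connected to socket          */
         SS_DISCONNECTING    /* in process of disconnecting  */
 } socket_state;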

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
