Dynatrace crashing pods on OpenShift nodes

Solution Verified

Environment

  • OpenShift Container Platform 3 (formerly OpenShift Enterprise)
  • Dynatrace OneAgent version 1.147 or earlier

Issue

  • Pods crash with a segmentation fault on nodes monitored by the Dynatrace OneAgent.

  • The header of a crash file created by Dynatrace shows the versions of the Dynatrace components:

$ head -1 AGENT/d90a1a84e861fdb6/crashalerts/20180809113402_82863/httpd_82863.log
dumpproc version 1.133.135.20171220-181334, installer version 1.133.147.20180110-151508

Resolution

  • Open an issue with Dynatrace.
  • Upgrade Dynatrace OneAgent to version 1.149 or later.
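The following Python sketch applies the version threshold above to a crash file. It parses a header line like the one shown in the Issue section and reports whether that line predates 1.149. It is an illustrative helper, not a Dynatrace tool. It makes two assumptions: that the `installer version` field reflects the installed OneAgent release, and that only the major.minor components matter for the comparison.

```python
import re
from typing import Dict, Tuple

# Minimum OneAgent release named in the Resolution section.
FIXED_VERSION = (1, 149)

# Matches "<component> version MAJOR.MINOR.REVISION.<build>" in a header line.
_VERSION_RE = re.compile(r"(\w+) version (\d+)\.(\d+)\.(\d+)\.\S+")


def parse_header(line: str) -> Dict[str, Tuple[int, int, int]]:
    """Return {component: (major, minor, revision)} parsed from a crash-file header."""
    return {
        name: (int(major), int(minor), int(rev))
        for name, major, minor, rev in _VERSION_RE.findall(line)
    }


def needs_upgrade(line: str, component: str = "installer") -> bool:
    """True if the component's major.minor is older than FIXED_VERSION.

    Assumption: the 'installer' version reflects the installed OneAgent release.
    """
    versions = parse_header(line)
    if component not in versions:
        raise ValueError(f"no {component!r} version in header: {line!r}")
    return versions[component][:2] < FIXED_VERSION


# Header line copied from the crash file shown in the Issue section.
header = ("dumpproc version 1.133.135.20171220-181334, "
          "installer version 1.133.147.20180110-151508")
print(parse_header(header))
print("upgrade needed:", needs_upgrade(header))
```

For the header in this article, both components report release 1.133, so the sketch flags the node for an upgrade.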

Root Cause

httpd appears to crash because of the third-party Dynatrace module liboneagentapache.so:

# gdb /opt/rh/httpd24/root/usr/sbin/httpd core.httpd.1002370000.59d495620aab4d409d71367d9d3a094e.72083.1534284044000000
...
Core was generated by `httpd -D FOREGROUND'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f3a69043787 in malloc_consolidate (av=av@entry=0x7f3a69382760 ) at malloc.c:4162
4162                  unlink(av, nextchunk, bck, fwd);
(gdb) bt
#0  0x00007f3a69043787 in malloc_consolidate (av=av@entry=0x7f3a69382760 ) at malloc.c:4162
#1  0x00007f3a690443e6 in _int_free (av=0x7f3a69382760 , p=0x55aa1b0836a0, have_lock=0) at malloc.c:4054
#2  0x00007f3a593a08d9 in ?? () from /opt/dynatrace/oneagent/agent/lib64/liboneagentapache.so
#3  0x00007f3a593be8e4 in ?? () from /opt/dynatrace/oneagent/agent/lib64/liboneagentapache.so
#4  0x00007f3a59366986 in ?? () from /opt/dynatrace/oneagent/agent/lib64/liboneagentapache.so
#5  0x00007f3a59366c8d in ?? () from /opt/dynatrace/oneagent/agent/lib64/liboneagentapache.so
#6  0x00007f3a5932e1d7 in ?? () from /opt/dynatrace/oneagent/agent/lib64/liboneagentapache.so
#7  0x00007f3a5931b704 in ?? () from /opt/dynatrace/oneagent/agent/lib64/liboneagentapache.so
#8  0x00007f3a5931b796 in ?? () from /opt/dynatrace/oneagent/agent/lib64/liboneagentapache.so
#9  0x00007f3a5931c956 in ?? () from /opt/dynatrace/oneagent/agent/lib64/liboneagentapache.so
#10 0x00007f3a697c21ae in run_cleanups (cref=) at memory/unix/apr_pools.c:2352
#11 apr_pool_destroy (pool=0x55aa1af167f8) at memory/unix/apr_pools.c:814
#12 0x00007f3a5fc9d4fc in clean_child_exit (code=code@entry=0) at prefork.c:230
#13 0x00007f3a5fc9da2e in child_main (child_num_arg=child_num_arg@entry=543, child_bucket=child_bucket@entry=0) at prefork.c:747
#14 0x00007f3a5fc9de01 in make_child (s=0x55aa1a75df00, slot=543, bucket=0) at prefork.c:834
#15 0x00007f3a5fc9ecc4 in perform_idle_server_maintenance (p=) at prefork.c:942
#16 prefork_run (_pconf=, plog=, s=) at prefork.c:1138
#17 0x000055aa18e2a08e in ap_run_mpm (pconf=0x55aa1a732138, plog=0x55aa1a75fe58, s=0x55aa1a75df00) at mpm_common.c:94
#18 0x000055aa18e233c6 in main (argc=3, argv=0x7fff3dafc9c8) at main.c:777
(gdb)

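Because no debuginfo is available for liboneagentapache.so, gdb cannot resolve the frames inside the module (they appear as ??), so the core dump alone does not reveal more about the cause.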
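The following Python sketch makes the attribution in the backtrace explicit. It lists the frames that gdb could resolve only to a shared library and not to a source line. gdb prints these frames with `from <library>`, and here the function name is also unknown, so it is shown as `??`. The function name and regular expression are illustrative assumptions. The sample input is an excerpt of the backtrace above.

```python
import re
from collections import Counter
from typing import List, Tuple

# A frame that gdb resolves only to a shared object ends in "from <library>",
# e.g. "#2  0x00007f3a593a08d9 in ?? () from /opt/.../liboneagentapache.so".
_LIB_FRAME_RE = re.compile(r"^#(\d+)\s.*\bfrom\s+(\S+)$")


def library_only_frames(backtrace: str) -> List[Tuple[int, str]]:
    """Return (frame number, library) for frames without source-line info."""
    frames = []
    for line in backtrace.splitlines():
        match = _LIB_FRAME_RE.match(line.strip())
        if match:
            frames.append((int(match.group(1)), match.group(2)))
    return frames


# Excerpt of the backtrace from the Root Cause section.
BACKTRACE = """\
#0  0x00007f3a69043787 in malloc_consolidate (av=av@entry=0x7f3a69382760 ) at malloc.c:4162
#1  0x00007f3a690443e6 in _int_free (av=0x7f3a69382760 , p=0x55aa1b0836a0, have_lock=0) at malloc.c:4054
#2  0x00007f3a593a08d9 in ?? () from /opt/dynatrace/oneagent/agent/lib64/liboneagentapache.so
#3  0x00007f3a593be8e4 in ?? () from /opt/dynatrace/oneagent/agent/lib64/liboneagentapache.so
#10 0x00007f3a697c21ae in run_cleanups (cref=) at memory/unix/apr_pools.c:2352
"""

frames = library_only_frames(BACKTRACE)
# Count library-only frames per shared object.
print(Counter(lib for _, lib in frames))
# The innermost such frame is the direct caller of glibc's free path.
print("innermost library-only frame:", frames[0])
```

On this excerpt, the first entry returned is frame #2, the caller of glibc's `_int_free`. That frame is the basis for attributing the crash to the module. Without debuginfo, this remains an indication rather than a full diagnosis.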

Diagnostic Steps

Remove Dynatrace from the nodes where the crashes occur, then continue to monitor those nodes to see whether the crashes stop.

Alternatively, for further evidence, obtain a core dump of the application pod when it crashes. The Dynatrace installation may have overwritten the default /proc/sys/kernel/core_pattern. In that case, you may need to set a different core_pattern before core dumps can be collected successfully [1].
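Before collecting a core dump, you can check what the node's current core_pattern does with the following Python sketch. By kernel convention, a value starting with `|` pipes the core to the named program, and any other value is a file-name template. The `/opt/dynatrace/` prefix check and the handler path in the examples are assumptions for illustration, not documented Dynatrace values. core_pattern is a node-wide kernel setting, not a per-container one, so its value affects every pod on that node.

```python
from pathlib import Path
from typing import Tuple

CORE_PATTERN_PATH = Path("/proc/sys/kernel/core_pattern")


def read_core_pattern(path: Path = CORE_PATTERN_PATH) -> str:
    """Read the node's current core_pattern (a global kernel setting)."""
    return path.read_text()


def describe_core_pattern(pattern: str) -> Tuple[str, str]:
    """Classify a core_pattern value.

    Returns ("pipe", program) when the value starts with '|' (the kernel pipes
    the core to that program), otherwise ("file", template).
    """
    pattern = pattern.strip()
    if pattern.startswith("|"):
        words = pattern[1:].split()
        # The first word is the handler program; the rest are its arguments.
        return ("pipe", words[0] if words else "")
    return ("file", pattern)


def handled_by_dynatrace(pattern: str, prefix: str = "/opt/dynatrace/") -> bool:
    """Heuristic with an assumed prefix: is the core piped to a program under it?"""
    kind, target = describe_core_pattern(pattern)
    return kind == "pipe" and target.startswith(prefix)


# Hypothetical example values; the Dynatrace handler path is made up.
examples = [
    "core",
    "core.%e.%p",
    "|/opt/dynatrace/oneagent/agent/hypothetical-handler %p",
]
for value in examples:
    print(value, "->", describe_core_pattern(value), handled_by_dynatrace(value))
```

On a node, pass the output of `read_core_pattern()` to the same functions. Changing the value (as root, for example with `sysctl -w kernel.core_pattern=...`) is deliberately left out of the sketch.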

Analysis of the application core dump points back to /opt/dynatrace/oneagent/agent/lib64/liboneagentapache.so.

Contact Dynatrace and report the problem, including the backtrace obtained from the core dump analysis.

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
