How to enable valgrind for Red Hat Directory Server version 10?


Environment

Red Hat Directory Server version 10
Red Hat Enterprise Linux version 7

Issue

Show, with an example, how to run valgrind to detect memory leaks in the Red Hat Directory Server version 10 application (ns-slapd) on Red Hat Enterprise Linux version 7.

Resolution

IMPORTANT WARNING:
Running the LDAP service under valgrind has a severe drawback: depending on the hardware, the LDAP traffic, and the verbosity of ns-slapd logging, performance will likely be very poor. The LDAP service may become extremely slow to respond, or appear not to respond at all.
Adding RAM may help a little.
If it is acceptable to install valgrind and to run ns-slapd under it despite these effects, continue with this article to investigate possible ns-slapd memory leaks:

On a test system, install the debuginfo packages for 389-ds-base and glibc (debuginfo-install is provided by the yum-utils package), and the valgrind package:

debuginfo-install 389-ds-base glibc
yum install -y valgrind

Stop the LDAP service and verify that it is stopped (lsof should print nothing for port 389):

systemctl stop dirsrv.target
lsof -i :389

Back up the systemd unit file of the instance. The example uses the instance name m1; replace "m1" with the instance name used on your system:

cp -p /etc/systemd/system/dirsrv.target.wants/dirsrv\@m1.service ~/etc.systemd.system.dirsrv.target.wants.dirsrv\@m1.service.orig
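The article does not show how to undo the change once the investigation is over. A minimal sketch, assuming the backup path used above and the instance name m1, is to copy the backup back over the unit file, then reload systemd and restart the service. The restore_unit function name is hypothetical, and the systemctl commands are left as comments so that the sketch itself changes nothing on a running system.

```shell
#!/bin/bash
# Hypothetical helper: put a backed-up unit file back in place with cp -p,
# preserving mode and timestamps (and ownership when run as root),
# as the backup step above did.
restore_unit() {
  local backup="$1" unit="$2"
  if [ ! -f "$backup" ]; then
    echo "backup not found: $backup" >&2
    return 1
  fi
  cp -p "$backup" "$unit"
}

# Example for instance m1 (run as root, after the investigation):
#   restore_unit ~/etc.systemd.system.dirsrv.target.wants.dirsrv\@m1.service.orig \
#                /etc/systemd/system/dirsrv.target.wants/dirsrv\@m1.service
#   systemctl daemon-reload
#   systemctl restart dirsrv.target
```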

The goal is to start ns-slapd under valgrind with the following ExecStart line:

ExecStart=/usr/bin/valgrind -v --tool=memcheck --leak-check=full --leak-resolution=high --num-callers=50 --trace-children=yes --show-reachable=yes --track-origins=yes --read-var-info=yes --log-file=/var/tmp/valgrind.%p.out /usr/sbin/ns-slapd -D /etc/dirsrv/slapd-%i -i /var/run/dirsrv/slapd-%i.pid
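To make the ExecStart line above easier to review, the bash sketch below builds the same line from an array, with one comment per valgrind option. The function name build_execstart is hypothetical; only the option values come from this article, and the comments summarize how the valgrind manual describes each option.

```shell
#!/bin/bash
# Sketch: assemble the ExecStart line from a commented array of valgrind options.
build_execstart() {
  local -a opts
  opts=(
    -v                         # more verbose valgrind messages
    --tool=memcheck            # memory error and leak detector
    --leak-check=full          # report each leak in detail at exit
    --leak-resolution=high     # merge leak reports only when all stack frames match
    --num-callers=50           # keep up to 50 frames per stack trace
    --trace-children=yes       # also trace programs started through exec
    --show-reachable=yes       # report all leak kinds, including still-reachable blocks
    --track-origins=yes        # report where uninitialised values come from
    --read-var-info=yes        # use debug info to describe variables in error messages
    # In a unit file, %p is a systemd specifier; use %%p for valgrind's PID.
    --log-file=/var/tmp/valgrind.%p.out
  )
  # %%i in the printf format prints a literal %i (the systemd instance specifier).
  printf 'ExecStart=/usr/bin/valgrind %s /usr/sbin/ns-slapd -D /etc/dirsrv/slapd-%%i -i /var/run/dirsrv/slapd-%%i.pid\n' "${opts[*]}"
}

build_execstart
```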

Modify the instance's unit file (again with m1 as the example instance name; replace "m1" with the instance name used on your system). The sed command below comments out the original ExecStart line and adds the valgrind ExecStart line after it:

sed -i 's/^\(ExecStart=.*$\)/#\1\nExecStart=\/usr\/bin\/valgrind -v --tool=memcheck --leak-check=full --leak-resolution=high --num-callers=50 --trace-children=yes --show-reachable=yes --track-origins=yes --read-var-info=yes --log-file=\/var\/tmp\/valgrind.%p.out \/usr\/sbin\/ns-slapd -D \/etc\/dirsrv\/slapd-%i -i \/var\/run\/dirsrv\/slapd-%i.pid/' /etc/systemd/system/dirsrv.target.wants/dirsrv\@m1.service
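Because sed -i rewrites the file in place, it can be worth rehearsing the substitution first. The sketch below applies the identical sed expression to a temporary sample whose three lines are copied from the diff in the Diagnostic Steps section (the sample is an assumption, not the full unit file), then lists the ExecStart lines so you can confirm that ExecStartPre is left alone and that the original ExecStart is commented out.

```shell
#!/bin/bash
# Sketch: dry-run the article's sed edit on a throwaway sample file.
set -euo pipefail

sample=$(mktemp)
trap 'rm -f "$sample"' EXIT

# Sample lines modelled on the diff shown in Diagnostic Steps.
cat > "$sample" <<'EOF'
PIDFile=/var/run/dirsrv/slapd-%i.pid
ExecStartPre=/usr/sbin/ds_systemd_ask_password_acl /etc/dirsrv/slapd-%i/dse.ldif
ExecStart=/usr/sbin/ns-slapd -D /etc/dirsrv/slapd-%i -i /var/run/dirsrv/slapd-%i.pid
EOF

# Same substitution as the article (GNU sed: \n in the replacement is a newline).
# "ExecStart=" does not match "ExecStartPre=", so only the main line changes.
sed -i 's/^\(ExecStart=.*$\)/#\1\nExecStart=\/usr\/bin\/valgrind -v --tool=memcheck --leak-check=full --leak-resolution=high --num-callers=50 --trace-children=yes --show-reachable=yes --track-origins=yes --read-var-info=yes --log-file=\/var\/tmp\/valgrind.%p.out \/usr\/sbin\/ns-slapd -D \/etc\/dirsrv\/slapd-%i -i \/var\/run\/dirsrv\/slapd-%i.pid/' "$sample"

# Show ExecStartPre, the commented original, and the new valgrind line.
grep -n '^#\?ExecStart' "$sample"
```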

Reload the systemd configuration so that the modified unit file is taken into account:

systemctl daemon-reload

Optionally, follow the system messages in the background:

tail -f /var/log/messages &

Restart the Red Hat Directory Server:

systemctl restart dirsrv.target

Provide the valgrind output file for review when there are signs of a memory leak, an excessive process size, or an out of memory situation.

Root Cause

The LDAP service process, here the ns-slapd binary provided by Red Hat Directory Server, uses an unusual amount of memory that needs to be investigated.
This applies, for example, when the kernel selects the ns-slapd process as an Out Of Memory (OOM) candidate and terminates it.
The valgrind tool can show possible memory leaks and identify the portions of source code that cause them.
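To confirm that the OOM killer actually terminated ns-slapd, the kernel messages can be searched before starting a valgrind run. This sketch assumes the RHEL 7 kernel wording "Out of memory: Kill process PID (name) ..." (the pattern also accepts the later "Killed process" wording) and the default /var/log/messages location. The function name find_oom_kills is hypothetical, and the log path is a parameter so it can point at another file.

```shell
#!/bin/bash
# Hypothetical helper: print kernel OOM-killer lines that name ns-slapd.
# Assumed message format (RHEL 7 kernel), e.g.
#   "Out of memory: Kill process 1234 (ns-slapd) score 900 or sacrifice child"
find_oom_kills() {
  local log="${1:-/var/log/messages}"
  grep -E 'Out of memory: Kill(ed)? process [0-9]+ \(ns-slapd\)' "$log"
}

# Example (as root): find_oom_kills /var/log/messages
```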

Diagnostic Steps

Example of the changes made to the instance's unit file (instance m1; replace "m1" with the instance name used on your system):

diff -u ~/etc.systemd.system.dirsrv.target.wants.dirsrv\@m1.service.orig /etc/systemd/system/dirsrv.target.wants/dirsrv\@m1.service
--- /root/etc.systemd.system.dirsrv.target.wants.dirsrv@m1.service.orig 2018-09-11 16:12:12.588000000 +0000
+++ /etc/systemd/system/dirsrv.target.wants/dirsrv@m1.service   2018-09-11 16:14:13.323000000 +0000
@@ -26,7 +26,8 @@
 EnvironmentFile=/etc/sysconfig/dirsrv-%i
 PIDFile=/var/run/dirsrv/slapd-%i.pid
 ExecStartPre=/usr/sbin/ds_systemd_ask_password_acl /etc/dirsrv/slapd-%i/dse.ldif
-ExecStart=/usr/sbin/ns-slapd -D /etc/dirsrv/slapd-%i -i /var/run/dirsrv/slapd-%i.pid
+#ExecStart=/usr/sbin/ns-slapd -D /etc/dirsrv/slapd-%i -i /var/run/dirsrv/slapd-%i.pid
+ExecStart=/usr/bin/valgrind -v --tool=memcheck --leak-check=full --leak-resolution=high --num-callers=50 --trace-children=yes --show-reachable=yes --track-origins=yes --read-var-info=yes --log-file=/var/tmp/valgrind.%p.out /usr/sbin/ns-slapd -D /etc/dirsrv/slapd-%i -i /var/run/dirsrv/slapd-%i.pid
 # if you need to set other directives e.g. LimitNOFILE=8192
 # set them in this file
 .include /etc/sysconfig/dirsrv.systemd

Example of system messages when starting the LDAP service. Note that the messages are logged under the name valgrind, because valgrind now runs ns-slapd under its control:

Sep 11 16:15:30 m1 systemd: Starting 389 Directory Server m1....
Sep 11 16:15:41 m1 valgrind: [11/Sep/2018:16:15:41.732954213 +0000] - NOTICE - slapd_bootstrap_config - nsslapd-errorlog-level: ignoring 16384 (since -d 266354688 was given on the command line)
...
Sep 11 16:16:18 m1 valgrind: [11/Sep/2018:16:16:18.164395617 +0000] - WARN - Security Initialization - SSL alert: Sending pin request to SVRCore. You may need to run systemd-tty-ask-password-agent to provide the password.
Sep 11 16:16:18 m1 valgrind: [11/Sep/2018:16:16:18.591547707 +0000] - INFO - Security Initialization - SSL info: Configured NSS Ciphers
...
Sep 11 16:16:41 m1 valgrind: [11/Sep/2018:16:16:41.272660335 +0000] - INFO - slapd_daemon - slapd started.  Listening on All Interfaces port 389 for LDAP requests
Sep 11 16:16:41 m1 valgrind: [11/Sep/2018:16:16:41.281664936 +0000] - INFO - slapd_daemon - Listening on All Interfaces port 636 for LDAPS requests
Sep 11 16:16:41 m1 systemd: Started 389 Directory Server m1..
Sep 11 16:16:41 m1 systemd: Reached target 389 Directory Server.
Sep 11 16:16:41 m1 systemd: Starting 389 Directory Server.

Optionally, verify that there is a TCP listener for the LDAP service (the COMMAND column shows memcheck-, the truncated name of the valgrind Memcheck process):

lsof -i :389
COMMAND    PID      USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
memcheck- 6505 ldapuser1    8u  IPv6  46684      0t0  TCP *:ldap (LISTEN)

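Verify that the valgrind output file exists. Because systemd expands %p in a unit file to the unit name prefix (here dirsrv), the file is named valgrind.dirsrv.out instead of carrying the PID; write %%p in the unit file to get valgrind's PID-based file name: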

ls -lh /var/tmp/valg*
-rw-r--r--. 1 root root 60K Sep 11 16:16 /var/tmp/valgrind.dirsrv.out

less /var/tmp/valgrind.dirsrv.out
==6505== Memcheck, a memory error detector
==6505== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==6505== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==6505== Command: /usr/sbin/ns-slapd -D /etc/dirsrv/slapd-m1 -i /var/run/dirsrv/slapd-m1.pid
==6505== Parent PID: 1
...

From that point, let the LDAP service run and perform the tests until there is evidence of a memory leak, for example by monitoring the process with top (here with only 4 samples):

lsof -i :389 > top.log; echo "" >> top.log; top -d 1 -b -M -n 4 -p `pidof valgrind` >> top.log

Alternatively, list the 4 processes using the most memory (%MEM followed by the command):

ps auwwx|gawk '!/%MEM/ {print $4,$11}'|sort -nr|head -n4
58.5 /usr/bin/valgrind
...
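Both commands above take snapshots. To record the resident set size over a longer period, which makes steady growth easier to spot, a plain ps loop can append samples to a log. This is a sketch: the function name sample_rss, the default interval and the sample count are assumptions. Pass the PID shown by lsof above (6505 in this example).

```shell
#!/bin/bash
# Sketch: print "date time rss_kib" for a PID every INTERVAL seconds, COUNT times.
# A steadily increasing RSS over many samples hints at a leak; it is not proof.
sample_rss() {
  local pid="$1" interval="${2:-60}" count="${3:-4}" i rss
  for ((i = 0; i < count; i++)); do
    # ps prints RSS in KiB; it prints nothing and fails if the PID is gone.
    rss=$(ps -o rss= -p "$pid") || { echo "process $pid not found" >&2; return 1; }
    printf '%s %s\n' "$(date '+%F %T')" "${rss//[[:space:]]/}"
    if (( i + 1 < count )); then sleep "$interval"; fi
  done
}

# Example: sample_rss 6505 60 4 >> rss.log
```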

Then provide the valgrind output file for review.
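Before sending the file, a quick look at Memcheck's leak summary shows whether anything was reported as lost. With --leak-check=full, Memcheck normally prints its LEAK SUMMARY block when the process exits (for example after systemctl stop dirsrv.target), so the block may be absent while ns-slapd is still running. The function names below are hypothetical, and the default file name matches the /var/tmp/valgrind.dirsrv.out example above.

```shell
#!/bin/bash
# Hypothetical helper: print each LEAK SUMMARY header and the 5 lines after it
# (definitely lost, indirectly lost, possibly lost, still reachable, suppressed).
leak_summary() {
  local log="${1:-/var/tmp/valgrind.dirsrv.out}"
  grep -A 5 'LEAK SUMMARY:' "$log"
}

# Hypothetical helper: print the last "definitely lost" byte count without
# thousands separators, e.g. "1,024" becomes "1024".
definitely_lost_bytes() {
  local log="${1:-/var/tmp/valgrind.dirsrv.out}"
  sed -n 's/^==[0-9]*==  *definitely lost: \([0-9,]*\) bytes.*/\1/p' "$log" | tr -d ',' | tail -n 1
}

# Example: leak_summary; definitely_lost_bytes
```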

References:
How to use OS utilities to track down application memory leaks - https://access.redhat.com/solutions/32526

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
