Severe Performance Problem observed on RHEL 6.3, 6.5 (glibc 2.12)
We have found a severe performance problem on Red Hat Enterprise Linux 6.3 and 6.5 (glibc 2.12) that is still present in 7.x (glibc 2.17).
The code dynamically loads several large shared libraries via calls to
dlopen( , RTLD_LAZY|RTLD_GLOBAL);
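To separate the loader cost from the rest of the application, each load can be timed on its own. The following is only a minimal sketch, not the code from our application: the helper name time_dlopen and its path parameter are made up for illustration, and it assumes a monotonic clock is available via clock_gettime.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical helper (not from the original application): returns the
 * wall-clock seconds spent inside one dlopen() call, or -1.0 if the
 * library could not be loaded. */
static double time_dlopen(const char *path, int flags)
{
    struct timespec start, end;

    assert(path != NULL);

    clock_gettime(CLOCK_MONOTONIC, &start);
    void *handle = dlopen(path, flags);
    clock_gettime(CLOCK_MONOTONIC, &end);

    if (handle == NULL) {
        fprintf(stderr, "dlopen(%s) failed: %s\n", path, dlerror());
        return -1.0;
    }

    /* The handle is deliberately left open so that later loads see the
     * same set of already-loaded objects as in the real application. */
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}
```

Calling time_dlopen(path, RTLD_LAZY|RTLD_GLOBAL) once per library from a small main(), in the same order the application loads them, should show whether the time grows with the number of objects already loaded. On the glibc versions discussed here the program has to be linked with -ldl.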
Profiling the very slow load with callgrind (Ref: http://valgrind.org/docs/manual/cl-manual.html ) shows very high CPU consumption below dlopen:
About 65 percent of all executed instructions (Ir) are spent in and below _dlerror_run, called from dlopen, ending in a hotspot that apparently deals with mapping object dependencies:
35,562,166,954 * ???:_dlerror_run [/lib64/libdl-2.12.so]
[2,398 exclusive] (51x)
------ Called From:
35,562,091,939 ???:dlopen@@GLIBC_2.2.5 (13x) [/lib64/libdl-2.12.so]
74,797 ???:dlsym (37x) [/lib64/libdl-2.12.so]
218 ???:dlclose (1x) [/lib64/libdl-2.12.so]
------ Called To:
35,562,093,098 > ???:_dl_catch_error (51x) [/lib64/ld-2.12.so]
61,378 > ???:_dl_runtime_resolve (5x) [/lib64/ld-2.12.so]
8,663 > ???:pthread_once (51x) [/lib64/libpthread-2.12.so]
912 > ???:pthread_getspecific (51x) [/lib64/libpthread-2.12.so]
237 > ???:calloc (1x) [/lib64/ld-2.12.so]
237 > ???:free (3x) [/lib64/ld-2.12.so]
31 > ???:pthread_setspecific (1x) [/lib64/libpthread-2.12.so]
... down to a hot-spot in . . .
41,369,813,625 * ???:_dl_map_object_deps [/lib64/ld-2.12.so]
[36,119,859,549 exclusive] (15x)
------ Called From:
35,501,642,440 ???:dl_open_worker (14x) [/lib64/ld-2.12.so]
5,868,171,185 ???:dl_main (1x) [/lib64/ld-2.12.so]
------ Called To:
4,931,367,940 > ???:memmove (12837311x) [/lib64/ld-2.12.so]
265,889,690 > ???:_dl_catch_error'3 (36121x) [/lib64/ld-2.12.so]
51,137,809 > ???:_dl_catch_error (6800x) [/lib64/ld-2.12.so]
1,381,392 > ???:index (42921x) [/lib64/ld-2.12.so]
113,853 > ???:memset (1335x) [/lib64/ld-2.12.so]
53,003 > ???:memcpy (415x) [/lib64/ld-2.12.so]
10,389 > ???:malloc (215x) [/lib64/ld-2.12.so]
Tests with the environment variables LD_BIND_NOW=YES or LD_USE_LOAD_BIAS=1 showed no significant improvement, if any.
Not sure if it is relevant, but the code was compiled with gcc 4.3.4 (as reported by gcc --version).
The shared libraries are generally large, with many dependencies. Each is an ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, not stripped.
For comparison, when the very same shared libraries are loaded by the very same code on SUSE 11 (glibc 2.11), callgrind shows roughly 100 times fewer instructions below _dlerror_run (about 350 million Ir versus 35.6 billion):
350,030,464 * ???:_dlerror_run [/lib64/libdl-2.11.3.so]
[2,904 exclusive] (62x)
------ Called From:
349,935,789 ???:dlopen@@GLIBC_2.2.5 (13x) [/lib64/libdl-2.11.3.so]
94,428 ???:dlsym (48x) [/lib64/libdl-2.11.3.so]
247 ???:dlclose (1x) [/lib64/libdl-2.11.3.so]
------ Called To:
349,954,770 > ???:_dl_catch_error (62x) [/lib64/ld-2.11.3.so]
62,129 > ???:_dl_runtime_resolve (5x) [/lib64/ld-2.11.3.so]
8,986 > ???:pthread_once (62x) [/lib64/libpthread-2.11.3.so]
1,172 > ???:pthread_getspecific (62x) [/lib64/libpthread-2.11.3.so]
237 > ???:free (3x) [/lib64/ld-2.11.3.so]
235 > ???:calloc (1x) [/lib64/ld-2.11.3.so]
31 > ???:pthread_setspecific (1x) [/lib64/libpthread-2.11.3.so]
Has anyone else experienced this?
Responses