Severe Performance Problem observed on RHEL 6.3, 6.5 (glibc 2.12)

We have found a severe performance problem on Red Hat Enterprise Linux 6.3 and 6.5 (glibc 2.12). It is still present in 7.x (glibc 2.17).

The code dynamically loads several large shared libraries via calls to

dlopen( , RTLD_LAZY|RTLD_GLOBAL);
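
To compare hosts without running callgrind, the time spent in each dlopen call can be measured directly. The following is a minimal sketch, not the code from the application in question: the helper name timed_dlopen is hypothetical, and the library path is whatever the caller passes in (the post does not name the libraries).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: calls dlopen() and stores the wall-clock seconds
 * spent inside the call in *seconds. Returns whatever dlopen() returned
 * (NULL on failure; dlerror() then describes the problem). */
void *timed_dlopen(const char *path, int flags, double *seconds)
{
    struct timespec start, end;
    void *handle;

    assert(seconds != NULL);

    clock_gettime(CLOCK_MONOTONIC, &start);
    handle = dlopen(path, flags);
    clock_gettime(CLOCK_MONOTONIC, &end);

    *seconds = (double)(end.tv_sec - start.tv_sec)
             + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
    return handle;
}
```

Calling it with the same flags as above, RTLD_LAZY|RTLD_GLOBAL, for each library and printing the elapsed time gives a per-library figure that can be compared between the RHEL and SUSE hosts. On glibc older than 2.34 the program has to be linked with -ldl.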

Profiling the very slow load via callgrind (Ref: http://valgrind.org/docs/manual/cl-manual.html ) shows very high CPU consumption below dlopen:

65 percent of all executed instructions (Ir) are spent in and below _dlerror_run, which is called from dlopen. The cost ends up in a hotspot that apparently deals with mapping object dependencies:

35,562,166,954 * ???:_dlerror_run [/lib64/libdl-2.12.so]
[2,398 exclusive] (51x)
------ Called From:
35,562,091,939 < ???:dlopen@@GLIBC_2.2.5 (13x) [/lib64/libdl-2.12.so]
74,797 < ???:dlsym (37x) [/lib64/libdl-2.12.so]
218 < ???:dlclose (1x) [/lib64/libdl-2.12.so]
------ Called To:
35,562,093,098 > ???:_dl_catch_error (51x) [/lib64/ld-2.12.so]
61,378 > ???:_dl_runtime_resolve (5x) [/lib64/ld-2.12.so]
8,663 > ???:pthread_once (51x) [/lib64/libpthread-2.12.so]
912 > ???:pthread_getspecific (51x) [/lib64/libpthread-2.12.so]
237 > ???:calloc (1x) [/lib64/ld-2.12.so]
237 > ???:free (3x) [/lib64/ld-2.12.so]
31 > ???:pthread_setspecific (1x) [/lib64/libpthread-2.12.so]

... down to a hotspot in:

41,369,813,625 * ???:_dl_map_object_deps [/lib64/ld-2.12.so]
[36,119,859,549 exclusive] (15x)
------ Called From:
35,501,642,440 < ???:dl_open_worker (14x) [/lib64/ld-2.12.so]
5,868,171,185 < ???:dl_main (1x) [/lib64/ld-2.12.so]
------ Called To:
4,931,367,940 > ???:memmove (12837311x) [/lib64/ld-2.12.so]
265,889,690 > ???:_dl_catch_error'3 (36121x) [/lib64/ld-2.12.so]
51,137,809 > ???:_dl_catch_error (6800x) [/lib64/ld-2.12.so]
1,381,392 > ???:index (42921x) [/lib64/ld-2.12.so]
113,853 > ???:memset (1335x) [/lib64/ld-2.12.so]
53,003 > ???:memcpy (415x) [/lib64/ld-2.12.so]
10,389 > ???:malloc (215x) [/lib64/ld-2.12.so]
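
The numbers in this record can be cross-checked with a little arithmetic. The sketch below only reuses counts copied from the dump above; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Counts copied from the _dl_map_object_deps record (glibc 2.12 run). */
#define DEPS_INCLUSIVE_IR 41369813625ULL
#define DEPS_EXCLUSIVE_IR 36119859549ULL
#define FROM_DL_OPEN_IR   35501642440ULL  /* called from dl_open_worker */
#define FROM_DL_MAIN_IR    5868171185ULL  /* called from dl_main */
#define MEMMOVE_IR         4931367940ULL
#define MEMMOVE_CALLS        12837311ULL

/* Hypothetical helper: share of the inclusive cost that is exclusive,
 * i.e. spent in the function body itself rather than in its callees. */
double exclusive_share(uint64_t exclusive_ir, uint64_t inclusive_ir)
{
    assert(inclusive_ir != 0);
    return (double)exclusive_ir / (double)inclusive_ir;
}

/* Hypothetical helper: average instruction cost of one call. */
double per_call_cost(uint64_t total_ir, uint64_t calls)
{
    assert(calls != 0);
    return (double)total_ir / (double)calls;
}
```

With these inputs, roughly 87 percent of the inclusive cost is exclusive to _dl_map_object_deps itself, and the 12.8 million memmove calls average about 384 Ir each. The two "Called From" figures add up exactly to the inclusive total, so the record is internally consistent.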

Tests with the environment variables LD_BIND_NOW=YES or LD_USE_LOAD_BIAS=1 show no significant improvement, if any.

Not sure if it is relevant, but the code was compiled with gcc 4.3.4 (as reported by gcc --version).

The shared libraries are generally large, with many dependencies. Each is an ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, not stripped.
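
Since the hotspot is in dependency mapping, it may help to know how many objects the process actually has loaded after the dlopen calls. A minimal sketch using glibc's dl_iterate_phdr follows. The function names count_object_cb and count_loaded_objects are hypothetical, and this only counts objects; it does not reproduce what the loader does internally.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <link.h>
#include <stddef.h>

/* Callback for dl_iterate_phdr(): bumps the counter passed through data.
 * Returning 0 tells dl_iterate_phdr() to continue with the next object. */
int count_object_cb(struct dl_phdr_info *info, size_t size, void *data)
{
    (void)info;
    (void)size;
    assert(data != NULL);
    ++*(size_t *)data;
    return 0;
}

/* Hypothetical helper: number of ELF objects currently loaded in this
 * process, including the main program itself. */
size_t count_loaded_objects(void)
{
    size_t n = 0;
    dl_iterate_phdr(count_object_cb, &n);
    return n;
}
```

Calling count_loaded_objects() before and after each dlopen shows how many objects that call pulled in, which can be put next to the per-call cost from callgrind.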

For comparison, with the very same shared libraries and the very same code on SUSE 11 (glibc 2.11), callgrind shows performance about 100 times better:
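
The factor can be checked against the inclusive Ir of _dlerror_run in the two profiles. A minimal sketch follows; the helper name slowdown_factor is hypothetical, and the two constants are copied from the callgrind records in this post.

```c
#include <assert.h>
#include <stdint.h>

/* Inclusive Ir of _dlerror_run, copied from the two callgrind records. */
#define RHEL_GLIBC_2_12_IR 35562166954ULL
#define SUSE_GLIBC_2_11_IR   350030464ULL

/* Hypothetical helper: how many times more instructions the slow run
 * executed compared with the fast run. */
double slowdown_factor(uint64_t slow_ir, uint64_t fast_ir)
{
    assert(fast_ir != 0);
    return (double)slow_ir / (double)fast_ir;
}
```

slowdown_factor(RHEL_GLIBC_2_12_IR, SUSE_GLIBC_2_11_IR) comes out at roughly 101.6. Note that the two runs are not perfectly identical: _dlerror_run is called 51 times on RHEL and 62 times on SUSE, although dlopen is called 13 times in both.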

350,030,464 * ???:_dlerror_run [/lib64/libdl-2.11.3.so]
[2,904 exclusive] (62x)
------ Called From:
349,935,789 < ???:dlopen@@GLIBC_2.2.5 (13x) [/lib64/libdl-2.11.3.so]
94,428 < ???:dlsym (48x) [/lib64/libdl-2.11.3.so]
247 < ???:dlclose (1x) [/lib64/libdl-2.11.3.so]
------ Called To:
349,954,770 > ???:_dl_catch_error (62x) [/lib64/ld-2.11.3.so]
62,129 > ???:_dl_runtime_resolve (5x) [/lib64/ld-2.11.3.so]
8,986 > ???:pthread_once (62x) [/lib64/libpthread-2.11.3.so]
1,172 > ???:pthread_getspecific (62x) [/lib64/libpthread-2.11.3.so]
237 > ???:free (3x) [/lib64/ld-2.11.3.so]
235 > ???:calloc (1x) [/lib64/ld-2.11.3.so]
31 > ???:pthread_setspecific (1x) [/lib64/libpthread-2.11.3.so]

Has anyone else experienced this?

Responses

I don't have much insight, but are you using prelink/preload cache on both hosts? Also, I wonder if ASLR could be causing issues. I'm hoping some gifted folks respond, as I think this is going to be a good discussion ;-)

I know this is a rather old topic, but maybe we are experiencing a similar problem. We have never used callgrind, so it will take us a few days to get results. But we saw that downgrading glibc to 2.12-1.47 (RHEL 6.2) did the trick and restored performance. Could you also give it a try? We also opened a case for this.
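
When comparing hosts or testing a downgrade, it is worth confirming which glibc the process is actually running against. A minimal sketch using glibc's gnu_get_libc_version and gnu_get_libc_release follows; the helper name describe_glibc is hypothetical.

```c
#include <assert.h>
#include <gnu/libc-version.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: formats "glibc <version> (<release>)" into buf
 * and returns buf. The version is the upstream one, for example "2.12". */
const char *describe_glibc(char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    snprintf(buf, len, "glibc %s (%s)",
             gnu_get_libc_version(), gnu_get_libc_release());
    return buf;
}
```

This reports only the upstream version, so it cannot tell 2.12-1.47 apart from a later 2.12 package build. The package release has to come from the package manager, for example rpm -q glibc.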
