  • Severe Performance Problem observed on RHEL 6.3, 6.5 (glibc 2.12)

    We have found a severe performance problem on Red Hat Enterprise Linux 6.3 and 6.5 (glibc 2.12). It is still present in 7.x (glibc 2.17).

    The code dynamically loads several large shared libraries via calls to

    dlopen( , RTLD_LAZY|RTLD_GLOBAL);
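To put a wall-clock number on each load outside of callgrind, a small helper like the one below could be wrapped around the call. This is only a sketch: the function name `time_dlopen` is made up for illustration, and the library path is whatever the caller passes in.

```c
#define _POSIX_C_SOURCE 200809L
#include <dlfcn.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: returns the wall-clock seconds spent inside a single
 * dlopen() call, or -1.0 if the load fails. The handle is deliberately left
 * open so the library stays loaded, as in the scenario described above. */
double time_dlopen(const char *path, int flags)
{
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    void *handle = dlopen(path, flags);
    clock_gettime(CLOCK_MONOTONIC, &end);

    if (handle == NULL) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return -1.0;
    }

    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}
```

Calling `time_dlopen(path, RTLD_LAZY | RTLD_GLOBAL)` once per library, in the same order as the application loads them, should show which loads dominate. On glibc 2.12 and 2.17 the program has to be linked with `-ldl`.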

    Profiling the very slow load via callgrind (Ref: http://valgrind.org/docs/manual/cl-manual.html ) shows very high CPU consumption below dlopen:

    65 percent of all executed instructions (callgrind's Ir, "instructions read") are spent in and below _dlerror_run, which is called from dlopen. The cost ends up in a hotspot that apparently deals with mapping object dependencies:

    35,562,166,954 * ???:_dlerror_run [/lib64/libdl-2.12.so]
    [2,398 exclusive] (51x)
    ------ Called From:
    35,562,091,939 ???:dlopen@@GLIBC_2.2.5 (13x) [/lib64/libdl-2.12.so]
    74,797 ???:dlsym (37x) [/lib64/libdl-2.12.so]
    218 ???:dlclose (1x) [/lib64/libdl-2.12.so]
    ------ Called To:
    35,562,093,098 > ???:_dl_catch_error (51x) [/lib64/ld-2.12.so]
    61,378 > ???:_dl_runtime_resolve (5x) [/lib64/ld-2.12.so]
    8,663 > ???:pthread_once (51x) [/lib64/libpthread-2.12.so]
    912 > ???:pthread_getspecific (51x) [/lib64/libpthread-2.12.so]
    237 > ???:calloc (1x) [/lib64/ld-2.12.so]
    237 > ???:free (3x) [/lib64/ld-2.12.so]
    31 > ???:pthread_setspecific (1x) [/lib64/libpthread-2.12.so]

    ... down to a hotspot in:

    41,369,813,625 * ???:_dl_map_object_deps [/lib64/ld-2.12.so]
    [36,119,859,549 exclusive] (15x)
    ------ Called From:
    35,501,642,440 ???:dl_open_worker (14x) [/lib64/ld-2.12.so]
    5,868,171,185 ???:dl_main (1x) [/lib64/ld-2.12.so]
    ------ Called To:
    4,931,367,940 > ???:memmove (12837311x) [/lib64/ld-2.12.so]
    265,889,690 > ???:_dl_catch_error'3 (36121x) [/lib64/ld-2.12.so]
    51,137,809 > ???:_dl_catch_error (6800x) [/lib64/ld-2.12.so]
    1,381,392 > ???:index (42921x) [/lib64/ld-2.12.so]
    113,853 > ???:memset (1335x) [/lib64/ld-2.12.so]
    53,003 > ???:memcpy (415x) [/lib64/ld-2.12.so]
    10,389 > ???:malloc (215x) [/lib64/ld-2.12.so]
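The figures in the _dl_map_object_deps record can be broken down with some simple arithmetic. The sketch below uses only numbers copied from the callgrind output above; the function names are invented for illustration.

```c
#include <stdio.h>

/* Share of `part` within `whole`, as a percentage. */
double percent_of(double part, double whole)
{
    return 100.0 * part / whole;
}

/* Average instruction count (Ir) per call. */
double ir_per_call(double total_ir, double calls)
{
    return total_ir / calls;
}

/* Prints a breakdown of the _dl_map_object_deps record shown above. */
void print_map_object_deps_breakdown(void)
{
    const double inclusive_ir  = 41369813625.0; /* inclusive Ir, 15 calls */
    const double exclusive_ir  = 36119859549.0; /* Ir in the function body itself */
    const double memmove_ir    = 4931367940.0;  /* Ir spent in memmove */
    const double memmove_calls = 12837311.0;    /* number of memmove calls */

    printf("exclusive share of inclusive Ir: %.1f%%\n",
           percent_of(exclusive_ir, inclusive_ir));
    printf("memmove share of inclusive Ir:   %.1f%%\n",
           percent_of(memmove_ir, inclusive_ir));
    printf("average Ir per memmove call:     %.1f\n",
           ir_per_call(memmove_ir, memmove_calls));
}
```

The exclusive share comes out at roughly 87 percent, so most of the cost sits in the body of _dl_map_object_deps itself rather than in its callees. The roughly 12.8 million memmove calls average a few hundred Ir each.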

    Tests with the environment variable LD_BIND_NOW=YES or LD_USE_LOAD_BIAS=1 set show no significant improvement, if any.

    Not sure if it is relevant, but the code was compiled with gcc 4.3.4 (as reported by gcc --version).

    The shared libraries are generally large, with many dependencies. Each one is reported as: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, not stripped

    For comparison, when the very same shared libraries are loaded by the very same code on SUSE 11 (glibc 2.11), callgrind shows an instruction count about 100 times lower:

    350,030,464 * ???:_dlerror_run [/lib64/libdl-2.11.3.so]
    [2,904 exclusive] (62x)
    ------ Called From:
    349,935,789 ???:dlopen@@GLIBC_2.2.5 (13x) [/lib64/libdl-2.11.3.so]
    94,428 ???:dlsym (48x) [/lib64/libdl-2.11.3.so]
    247 ???:dlclose (1x) [/lib64/libdl-2.11.3.so]
    ------ Called To:
    349,954,770 > ???:_dl_catch_error (62x) [/lib64/ld-2.11.3.so]
    62,129 > ???:_dl_runtime_resolve (5x) [/lib64/ld-2.11.3.so]
    8,986 > ???:pthread_once (62x) [/lib64/libpthread-2.11.3.so]
    1,172 > ???:pthread_getspecific (62x) [/lib64/libpthread-2.11.3.so]
    237 > ???:free (3x) [/lib64/ld-2.11.3.so]
    235 > ???:calloc (1x) [/lib64/ld-2.11.3.so]
    31 > ???:pthread_setspecific (1x) [/lib64/libpthread-2.11.3.so]
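The "factor of 100" can be checked directly from the two _dlerror_run records and their dlopen call counts (13x on both systems). The sketch below uses only figures copied from the two callgrind outputs; the function names are illustrative.

```c
#include <stdio.h>

/* How many times more instructions the slow run executes than the fast run. */
double slowdown_factor(double slow_ir, double fast_ir)
{
    return slow_ir / fast_ir;
}

/* Compares the inclusive Ir of _dlerror_run and of dlopen on both systems. */
void print_dlopen_comparison(void)
{
    const double rhel_dlerror_run = 35562166954.0; /* glibc 2.12 */
    const double suse_dlerror_run = 350030464.0;   /* glibc 2.11.3 */
    const double rhel_dlopen      = 35562091939.0; /* 13 dlopen calls */
    const double suse_dlopen      = 349935789.0;   /* 13 dlopen calls */
    const double dlopen_calls     = 13.0;

    printf("_dlerror_run slowdown:      %.1fx\n",
           slowdown_factor(rhel_dlerror_run, suse_dlerror_run));
    printf("Ir per dlopen, glibc 2.12:  %.0f\n", rhel_dlopen / dlopen_calls);
    printf("Ir per dlopen, glibc 2.11:  %.0f\n", suse_dlopen / dlopen_calls);
}
```

Since both runs make the same 13 dlopen calls, the difference is in the cost per call and not in how often dlopen is called.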

    Has anyone else experienced this?
