Dell iDRAC Service and libc-2.17 error

On a Dell R640 server we tried to install the iDRAC service for the OS.

With Red Hat 7.6 and libc 2.17-260.el7_6.5, the corresponding service "dcismeng" runs fine.

With Red Hat 7.7 and libc 2.17-292.el7, dcismeng crashes with the error "dsm_ism_srvmgrd[17900]: segfault at 0 ip 00007fe342515b45 sp 00007fe343fe7a68 error 4 in libc-2.17.so[7fe3423d5000+1c3000]"

Any hint on how to solve this?

Responses

Hi, did you solve this issue?

Hi. No, but do you have any new information on it?

Hi,

Experiencing the same issue here: RHEL 7.7 on Dell MX740c blades.

Jun 30 22:50:08 ............ kernel: dsm_ism_srvmgrd[168315]: segfault at 0 ip 00007fcfa977eb45 sp 00007fcfab24ba58 error 4 in libc-2.17.so[7fcfa963e000+1c3000]

dcism-osc-5.0.0-41.x86_64 dcism-3.5.1-1949.el7.x86_64

No current ideas here either.

Just to make matters more confusing, I have another Dell MX740c with RHEL 7.7 installed. dcism 3.5.1 installs absolutely fine there and runs with no problem.

I've compared the installed packages and found only a few minor differences, nothing that should matter; the libc versions are the same. I tried installing one of the two packages missing from the broken box, but it made no difference.

Looking at things like LD_LIBRARY_PATH now, as the non-working box has that set.
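The package comparison described above could be scripted roughly as follows. This is only a sketch: it assumes the output of `rpm -qa` from each host has been saved as text, and the function names `load_packages` and `diff_packages` as well as the sample package lists are hypothetical, not taken from either machine.

```python
def load_packages(text):
    """Return the set of non-empty lines from `rpm -qa` style output."""
    return {line.strip() for line in text.splitlines() if line.strip()}


def diff_packages(working_text, broken_text):
    """Report packages present on only one of the two hosts."""
    working = load_packages(working_text)
    broken = load_packages(broken_text)
    return {
        "only_on_working": sorted(working - broken),
        "only_on_broken": sorted(broken - working),
    }


# Hypothetical sample data, for illustration only.
working_host = """
dcism-3.5.1-1949.el7.x86_64
glibc-2.17-292.el7.x86_64
example-extra-1.0-1.el7.x86_64
"""
broken_host = """
dcism-3.5.1-1949.el7.x86_64
glibc-2.17-292.el7.x86_64
"""

result = diff_packages(working_host, broken_host)
print(result)
```

Because full name-version-release-arch strings are compared, a package that exists on both hosts at different versions shows up on both sides of the result, which is usually what you want when hunting for a subtle difference.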

Having the exact same issue with RHEL 7.8 on a Dell R730: kernel 3.10.0-1127.13.1.el7.x86_64, glibc-2.17-307.el7.1.x86_64.

Technically this is a problem with the dsm_ism_srvmgrd process. It is doing something that causes an illegal memory access (segmentation fault), so it is terminated.

That said, the reports here show this happening only from glibc-2.17-292.el7 onwards. The process works fine on older versions such as glibc-2.17-260.el7.

It's possible that some bug has been fixed in glibc, and this process accidentally relied on that bug. The process has then become a "victim" of the bugfix. If so, the process needs to be corrected to follow the repaired behaviour.

Alternatively, it's possible that a bug has been introduced into glibc which this process just happens to hit. If so, we should probably fix that.

The correct troubleshooting path is to pursue the owner of dsm_ism_srvmgrd to quantify exactly what that process does to prompt the segfault.

If it's an apparent fault in glibc then we could investigate that in several ways - via support case, via TSANet engagement, or the vendor could pursue it via B2B partner relationship.
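As a starting point for that conversation, the kernel segfault lines quoted in this thread can be decoded. The sketch below parses such a line, computes the offset of the faulting instruction inside the mapped library (instruction pointer minus mapping base), and interprets the x86 page-fault error code bits. The function name `decode_segfault` and the regular expression are assumptions written for the log format shown above, not part of any tool.

```python
import re

# Matches the kernel's user-space segfault message as quoted in this thread.
SEGFAULT_RE = re.compile(
    r"(?P<comm>[^\s\[]+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<obj>[^\s\[]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line):
    """Extract the key fields from a kernel segfault log line."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        raise ValueError("not a recognised segfault line")
    ip = int(m.group("ip"), 16)
    base = int(m.group("base"), 16)
    err = int(m.group("err"), 16)
    return {
        "process": m.group("comm"),
        "pid": int(m.group("pid")),
        "fault_address": int(m.group("addr"), 16),
        "object": m.group("obj"),
        "offset": ip - base,                       # offset inside the library mapping
        "page_present": bool(err & 1),             # bit 0: protection fault vs. page not present
        "access": "write" if err & 2 else "read",  # bit 1: write vs. read
        "user_mode": bool(err & 4),                # bit 2: fault raised in user mode
    }


r640 = decode_segfault(
    "dsm_ism_srvmgrd[17900]: segfault at 0 ip 00007fe342515b45 "
    "sp 00007fe343fe7a68 error 4 in libc-2.17.so[7fe3423d5000+1c3000]"
)
mx740c = decode_segfault(
    "dsm_ism_srvmgrd[168315]: segfault at 0 ip 00007fcfa977eb45 "
    "sp 00007fcfab24ba58 error 4 in libc-2.17.so[7fcfa963e000+1c3000]"
)
print(hex(r640["offset"]), hex(mx740c["offset"]))
```

Both lines decode to a user-mode read of address 0 (a NULL pointer dereference) at the same offset, 0x140b45, inside libc-2.17.so. Assuming both hosts run the same libc build, that points to the same libc function being reached with a NULL argument in each case. Resolving the offset to a function name would need the matching glibc debuginfo, for example with gdb or addr2line.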