SSSD process child was terminated by own WATCHDOG
Issue
We are frequently getting messages like this:
2021-05-01T10:00:50.633216-04:00 system1 sssd[sssd]: Child [1277] ('SSSDdomain':'%BE_SSSDdomain') was terminated by own WATCHDOG. Consult corresponding logs to figure out the reason.
We have had some reports of job failures at times that correspond to the watchdog termination messages for BE_SSSDdomain, which leads us to believe that the short outage of LDAP lookups caused by these terminations is causing problems.
This issue does not occur all the time; it is very sporadic. For example, this particular SSSD client host has seen the watchdog kill the main SSSD service only once in the past week, but it has killed the NSS service 10 times in the same period.
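To compare how often each SSSD component is being killed, the watchdog messages can be tallied per component. The following is a minimal sketch, assuming every termination is logged in the same format as the message quoted above (`Child [PID] ('name':'component') was terminated by own WATCHDOG`). The function name `count_watchdog_kills` is hypothetical, and the NSS sample line is an assumed example of the same format, not a log line taken from this host.

```python
import re
from collections import Counter

# Matches the sssd monitor message quoted above, e.g.
# "Child [1277] ('SSSDdomain':'%BE_SSSDdomain') was terminated by own WATCHDOG"
# Assumption: other components (such as nss) are logged in the same format.
WATCHDOG_RE = re.compile(
    r"Child \[(?P<pid>\d+)\] \('(?P<name>[^']*)':'(?P<component>[^']*)'\) "
    r"was terminated by own WATCHDOG"
)


def count_watchdog_kills(lines):
    """Return a Counter of watchdog terminations keyed by SSSD component."""
    counts = Counter()
    for line in lines:
        match = WATCHDOG_RE.search(line)
        if match:
            counts[match.group("component")] += 1
    return counts


if __name__ == "__main__":
    sample = [
        # The message from this issue.
        "2021-05-01T10:00:50.633216-04:00 system1 sssd[sssd]: Child [1277] "
        "('SSSDdomain':'%BE_SSSDdomain') was terminated by own WATCHDOG. "
        "Consult corresponding logs to figure out the reason.",
        # Hypothetical NSS line, assumed to follow the same format.
        "system1 sssd[sssd]: Child [1300] ('nss':'nss') was terminated by own WATCHDOG.",
        # Lines that do not match are ignored.
        "system1 sshd[2000]: unrelated message",
    ]
    for component, count in count_watchdog_kills(sample).most_common():
        print(f"{component}: {count}")
```

On a real host the function could be fed the system log file directly (for example `/var/log/messages` on RHEL 7, assuming SSSD messages are sent there), restricted to the time window of interest.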
Environment
- Red Hat Enterprise Linux (RHEL) 7
- redhat-release-server-7.9-6.el7_9.x86_64
- sssd-1.16.5-10.el7_9.7.x86_64