Master Article: Troubleshooting "SSSD service is unable to start"

Solution Verified - Updated -

Environment

  • Red Hat Enterprise Linux
  • SSSD

Issue

  • Unable to start sssd service after patching.
  • SSSD service fails to start.
  • Unable to start sssd service: Could not open the sysdb cache [17]: File exists
  • Job for sssd.service failed because the control process exited with an error code.
  • krb5_kt_start_seq_get failed: Key table file /etc/krb5.keytab not found

Resolution

The SSSD service can fail to start for several reasons. Work through the following checks:

1. Check for typos in the SSSD configuration files: /etc/sssd/sssd.conf and, especially, any files under the /etc/sssd/conf.d/ directory. The sssctl config-check command can detect many such typos.

2. Correct the typos and restart the sssd service. If it still fails, try removing (or moving aside) any customized SSSD configuration under the /etc/sssd/conf.d/ directory.
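A quick way to spot a malformed section header, such as the empty [domain/] section that appears in the sample logs under Diagnostic Steps, is to list every section header together with its file and line number. The two functions below are a minimal sketch; list_sssd_sections and find_empty_domain_sections are hypothetical helper names, not part of SSSD, and sssctl config-check remains the tool to run for a full validation.

```shell
# Hypothetical helper: print every [section] header found in the given
# SSSD configuration files as file:line:header, so typos stand out.
list_sssd_sections() {
    # -H always prints the file name, -n prints the line number.
    grep -Hn '^[[:space:]]*\[' "$@"
}

# Hypothetical helper: print only headers of the form [domain/], i.e. a
# domain section with no domain name. Returns non-zero if none are found.
find_empty_domain_sections() {
    grep -Hn '^[[:space:]]*\[domain/][[:space:]]*$' "$@"
}

# Example (as root):
#   list_sssd_sections /etc/sssd/sssd.conf /etc/sssd/conf.d/*.conf
#   find_empty_domain_sections /etc/sssd/sssd.conf /etc/sssd/conf.d/*.conf
#   sssctl config-check
```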

3. Make sure the ownership and permissions of /etc/sssd/sssd.conf and the /var/log/sssd directory are correct:

  • 3.1 - Ownership should be root:root and mode 600 for /etc/sssd/sssd.conf.
  • 3.2 - Ownership should be root:root and mode 600 for all files under /var/log/sssd.
  • 3.3 - Ownership should be sssd:sssd and mode 750 for the /var/log/sssd directory.

For sssd-2.10 and above, the recommended mode is 640 with ownership root:sssd.

The output should look like this:

# ls -ld /var/log/sssd
drwxr-x---. 2 sssd sssd 4096 Jan  4 11:45 /var/log/sssd

# ls -lZ /etc/sssd/sssd.conf
-rw-------. 1 root root system_u:object_r:sssd_conf_t:s0 /etc/sssd/sssd.conf

# ll /var/log/sssd
total 204
-rw-------. 1 root root      0 Jun 21 12:00 krb5_child.log
-rw-------. 1 root root   9933 Jun 21 12:00 ldap_child.log
-rw-------. 1 root root      0 Jun 21 12:00 sssd_autofs.log
-rw-------. 1 root root 145183 Jun 21 12:00 sssd_example.com.log
-rw-------. 1 root root      0 Jun 21 12:00 sssd_ifp.log
-rw-------. 1 root root      0 Jun 21 12:00 sssd.log
-rw-------. 1 root root      0 Jun 21 12:00 sssd_nss.log
-rw-------. 1 root root      0 Jun 21 12:00 sssd_pac.log
-rw-------. 1 root root      0 Jun 21 12:00 sssd_pam.log
-rw-------. 1 root root      0 Jun 21 12:00 sssd_ssh.log
-rw-------. 1 root root      0 Jun 21 12:00 sssd_sudo.log
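Comparing many paths by eye is error-prone, so the values from step 3 can also be checked in a script. This is a minimal sketch that assumes GNU coreutils stat (standard on RHEL); check_mode_owner is a hypothetical helper name. For sssd-2.10 and above, substitute the recommended mode and ownership noted above.

```shell
# Hypothetical helper: compare a path's octal mode and owner:group with
# the expected values. Uses GNU coreutils "stat -c", as shipped on RHEL.
check_mode_owner() {
    local path="$1" want_mode="$2" want_owner="$3" actual
    actual=$(stat -c '%a %U:%G' "$path") || return 2
    if [ "$actual" = "$want_mode $want_owner" ]; then
        echo "OK       $path ($actual)"
        return 0
    fi
    echo "MISMATCH $path: have $actual, want $want_mode $want_owner"
    return 1
}

# Example (as root), using the values from step 3:
#   check_mode_owner /etc/sssd/sssd.conf 600 root:root
#   check_mode_owner /var/log/sssd 750 sssd:sssd
#   for f in /var/log/sssd/*; do check_mode_owner "$f" 600 root:root; done
# To fix a mismatch, for example:
#   chown root:root /etc/sssd/sssd.conf && chmod 600 /etc/sssd/sssd.conf
```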

4. Check that the sssd user is present on the system, for example:

# grep -i sss /etc/passwd
sssd:x:991:987:User for sssd:/:/sbin/nologin
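grep -i sss also matches any line that merely contains "sss". A stricter check compares the first field of /etc/passwd exactly. The function below is a sketch; local_user_exists is a hypothetical helper name, and the optional second argument exists only so the check can be pointed at another passwd-format file.

```shell
# Hypothetical helper: succeed only if NAME is an exact user name in the
# local passwd file (second argument, default /etc/passwd). Unlike
# "grep -i sss", it ignores entries that merely contain the string.
local_user_exists() {
    awk -F: -v u="$1" '$1 == u { found = 1 } END { exit (found ? 0 : 1) }' "${2:-/etc/passwd}"
}

# Example:
#   local_user_exists sssd && echo "sssd user present" || echo "sssd user MISSING"
```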

5. Check the permissions and SELinux context of the / directory:

# ll -dZ /
dr-xr-xr-x. 20 root root system_u:object_r:root_t:s0 267 Mar 27 03:44 /

6. If the SELinux context of /etc/sssd/sssd.conf is missing or incorrect, restore it:

# restorecon -Rv /etc/sssd/sssd.conf
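To relabel only when needed, a script can compare the type field of the current context with sssd_conf_t, the type shown in the sample ls -lZ output in step 3. This sketch assumes an SELinux-enabled host and GNU stat, whose %C format prints the SELinux context; selinux_type and fix_sssd_conf_context are hypothetical helper names.

```shell
# Hypothetical helper: print the type field (third colon-separated
# field) of an SELinux context such as system_u:object_r:sssd_conf_t:s0.
selinux_type() {
    printf '%s\n' "$1" | cut -d: -f3
}

# Hypothetical helper: run restorecon on sssd.conf only when its type is
# not sssd_conf_t. The path argument is optional (default shown below).
fix_sssd_conf_context() {
    local conf="${1:-/etc/sssd/sssd.conf}" current
    current=$(stat -c %C "$conf") || return 2
    if [ "$(selinux_type "$current")" != sssd_conf_t ]; then
        restorecon -v "$conf"
    else
        echo "context already correct: $current"
    fi
}
```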

7. Check whether any module is missing. Detailed information can be obtained by running SSSD in the foreground (interactive mode) with debug logging enabled, for example:

# sssd -d9 -i
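A level 9 debug run is very verbose. SSSD tags each message with its debug level, and in the samples in this article the actual failures are tagged (0x0010) or (0x0020). The sketch below keeps only those levels plus (0x0040), the operation-failure level, assuming the debug output has been saved to a file; the log files under /var/log/sssd use the same format. sssd_failures is a hypothetical helper name.

```shell
# Hypothetical helper: print only the log lines tagged with the three most
# severe SSSD debug levels: 0x0010 (fatal), 0x0020 (critical) and
# 0x0040 (operation failure). Reads the files given, or stdin if none.
sssd_failures() {
    grep -E '\(0x00[124]0\)' "$@"
}

# Example (as root):
#   sssd_failures /var/log/sssd/*.log
```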

8. Make sure the local cache files are not corrupted. If they are, recreate them:

# service sssd stop ; rm -f /var/lib/sss/{db,mc}/* ; service sssd start
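The one-liner above deletes the cache outright. A more cautious variant, sketched below on the assumption that you want to keep the old files for later analysis, moves them into a backup directory instead. backup_and_clear_sssd_cache and its default backup location under /root are hypothetical choices; the cache location /var/lib/sss/{db,mc} is taken from the command above.

```shell
# Hypothetical helper: move the SSSD cache files from the db/ and mc/
# subdirectories of CACHE_ROOT (default /var/lib/sss) into BACKUP_DIR
# (default: a timestamped directory under /root) instead of deleting them.
# Stop the sssd service before calling it. Prints the backup directory.
backup_and_clear_sssd_cache() {
    local cache_root="${1:-/var/lib/sss}"
    local backup="${2:-/root/sssd-cache-backup-$(date +%Y%m%d%H%M%S)}"
    local sub
    for sub in db mc; do
        mkdir -p "$backup/$sub" || return 1
        # Move regular files only; the db/ and mc/ directories are kept.
        find "$cache_root/$sub" -maxdepth 1 -type f -exec mv {} "$backup/$sub/" \; || return 1
    done
    echo "$backup"
}

# Example (as root):
#   systemctl stop sssd
#   backup_and_clear_sssd_cache
#   systemctl start sssd
```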

9. Check that the primary group of the root user is root. If root's primary GID is anything other than 0, change it back to 0. In the following example, root's primary group is bin (GID 1), and SSSD's startup checks fail:

# id -a
uid=0(root) gid=1(bin) groups=0(root),1(bin),2(daemon),3(sys),4(adm),6(disk),10(wheel)

# /usr/sbin/sssd -d 0x00F0 -i
(Sun Sep  1 22:50:17 2013) [sssd] [perform_checks] (0x0020): File must be owned by gid [0].
(Sun Sep  1 22:50:17 2013) [sssd] [sbus_new_server] (0x0020): check_file failed for [/var/lib/sss/pipes/private/sbus-monitor].
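The same check can be scripted by reading the fourth field (the primary GID) of root's entry in /etc/passwd. primary_gid is a hypothetical helper name; the commented usermod -g call (standard shadow-utils) is one way to set the primary group back to root, shown as an assumption about how the fix would be applied.

```shell
# Hypothetical helper: print the primary GID (fourth field) recorded for
# user NAME in a passwd-format file (default /etc/passwd).
primary_gid() {
    awk -F: -v u="$1" '$1 == u { print $4; exit }' "${2:-/etc/passwd}"
}

# Example (as root): restore GID 0 as root's primary group if it changed.
#   [ "$(primary_gid root)" = 0 ] || usermod -g root root
```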

Root Cause

  • The error "SSSD couldn't load the configuration database [22]: Invalid argument" indicates a configuration issue.
  • In this case, there was a typo in a custom configuration file under the /etc/sssd/conf.d/ directory.

Diagnostic Steps

  • Analyze the /var/log/messages and /var/log/sssd/sssd.log files.
  • Analyze the output of the sssd -i -d9 command.

  • Here are some sample logs for different situations:

Typos in SSSD configuration

/var/log/messages:

Apr  1 07:47:38 rhel7 sssd: SSSD couldn't load the configuration database [22]: Invalid argument.  <-----
Apr  1 07:47:38 rhel7 systemd: sssd.service: main process exited, code=exited, status=4/NOPERMISSION
Apr  1 07:47:38 rhel7 systemd: Failed to start System Security Services Daemon.
Apr  1 07:47:38 rhel7 systemd: Unit sssd.service entered failed state.
Apr  1 07:47:38 rhel7 systemd: sssd.service failed.

/var/log/sssd/sssd.log:

(Wed Apr  1 07:47:38:248190 2020) [sssd] [sss_ini_call_validators] (0x0020): [rule/allowed_sections]: Section [domain/] is not allowed. Check for typos.  <---
(Wed Apr  1 07:47:38:248556 2020) [sssd] [confdb_ldif_from_ini_file] (0x0010): Could not create LDIF for confdb
(Wed Apr  1 07:47:38:248591 2020) [sssd] [confdb_init_db] (0x0020): Cannot convert INI to LDIF [22]: [Invalid argument]
(Wed Apr  1 07:47:38:248639 2020) [sssd] [confdb_setup] (0x0010): ConfDB initialization has failed [22]: Invalid argument
(Wed Apr  1 07:47:38:248683 2020) [sssd] [load_configuration] (0x0010): Unable to setup ConfDB [22]: Invalid argument
(Wed Apr  1 07:47:38:248706 2020) [sssd] [main] (0x0020): SSSD couldn't load the configuration database.

# sssd -d9 -i

(Wed Apr  1 08:33:22:630530 2020) [sssd] [sss_ini_get_config] (0x0400): Config merge success: /etc/sssd/conf.d/custom.conf  <--- Problematic file
(Wed Apr  1 08:33:22:633174 2020) [sssd] [sss_ini_call_validators] (0x0020): [rule/allowed_sections]: Section [domain/] is not allowed. Check for typos.  <--- Exact configuration issue
...
(Wed Apr  1 08:33:22:634251 2020) [sssd] [confdb_ldif_from_ini_file] (0x0010): Could not create LDIF for confdb
(Wed Apr  1 08:33:22:634281 2020) [sssd] [confdb_init_db] (0x0020): Cannot convert INI to LDIF [22]: [Invalid argument]
(Wed Apr  1 08:33:22:634338 2020) [sssd] [confdb_setup] (0x0010): ConfDB initialization has failed [22]: Invalid argument
(Wed Apr  1 08:33:22:634383 2020) [sssd] [load_configuration] (0x0010): Unable to setup ConfDB [22]: Invalid argument
(Wed Apr  1 08:33:22:634404 2020) [sssd] [main] (0x0020): SSSD couldn't load the configuration database.   <----

Permissions of sssd.conf or the SSSD log files

[sssd] [load_configuration] (0): ConfDB initialization has failed [Operation not permitted]


sssd[15176]: Could not open file [/var/log/sssd/sssd.log]. Error: [13][Permission denied] <----
systemd[1]: sssd.service: main process exited, code=exited, status=7/NOTRUNNING <-----

SSSD failing to load modules

[sssd[be[EXAMPLE.COM]]] [dp_module_open_lib] (0x1000): Loading module [ad] with path [/usr/lib64/sssd/libsss_ad.so]
[sssd[be[EXAMPLE.COM]]] [dp_module_open_lib] (0x0010): Unable to load module [ad] with path [/usr/lib64/sssd/libsss_ad.so]: libwbclient.so.0: cannot open shared object file: No such file or directory 
[sssd[be[EXAMPLE.COM]]] [dp_load_module] (0x0020): Unable to create DP module.
[sssd[be[EXAMPLE.COM]]] [dp_target_init] (0x0010): Unable to load module ad
[sssd[be[EXAMPLE.COM]]] [dp_load_targets] (0x0020): Unable to load target [id] [80]: Accessing a corrupted shared library.
[sssd[be[EXAMPLE.COM]]] [dp_init] (0x0020): Unable to initialize DP targets [1432158209]: Internal Error
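When a module fails to load because a library is missing, as with libwbclient.so.0 above, ldd can list every unresolved dependency of the module, or of the sssd binary itself, in one pass. missing_libs is a hypothetical helper name; the paths in the example come from the sample logs in this article.

```shell
# Hypothetical helper: print the shared libraries that the dynamic linker
# cannot resolve for each binary or module given, as "file: library".
missing_libs() {
    local f
    for f in "$@"; do
        # ldd prints "libfoo.so.1 => not found" for unresolved libraries.
        ldd "$f" 2>/dev/null | awk -v f="$f" '/not found/ { print f ": " $1 }'
    done
}

# Example:
#   missing_libs /usr/sbin/sssd /usr/lib64/sssd/libsss_ad.so
```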

sssd user is not present on the system

# systemctl status sssd
   ● sssd.service - System Security Services Daemon
   Loaded: loaded (/usr/lib/systemd/system/sssd.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Fri 2020-12-04 04:30:02 EST; 9s ago
   Process: 20891 ExecStart=/usr/sbin/sssd -i ${DEBUG_LOGGER} (code=exited, status=1/FAILURE)
  Main PID: 20891 (code=exited, status=1/FAILURE)

 Dec 04 04:29:56 rhel7.example.com sssd[20891]: update failed: REFUSED
 Dec 04 04:29:58 rhel7.example.com nss[20909]: Starting up
 Dec 04 04:30:02 rhel7.example.com nss[20910]: Starting up
 Dec 04 04:30:02 rhel7.example.com sssd[20891]: Exiting the SSSD. Could not restart critical service [nss].       <<-----
 Dec 04 04:30:02 rhel7.example.com pam[20894]: Shutting down
 Dec 04 04:30:02 rhel7.example.com be[example.com][20892]: Shutting down
 Dec 04 04:30:02 rhel7.example.com systemd[1]: sssd.service: main process exited, code=exited, status=1/FAILURE
 Dec 04 04:30:02 rhel7.example.com systemd[1]: Failed to start System Security Services Daemon. <<-----
 Dec 04 04:30:02 rhel7.example.com systemd[1]: Unit sssd.service entered failed state.
 Dec 04 04:30:02 rhel7.example.com systemd[1]: sssd.service failed. <<-----

The following logs indicate that the local SSSD cache files are corrupted:

(Fri Mar 22 21:34:27 2019) [sssd[be[LDAP]]] [ldb] (0x0010): ltdb: tdb(/var/lib/sss/db/cache_LDAP.ldb): tdb_rec_read bad magic 0x0 at offset=5221096
(Fri Mar 22 21:43:03 2019) [sssd[be[LDAP]]] [ldb] (0x0010): ltdb: tdb(/var/lib/sss/db/cache_LDAP.ldb): tdb_rec_read bad magic 0x0 at offset=5221096
(Sat Mar 23 12:34:08 2019) [sssd[be[LDAP]]] [ldb] (0x0010): ltdb: tdb(/var/lib/sss/db/cache_LDAP.ldb): tdb_rec_read bad magic 0xd9fee666 at offset=6006376
(Mon Mar 25 11:59:13 2019) [sssd[be[LDAP]]] [ldb] (0x0010): ltdb: tdb(/var/lib/sss/db/cache_LDAP.ldb): tdb_rec_read bad magic 0xd9fee666 at offset=6006376

In one case, the primary group of the root user had been modified. Look for the following messages in the SSSD logs:

(Sun Sep  1 22:50:17 2013) [sssd] [perform_checks] (0x0020): File must be owned by gid [0].
(Sun Sep  1 22:50:17 2013) [sssd] [sbus_new_server] (0x0020): check_file failed for [/var/lib/sss/pipes/private/sbus-monitor].

The following logs indicate that either the permissions of /etc/sssd/sssd.conf are incorrect or the file contains a typo:

/var/log/messages:

Apr  1 07:47:38 rhel7-test sssd: SSSD couldn't load the configuration database [22]: Invalid argument.  <----
Apr  1 07:47:38 rhel7-test systemd: sssd.service: main process exited, code=exited, status=4/NOPERMISSION
Apr  1 07:47:38 rhel7-test systemd: Failed to start System Security Services Daemon.
Apr  1 07:47:38 rhel7-test systemd: Unit sssd.service entered failed state.
Apr  1 07:47:38 rhel7-test systemd: sssd.service failed.

/var/log/sssd/sssd.log:

(Wed Apr  1 07:47:38:248190 2020) [sssd] [sss_ini_call_validators] (0x0020): [rule/allowed_sections]: Section [domain/] is not allowed. Check for typos.  <----
(Wed Apr  1 07:47:38:248556 2020) [sssd] [confdb_ldif_from_ini_file] (0x0010): Could not create LDIF for confdb
(Wed Apr  1 07:47:38:248591 2020) [sssd] [confdb_init_db] (0x0020): Cannot convert INI to LDIF [22]: [Invalid argument]
(Wed Apr  1 07:47:38:248639 2020) [sssd] [confdb_setup] (0x0010): ConfDB initialization has failed [22]: Invalid argument
(Wed Apr  1 07:47:38:248683 2020) [sssd] [load_configuration] (0x0010): Unable to setup ConfDB [22]: Invalid argument
(Wed Apr  1 07:47:38:248706 2020) [sssd] [main] (0x0020): SSSD couldn't load the configuration database.

The following logs also indicate that the local SSSD cache files are corrupted:

[sssd] [sysdb_domain_init_internal] (0x0020): Could not open the sysdb cache [17]: File exists
[sssd] [sysdb_init_ext] (0x0020): Cannot connect to database for example.com: [17]: File exists

The following error messages indicate that the ownership or permissions of /etc/sssd/sssd.conf are incorrect. Ensure that /etc/sssd/sssd.conf is owned by root and has mode 600.

[sssd] [sss_ini_read_sssd_conf] (0x0020): Permission check on config file failed.
[sssd] [confdb_init_db] (0x0020): Cannot convert INI to LDIF [1432158317]: [File ownership and permissions check failed]
[sssd] [confdb_setup] (0x0010): ConfDB initialization has failed [1432158317]: File ownership and permissions check failed
[sssd] [load_configuration] (0x0010): Unable to setup ConfDB [1432158317]: File ownership and permissions check failed
[sssd] [main] (0x0010): SSSD couldn't load the configuration database.

The following journalctl output indicates that the system has run out of disk space. Check the free space on the file systems holding /, /etc, /var, and /tmp:

# journalctl -u sssd.service --no-pager

Jan 16 16:48:46 rhel7 ldap_child[2992126]: Failed to initialize credentials using keytab [MEMORY:/etc/krb5.keytab]: sss_unique_filename() failed: [28][No space left on device]. Unable to create GSSAPI-encrypted LDAP connection.
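The free-space check for /, /etc, /var, and /tmp can be scripted with df. This is a sketch; full_filesystems is a hypothetical helper name and the 95 percent threshold in the example is an arbitrary assumption. "No space left on device" can also mean the file system has run out of inodes, which df -i reports.

```shell
# Hypothetical helper: print the mount point and usage of every file
# system (holding one of the given paths) that is at least THRESHOLD
# percent full. Usage: full_filesystems THRESHOLD PATH...
full_filesystems() {
    local threshold="$1"
    shift
    # -P forces the POSIX one-line-per-file-system layout:
    # Filesystem 1024-blocks Used Available Capacity Mounted-on
    df -P "$@" | awk -v t="$threshold" 'NR > 1 {
        use = $5
        sub(/%/, "", use)
        # Report each mount point once, even if several paths share it.
        if (use + 0 >= t && !seen[$6]++) print $6, $5
    }'
}

# Example:
#   full_filesystems 95 / /etc /var /tmp
#   df -i / /etc /var /tmp
```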

The following messages indicate a typo in /etc/sssd/sssd.conf, here an invalid base DN ["cn=users,dc=example,dc=com"]. Validate the configuration file with the sssctl config-check command, which can detect such typos.

(2023-09-19  9:13:05): [be[example.com]] [ad_set_search_bases] (0x0100): Search base not set. SSSD will attempt to discover it later, when connecting to the LDAP server.
(2023-09-19  9:13:05): [be[example.com]] [sdap_create_search_base] (0x0020): Invalid base DN ["cn=users,dc=example,dc=com"]
(2023-09-19  9:13:05): [be[example.com]] [common_parse_search_base] (0x0040): Cannot create new sdap search base
(2023-09-19  9:13:05): [be[example.com]] [sssm_ad_init] (0x0020): Unable to init AD id options
(2023-09-19  9:13:05): [be[example.com]] [dp_module_run_constructor] (0x0010): Module [ad] constructor failed [22]: Invalid argument
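In this sample the quotation marks appear inside the square brackets that SSSD prints around the value, which suggests they were read as part of the DN. Assuming the cause is a quoted value in the configuration, the sketch below lists option values that begin with a double quote so they can be reviewed. find_quoted_values is a hypothetical helper name, and sssctl config-check should still be run afterwards.

```shell
# Hypothetical helper: print "key = value" lines whose value begins with a
# double quote, with file name and line number. Lines starting with # or ;
# (comments) and [section] headers are not reported.
find_quoted_values() {
    grep -Hn '^[[:space:]]*[^#;[:space:]][^=]*=[[:space:]]*"' "$@"
}

# Example (as root):
#   find_quoted_values /etc/sssd/sssd.conf /etc/sssd/conf.d/*.conf
```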

The following messages indicate that the keytab file is missing. This can be verified with the klist -kte command:

Nov  3 11:55:15  hostname sssd_be[108263]: Failed to read keytab [FILE:/etc/krb5.keytab]: No suitable principal found in keytab
Nov  3 11:55:17  hostname sssd_be[108273]: krb5_kt_start_seq_get failed: Key table file '/etc/krb5.keytab' not found
Nov  3 11:55:17  hostname sssd_be[108273]: krb5_kt_start_seq_get failed: Key table file '/etc/krb5.keytab' not found
Nov  3 11:55:17  hostname sssd_be[108273]: krb5_kt_start_seq_get failed: Key table file '/etc/krb5.keytab' not found
Nov  3 11:55:17  hostname sssd_be[108273]: krb5_kt_start_seq_get failed: Key table file '/etc/krb5.keytab' not found
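A small wrapper can combine the existence check with the klist -kte listing mentioned above. check_keytab is a hypothetical helper name; it only lists the keytab when the MIT Kerberos klist command is installed.

```shell
# Hypothetical helper: report whether a keytab file exists and is
# non-empty, then list its entries with klist -kte when klist is available.
check_keytab() {
    local kt="${1:-/etc/krb5.keytab}"
    if [ ! -s "$kt" ]; then
        echo "missing or empty: $kt"
        return 1
    fi
    if command -v klist >/dev/null 2>&1; then
        # -k: read a keytab, -t: show timestamps, -e: show encryption types
        klist -kte "$kt"
    fi
}

# Example (as root):
#   check_keytab /etc/krb5.keytab
```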

The following error indicates that a required shared library is missing:

/usr/sbin/sssd: error while loading shared libraries: libldb.so.1: cannot open shared object file: No ...directory
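The sample messages in this section can be turned into a rough first-pass triage. The sketch below searches log files for fixed strings copied from the samples above and prints the cause and resolution step each one points to in this article. classify_sssd_errors is a hypothetical helper name, and a match is only a hint to be confirmed against the full logs.

```shell
# Hypothetical helper: search the given log files for fixed strings taken
# from the sample logs in this article and print the likely cause for each
# string found. A match is a hint, not a diagnosis.
classify_sssd_errors() {
    local pattern cause
    if [ "$#" -eq 0 ]; then
        echo "usage: classify_sssd_errors LOGFILE..." >&2
        return 2
    fi
    # Each table line is: <fixed string>|<likely cause>
    while IFS='|' read -r pattern cause; do
        # stdin comes from /dev/null so grep never reads the table itself.
        if grep -qF -- "$pattern" "$@" </dev/null 2>/dev/null; then
            printf '%s: %s\n' "$cause" "$pattern"
        fi
    done <<'EOF'
is not allowed. Check for typos|configuration typo (steps 1-2)
Invalid base DN|configuration typo (steps 1-2)
Permission check on config file failed|sssd.conf ownership or permissions (step 3)
Permission denied|log file or directory permissions (step 3)
Could not restart critical service|possibly a missing sssd user (step 4)
cannot open shared object file|missing module or shared library (step 7)
tdb_rec_read bad magic|corrupted cache (step 8)
Could not open the sysdb cache|corrupted cache (step 8)
File must be owned by gid [0]|root primary group is not 0 (step 9)
No space left on device|file system full
Key table file|missing keytab
EOF
}

# Example (as root):
#   classify_sssd_errors /var/log/messages /var/log/sssd/*.log
```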

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
