SSSD With Large AD and Groups

Does anybody have experience with SSSD and a large Active Directory? We have over 150,000 user accounts and 25,000 groups, and a significant number of GPOs as well. I've received reports that RHEL servers that authenticate against our AD can experience login times of up to 10 minutes, which I assume has to do with how it is enumerating groups and group memberships. Other servers seem to not have any problems at all.

I've tried modifying the sssd.conf file to ignore nested groups, but even still users report slow logins. Are there any other ideas on how to improve login times? Maybe it isn't group related at all? Any other known sources of slow login?

Thanks!

Responses

Hi Kevin,

A ten-minute login time can't be normal behavior ... I recommend opening a Support Case and/or contacting Customer Service. :)

Regards,
Christian

Kevin, do all or some such clients experience this issue?

Is DNS straight on these systems (start first with client resolv.conf and test nslookup and dig commands)?

How long does it take to run id someVALIDuserid on an attached system?
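A minimal sketch of that timing check, in case it helps to compare affected and unaffected clients. The function name time_id_lookup is made up for illustration; it only wraps the standard id and date commands and reports whole seconds.

```shell
# Print how many whole seconds an `id` lookup takes.
# Pass an account name as the argument; with no argument, `id` reports
# on the current user.
time_id_lookup() {
    local start end
    start=$(date +%s)
    id "$@" >/dev/null 2>&1   # output is discarded; only the duration matters
    end=$(date +%s)
    echo $(( end - start ))
}
```

For example, time_id_lookup someVALIDuserid. SSSD caches lookups, so a second run for the same account may return much faster than the first.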

Do you have NFS-attached or autofs-attached user home directories? If so, interrogate that to ensure NFS is not causing a very bad day for you. Stale NFS mounts or non-functioning NFS can be an issue. Also attempt to manually mount one of the valid NFS mounts to /mnt or similar, just to see if it works.

Log in to one of the affected systems as a local non-AD user, become root, then from the root account attempt to su - to a valid userid. Does that take a while?

If you think sssd might be an issue, examine this link

Did this work and then fail to work recently? Any recent patches?

Examine this solution https://access.redhat.com/solutions/1475233 to see if it is a fit for your issue.

Sample the configs of /etc/sssd/sssd.conf to ensure rational configs on some clients (in case someone pushed a config change perhaps).
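One way to sample configs quickly is to print only the options that come up in this thread. This is a rough sketch: show_sssd_tuning is a hypothetical helper name, the option list is just the ones mentioned here, and reading /etc/sssd/sssd.conf normally requires root.

```shell
# Print the uncommented lines that set options discussed in this thread.
# The first argument is the config file; it defaults to the usual location.
show_sssd_tuning() {
    local conf="${1:-/etc/sssd/sssd.conf}"
    grep -E '^[[:space:]]*(ignore_group_members|enumerate|ldap_referrals|debug_level)[[:space:]]*=' "$conf"
}
```

Running show_sssd_tuning on a few clients and comparing the output should show whether someone pushed a change to only some of them. No output means none of those options is set explicitly.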

Concurrently take Christian Labisch's advice above

Regards

RJ

Thanks for the response RJ, I'll be opening a support case shortly.

To answer your questions, DNS seems to be working without issue, and I've verified connectivity to all the DCs through our firewalls, so I'm pretty confident it isn't name related.

The id command can definitely take a significant amount of time to run; my account is a member of 333 groups and it takes about 10 minutes to finish running.
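For anyone wanting to compare group counts between fast and slow accounts, a small sketch follows. count_groups is a made-up name; it just counts the numeric group IDs that id -G prints.

```shell
# Count the groups an account belongs to.
# With no argument, the count is for the current user.
count_groups() {
    id -G "$@" | wc -w | tr -d ' '
}
```

For example, count_groups someVALIDuserid. Note that this triggers the same group resolution as a plain id call, so on an affected system it may be just as slow.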

I know of at least four servers with the slow login speeds without any NFS mounts, so I don't expect that to be related.

Since it seems to be an annoyingly intermittent issue, I'll try to su next time I have a server that's misbehaving and follow up.

This has been an issue since we started binding servers to AD, and we've been through many patch cycles (although we've only tried it on RHEL 7).

I've tried setting those sssd.conf values, and it seems to cause slowness in different places - i.e. during initial password validation and sudo su commands, versus the normal slow behavior before entering a password during ssh. That confused me in and of itself.

Finally, the config file is essentially what sssd writes out by default, only changing use_fully_qualified_names to false and specifying the home directory format (as well as limiting access to specific groups).

I'm currently testing setting ignore_group_members to true and waiting to see if I experience or hear of any improvements.

Thanks again for your response!

I'll see if I can provide anything else Kevin.

I have two different customers who use AD in two different ways. One goes straight to AD using RHEL7's ability to do so, the other has a highly configured non-simplistic AD and IDM integration.

Are your "default_domain_suffix" and "domains" directives correct in /etc/sssd/sssd.conf? However, I seriously doubt those 2 directives would cause the slowness you speak of.

I get the idea you are not using IdM. Is that right?

Is there anything relevant in the logs generated on clients that are experiencing the issue? (go to /var/log of course as root, do an ls -ltr in the /var/log directory, and examine the latest files)

Regards

RJ

Funny you should mention IdM...We originally tried that as a replacement to our OpenLDAP system, but it couldn't handle the number of users we tried to load into it. We then tried to set it up with a trust with our AD, but didn't really like the user management across the two systems, and ultimately decided we would keep OpenLDAP and start to join servers to AD (rather than the kerberos realm OpenLDAP uses).

Domains and suffixes are good in sssd. It's been a while since I looked at the logs that sssd provides (long enough that I forget why I stopped using them for troubleshooting), but it's probably a good idea to revisit that and get debug logging up as well. Hopefully I can get something useful out of there and narrow it down better.

Thanks!

OK, in one environment where our Linux systems go straight to AD, the AD server logs filling to 100% caused systemic, horrid latency (for those systems that talked to whichever domain controller had full logs).

I'd recommend having one of your Windows admins ensure the system logs (all Windows Active Directory server logs) are not 100% full.

Regards

RJ

Hmm that's interesting, I'll check out our DCs to see if there are any problems (I'm pretty sure they purge their logs pretty quickly though, as they are all sent to a log aggregator). For now I think I just need to find a server that's experiencing slow logins and do some log analysis and troubleshooting. I appreciate all the help!

Well aside from it being RHEL 6 (not sure if it matters in this case) that was what seemed to cause slowness in different places - during initial password validation and sudo su commands, versus the normal slow behavior before entering a password during ssh. I wasn't sure what to make of that, but it unfortunately didn't solve the issue.

Also see the bit above about domain controller logs, and that RH solution

Regards RJ

Kevin, are the domain controller logs full at all? That caused issues for us in our environment.

DCs are fine. I also played around with the hosts file on one of the servers to see if it was something with a specific DC, and everything was quick, so it must be some other transient condition.

Kevin, I'm thinking Red Hat support might be the best avenue right now, perhaps besides examining these links:

(try "enumerate = false" in the domain section of sssd.conf on one system too)

Note, this might not be relevant since it is from 2015 https://www.reddit.com/r/linuxadmin/comments/3asb35/problems_with_sssd_ad/

## comments from the link above; maybe try these on a client.
'ignore_group_members' certainly helped with avoiding recursion, but it breaks nested group checking.
ldap_referrals = False: "This! .. from 2min delay to 1sec"

From this link - increase the debug level of logging for sssd: To change the log level, set the debug_level parameter for each section in the sssd.conf file for which to produce extra logs. For example:

NOTE: run systemctl daemon-reload;systemctl restart sssd after changing this. You can also put this under the [sssd] header. I'd only do this temporarily for debug purposes.

## client system, from link above
## note, this might appear as [domain/FQDN]
[domain/LDAP]
cache_credentials = true
debug_level = 9

Then examine logs on a client system: ssh to one of the systems, do a tail -f of related sssd logs, messages and then have someone attempt to log in.
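A sketch of how to find which sssd logs are actually being written before tailing them. recent_sssd_logs is a hypothetical helper; /var/log/sssd is the default SSSD log directory, but check your own systems in case it differs.

```shell
# List the most recently modified *.log files in a directory, newest first.
# Arguments: directory (default /var/log/sssd) and how many files to show
# (default 5).
recent_sssd_logs() {
    local dir="${1:-/var/log/sssd}" count="${2:-5}"
    ls -1t "$dir"/*.log 2>/dev/null | head -n "$count"
}
```

As root, something like tail -f $(recent_sssd_logs) /var/log/messages then follows those files while someone attempts to log in. With debug_level = 9 the logs grow quickly, so this is best done only for the duration of the test.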

Elevate if necessary with Red Hat. The above might not help, is a bit of a stab.

Regards

RJ

I agree, I think RH is the next step. Thanks for all of your suggestions, I really appreciate the ideas. I'll update the post when I find a solution.

Hi Kevin and RJ,

Looks as if the recommendation I posted in my first response was not that wrong, right? :D

Regards,
Christian

Circling back around on this, adding the below line to sssd.conf has resolved the problem for us. You do lose the ability to nest group members, but if you're OK with that it solves the performance issues.

 ignore_group_members = True

I'm glad you got this resolved, thanks for posting this Kevin!

Regards

RJ

Hi Kevin,

I'm glad you've got it resolved, too ... thanks for sharing your solution. :)

Regards,
Christian

Hi Kevin,

I also have the mentioned entries in my sssd.conf file, but it still takes a long time when I use sudo.

[krwg074@cnawlfcapts01 ~]$ time sudo su -
Last login: Thu Nov 14 20:42:40 EST 2019 from sesklsshcpc01.astrazeneca.net on pts/0
[root@cnawlfcapts01 ~]# logout

real 1m41.833s
user 0m0.073s
sys 0m0.062s

[krwg074@cnawlfcapts01 ~]$ time sudo -l
Matching Defaults entries for krwg074 on cnawlfcapts01:
    !visiblepw, always_set_home, match_group_by_gid, always_query_group_plugin, env_reset,
    env_keep="COLORS DISPLAY HOSTNAME HISTSIZE KDEDIR LS_COLORS",
    env_keep+="MAIL PS1 PS2 QTDIR USERNAME LANG LC_ADDRESS LC_CTYPE",
    env_keep+="LC_COLLATE LC_IDENTIFICATION LC_MEASUREMENT LC_MESSAGES",
    env_keep+="LC_MONETARY LC_NAME LC_NUMERIC LC_PAPER LC_TELEPHONE",
    env_keep+="LC_TIME LC_ALL LANGUAGE LINGUAS _XKB_CHARSET XAUTHORITY",
    secure_path=/sbin:/bin:/usr/sbin:/usr/bin

User krwg074 may run the following commands on cnawlfcapts01:
    (ALL) NOPASSWD: ALL

real 3m57.908s
user 0m0.009s
sys 0m0.006s
[krwg074@cnawlfcapts01 ~]$

On the system that has lengthy delays, enter this command:

systemctl | grep  fail

Let us know

RJ

[root@cnawlfcapts01 ~]# systemctl | grep fail
● kdump.service loaded failed failed Crash recovery kernel arming
[root@cnawlfcapts01 ~]#

[root@cnawlfcappd01 ~]# systemctl | grep fail
● kdump.service loaded failed failed Crash recovery kernel arming
[root@cnawlfcappd01 ~]#

This is my sssd.conf file entry.

[root@cnawlfcappd01 ~]# cat /etc/sssd/sssd.conf
[sssd]
domains = astrazeneca.net
config_file_version = 2
services = nss, pam, sudo, ssh, ifp
domain_resolution_order = astrazeneca.net, americas.astrazeneca.net, asiapac.astrazeneca.net, emea.astrazeneca.net, rd.astrazeneca.net
full_name_format = %1$s

[domain/astrazeneca.net]
ad_domain = astrazeneca.net
krb5_realm = ASTRAZENECA.NET
realmd_tags = manages-system joined-with-samba
cache_credentials = True
id_provider = ad
ad_enable_dns_sites = True
krb5_store_password_if_offline = True
default_shell = /bin/bash
override_homedir = /home/%u
ldap_id_mapping = True
use_fully_qualified_names = False
fallback_homedir = /home/%u@%d
access_provider = simple
auth_provider = ad
chpass_provider = ad
ldap_schema = ad
dyndns_update = true
ignore_group_members = True
simple_allow_groups = xaz-global-ecsmorpheus-sysadmin@astrazeneca.net, xaz-pamaz-ecs-hosting-unix@astrazeneca.net, xaz-pamaz-aziamunix@astrazeneca.net, XAZ-SYSTEM-LINUX-SERVICEACCTS@astrazeneca.net, XAZ-PAMAZ-AZ-CSIS-IAM-Hybrid-Operations@astrazeneca.net, XEM-AZ-ECSBackup-Admin@astrazeneca.net, "XAZ-Global ECS Backup Administrators "@astrazeneca.net

[pam]

[nss]
filter_users = root, admin, svtanium, unxcfmgr, azastmgt, azmorphdata, cloud-user, cloud-init, ecs_user, httpd, apache, tomcat, jenkins
filter_groups = root, unixinfra, unxinfra, unixadm, usradmin

[root@cnawlfcappd01 ~]#

Kindly check and do the needful.