SSSD With Large AD and Groups

Does anybody have experience with SSSD and a large Active Directory? We have over 150,000 user accounts and 25,000 groups, and a significant number of GPOs as well. I've received reports that RHEL servers that authenticate against our AD can experience login times of up to 10 minutes, which I assume has to do with how it is enumerating groups and group memberships. Other servers seem to not have any problems at all.

I've tried modifying the sssd.conf file to ignore nested groups, but even still users report slow logins. Are there any other ideas on how to improve login times? Maybe it isn't group related at all? Any other known sources of slow login?

Thanks!

Responses

Hi Kevin,

A ten-minute login time can't be normal behavior ... I recommend opening a support case and/or contacting Customer Service. :)

Regards,
Christian

Kevin, do all or some such clients experience this issue?

Is DNS straight on these systems (start first with client resolv.conf and test nslookup and dig commands)?

How long does it take to run id someVALIDuserid on an attached system?

Do you have NFS-mounted or autofs-mounted user home directories? If so, investigate that to make sure NFS is not causing a very bad day for you. Stale or non-functioning NFS mounts can be an issue. Try manually mounting one of the valid NFS exports to /mnt or similar, just to see if it works.

Log in to one of the affected systems as a local non-AD user, become root, then from the root account attempt to su - to a valid userid. Does that take a while?

If you think sssd might be an issue, examine this link

Did this work and then fail to work recently? Any recent patches?

Examine this solution https://access.redhat.com/solutions/1475233 to see if it is a fit for your issue.

Sample the configs of /etc/sssd/sssd.conf to ensure rational configs on some clients (in case someone pushed a config change perhaps).

Concurrently take Christian Labisch's advice above

Regards

RJ

Thanks for the response RJ, I'll be opening a support case shortly.

To answer your questions, DNS seems to be working without issue, and I've verified connectivity to all the DCs through our firewalls, so I'm pretty confident it isn't name related.

The id command can definitely take a significant amount of time to run; my account is a member of 333 groups and it takes about 10 minutes to finish.
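A rough way to capture that measurement on several hosts is sketched below. It is only an illustration: `time_group_lookup` is a hypothetical helper name, `root` stands in for any local account used as a baseline, and `some_ad_user` is a placeholder for an affected AD account.

```shell
#!/bin/sh
# Hypothetical helper (not part of SSSD): time a group lookup for one user
# and report how many groups came back.
time_group_lookup() {
    lookup_user="$1"
    start=$(date +%s)
    # id -G prints the numeric IDs of every group the user belongs to
    group_count=$(id -G "$lookup_user" | wc -w | tr -d ' ')
    end=$(date +%s)
    echo "user=$lookup_user groups=$group_count elapsed_seconds=$((end - start))"
}

# Baseline with a local account, which normally should not involve AD.
time_group_lookup root
# Then repeat with a slow AD account (placeholder name):
# time_group_lookup some_ad_user
```

Comparing the local baseline with the AD account on a fast host and a slow host should show whether the delay tracks the group lookup itself.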

I know of at least four servers with the slow login speeds without any NFS mounts, so I don't expect that to be related.

Since it seems to be an annoyingly intermittent issue, I'll try to su next time I have a server that's misbehaving and follow up.

This has been an issue since we started binding servers to AD, and we've been through many patch cycles (although we've only tried it on RHEL 7).

I've tried setting those sssd.conf values, and it seems to cause slowness in different places - i.e. during initial password validation and sudo su commands, versus the normal slow behavior before entering a password during ssh. That confused me in and of itself.

Finally, the config file is essentially what sssd writes out by default, only changing use_fully_qualified_names to false and specifying the home directory format (as well as limiting access to specific groups).

I'm currently testing setting ignore_group_members to true and waiting to see if I experience or hear of any improvements.

Thanks again for your response!

I'll see if I can provide anything else Kevin.

I have two different customers who use AD in two different ways. One goes straight to AD using RHEL7's ability to do so, the other has a highly configured non-simplistic AD and IDM integration.

Are your "default_domain_suffix" and "domains" directives correct in /etc/sssd/sssd.conf? However, I seriously doubt those 2 directives would cause the slowness you speak of.

I get the idea you are not using IdM, is that right?

Is there anything relevant in the logs generated on clients that are experiencing the issue? (go to /var/log of course as root, do an ls -ltr in the /var/log directory, and examine the latest files)

Regards

RJ

Funny you should mention IdM...We originally tried that as a replacement to our OpenLDAP system, but it couldn't handle the number of users we tried to load into it. We then tried to set it up with a trust with our AD, but didn't really like the user management across the two systems, and ultimately decided we would keep OpenLDAP and start to join servers to AD (rather than the kerberos realm OpenLDAP uses).

Domains and suffixes are good in sssd. It's been a while since I looked at the logs that sssd provided (long enough that I forget why I stopped using them for troubleshooting), but it's probably a good idea to revisit that and get debug logging up as well. Hopefully I can get something useful out of there and narrow it down better.

Thanks!

OK, in one environment where we go straight to AD, when the AD server logs went 100% full it caused horrid systemic latency (for those systems that talked to whichever domain controller had full logs).

I'd recommend having one of your Windows admins ensure the system logs (all Windows Active Directory server logs) are not 100% full. In one environment where our Linux systems go straight to AD, this caused bad issues (at least for us).

Regards

RJ

Hmm that's interesting, I'll check out our DCs to see if there are any problems (I'm pretty sure they purge their logs pretty quickly though, as they are all sent to a log aggregator). For now I think I just need to find a server that's experiencing slow logins and do some log analysis and troubleshooting. I appreciate all the help!

Well aside from it being RHEL 6 (not sure if it matters in this case) that was what seemed to cause slowness in different places - during initial password validation and sudo su commands, versus the normal slow behavior before entering a password during ssh. I wasn't sure what to make of that, but it unfortunately didn't solve the issue.

Also see the bit above about domain controller logs, and that RH solution

Regards RJ

Kevin, are the domain controller logs full at all? That caused issues for us in our environment.

DCs are fine. I also played around with the hosts file on one of the servers to see if it was something with a specific DC, and everything was quick, so it must be some other transient condition.

Kevin, I'm thinking Red Hat support might be the best avenue right now, perhaps besides examining these links:

(try "enumerate = false" in sssd.conf on one system too)

Note, this might not be relevant since it is from 2015 https://www.reddit.com/r/linuxadmin/comments/3asb35/problems_with_sssd_ad/

## comments from the link above, maybe try them on a client
"'ignore_group_members' certainly helped with avoiding recursion, but it breaks nested group checking."
"ldap_referrals = False This! .. from 2min delay to 1sec"
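The options mentioned so far could be collected into one fragment for testing on a single client. This is a sketch under assumptions: the section name `[domain/example.com]` is a placeholder for the real domain section, `print_candidate_options` is a hypothetical helper, and nothing here says which of the three settings (if any) addresses the slowness.

```shell
#!/bin/sh
# Hypothetical helper: print candidate sssd.conf settings for review.
# The domain name is a placeholder; merge by hand into the real section.
print_candidate_options() {
    cat <<'EOF'
[domain/example.com]
# Enumeration is off by default in SSSD; listed only to rule it out.
enumerate = false
# Do not fetch group member lists during group lookups.
ignore_group_members = true
# Do not chase LDAP referrals.
ldap_referrals = false
EOF
}

print_candidate_options
```

Trying one option at a time on one host makes it easier to tell which change actually made a difference.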

From this link - increase the debug level of logging for sssd: To change the log level, set the debug_level parameter for each section in the sssd.conf file for which to produce extra logs. For example:

NOTE: run systemctl daemon-reload;systemctl restart sssd after changing this. You can also put this under the [sssd] header. I'd only do this temporarily for debug purposes.

## client system, from link above
## note, this might appear as [domain/FQDN]
[domain/LDAP]
cache_credentials = true
debug_level = 9

Then examine logs on a client system: ssh to one of the systems, do a tail -f of related sssd logs, messages and then have someone attempt to log in.
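As a sketch of that step: SSSD normally writes its logs under /var/log/sssd, and RHEL records authentication events in /var/log/secure. The helper name `newest_logs` is hypothetical, and the paths are the usual defaults rather than something confirmed for these hosts.

```shell
#!/bin/sh
# Hypothetical helper: list the three most recently modified *.log files
# in a directory, newest first, so they can be handed to tail -f.
newest_logs() {
    log_dir="$1"
    # Nothing to list if the directory does not exist
    [ -d "$log_dir" ] || return 0
    # ls -t sorts by modification time, newest first
    ls -t "$log_dir"/*.log 2>/dev/null | head -n 3
}

# Typical use as root, while someone else attempts a login:
# tail -f $(newest_logs /var/log/sssd) /var/log/secure /var/log/messages
```

With debug_level raised, the domain log tends to grow quickly, so it helps to note the wall-clock time of the login attempt and search around it afterwards.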

Escalate to Red Hat if necessary. The above might not help; it is a bit of a stab in the dark.

Regards

RJ

I agree, I think RH is the next step. Thanks for all of your suggestions, I really appreciate the ideas. I'll update the post when I find a solution.

Hi Kevin and RJ,

Looks as if the recommendation I've posted in my first response was not that wrong - right ? :D

Regards,
Christian

Circling back around on this, adding the below line to sssd.conf has resolved the problem for us. You do lose the ability to nest group members, but if you're OK with that it solves the performance issues.

 ignore_group_members = True
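One possible way to roll that line out is sketched below, working on a copy rather than the live file. `add_ignore_group_members` is a hypothetical helper and the /tmp path is a placeholder. As far as the sssd-ldap man page describes it, with this option set a group lookup such as getent group returns the group as if it had no members, while a user's own group list from id is still resolved.

```shell
#!/bin/sh
# Hypothetical helper: write a copy of an sssd.conf with
# "ignore_group_members = True" added under the first [domain/...] section.
add_ignore_group_members() {
    src="$1"
    dst="$2"
    # Leave the file unchanged if the option is already present
    if grep -q '^[[:space:]]*ignore_group_members' "$src"; then
        cp "$src" "$dst"
        return 0
    fi
    awk '
        { print }
        /^\[domain\// && !inserted {
            print "ignore_group_members = True"
            inserted = 1
        }
    ' "$src" > "$dst"
}

# Typical use as root (paths are placeholders):
# add_ignore_group_members /etc/sssd/sssd.conf /tmp/sssd.conf.new
# diff -u /etc/sssd/sssd.conf /tmp/sssd.conf.new
# After installing the new file (owned by root, mode 0600):
# sss_cache -E && systemctl restart sssd
```

Invalidating the cache with sss_cache -E before restarting avoids judging the change against entries cached under the old setting.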

Did the same here!!! Good one

I'm glad you got this resolved, thanks for posting this Kevin!

Regards

RJ

Hi Kevin,

I'm glad you've got it resolved, too ... thanks for sharing your solution. :)

Regards,
Christian

On the system that has lengthy delays, enter this command:

systemctl | grep fail

Let us know

RJ

Hi Sakthidhasan,

This is the fourth time that you post something that doesn't contribute anything useful to this thread.
Posting "." and "Removed" doesn't make much sense - right ? Such postings can be considered spam.
Can you please avoid this ? Thank you ! :)

Regards,
Christian