The first sudo attempt takes a long time when using winbind.

Solution Unverified - Updated -

Environment

  • Red Hat Enterprise Linux 6
  • sudo-1.8.6p3-12.el6

Issue

  • When we want to do "sudo su -" it takes a long time (almost 5 minutes)
  • After that and once you are root if you logout and do "sudo su -" again it does not take that much.
  • This is happening with these servers only, we have a few seconds of delay in some of our servers but these ones are taking a lot of time.
  • Issue is more visible if Red Hat Enterprise Linux systems are part of a Active Directory forest with many trusted domains.

Resolution

Issue is currently being tracked in the private bug, contact Red Hat technical support for further information. Current work around is to downgrade the sudo package to the sudo-1.7.4p5-13.el6_3.x86_64.rpm

Root Cause

The issue is because of a getgrgid function that was added to sudo in version 1.8 which enumerates all of the groups that a person is a member of, even if the group isn't listed in the sudoers files. It is common for Active Directory to have a forest of trusted domains and for users to be members of groups in several of the domains.

    /*
     * Resolve and store group names by ID.
     */
    ngroups = 0;
    for (i = 0; i < ngids; i++) {
    if ((grp = sudo_getgrgid(gids[i])) != NULL) {
        len = strlen(grp->gr_name) + 1;
        if (cp - (char *)grlitem + len > total) {
        total += len + GROUPNAME_LEN;
        efree(grlitem);
        sudo_gr_delref(grp);
        goto again;
        }
        memcpy(cp, grp->gr_name, len);
        grlist->groups[ngroups++] = cp;
        cp += len;
        sudo_gr_delref(grp);
    }
    }
    grlist->ngroups = ngroups;

Diagnostic Steps

  • The /etc/nsswitch.conf has the following but there is no sudoers line.
passwd:     files winbind
shadow:     files winbind
group:      files winbind
  • The following command shows that there were several trusted domains.
[root@server ~]# wbinfo --online-status
BUILTIN : online
DOMAIN : online
TRUST1 : online
TRUST2 : online
...
  • Listing the users/groups in two domains show that there are significantly more groups in the first domain. Group lookups are more intensive than user lookups and this may be the cause of the lengthy "sudo su -" command.
 wbinfo -u --domain DOMAIN | wc -l
6365

wbinfo -g --domain DOMAIN | wc -l
53935

 wbinfo -u --domain TRUST1 | wc -l
5958

 wbinfo -g --domain TRUST1 | wc -l
693
  • strace -fvtTo /tmp/strace.out -s 1024 { command}
    -T -- print time spent in each syscall
$ strace -fvtTo /tmp/strace-su.out -s 1024 sudo su -
  • The following command will show the syscalls that took the most time. These times can be searched for in the strace-su.out file to find the syscall.
$ cat strace-su.out | sed 's/<//g' | sed 's/>//g' | awk '{print $NF}' | sort -n
...
0.211371
7.146669
8.085020
8.175176
  • This command will show how long the "sudo su -" command took to run.
$ awk '{s+=$1} END {print s}' strace-su.out
28.5038
[2013/11/21 10:06:59.814816,  1, pid=5645] ../librpc/ndr/ndr.c:284(ndr_print_function_debug)
       wbint_LookupSid: struct wbint_LookupSid
          in: struct wbint_LookupSid
              sid                      : *
                  sid                      : S-1-5-21-1614895754-920026266-1417001333-66024
[2013/11/21 10:06:59.819613,  1, pid=5645] ../librpc/ndr/ndr.c:284(ndr_print_function_debug)
       wbint_LookupSid: struct wbint_LookupSid
          out: struct wbint_LookupSid
              type                     : *
                  type                     : SID_NAME_DOM_GRP (2)
              domain                   : *
                  domain                   : *
                      domain                   : 'NA'
              name                     : *
                  name                     : *
                      name                     : 'S-U-SCCM-Citrix'
              result                   : NT_STATUS_OK
[2013/11/21 10:06:59.819777, 10, pid=5645] winbindd/wb_sid2gid.c:57(wb_sid2gid_send)
  idmap_cache_find_sid2gid found 428933608
[2013/11/21 10:06:59.819817,  1, pid=5645] ../librpc/ndr/ndr.c:284(ndr_print_function_debug)
       wbint_LookupGroupMembers: struct wbint_LookupGroupMembers
          in: struct wbint_LookupGroupMembers
              sid                      : *
                  sid                      : S-1-5-21-1614895754-920026266-1417001333-66024
              type                     : SID_NAME_DOM_GRP (2)
[2013/11/21 10:06:59.819904, 10, pid=5645] winbindd/winbindd_dual.c:1365(fork_domain_child)
  fork_domain_child called for domain 'NA'
[2013/11/21 10:06:59.820284, 10, pid=5785] winbindd/winbindd_dual.c:1421(fork_domain_child)
  Child process 5785
[2013/11/21 10:07:13.157621,  1, pid=5645] ../librpc/ndr/ndr.c:284(ndr_print_function_debug)
       wbint_LookupGroupMembers: struct wbint_LookupGroupMembers
          out: struct wbint_LookupGroupMembers

Looking at the last two debug lines here, 10:06:59 -> 10:07:13 is the time it took to query the NA domain DC in order to gather the members of the NA\S-U-SCCM-Citrix group. Somewhere in the stack they may be requiring the membership of this group, in which it just has a slow connection to the NA DC.
  • Collect the environment information such as how large Active directory environment is ? How many trusted domains are there?
  • How many groups are present in the Active directory?

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.

Comments