Winbindd is continuously taking 100% CPU

Solution Verified - Updated -

Environment

  • Red Hat Enterprise Linux 5.5 and later.

Issue

  • Winbindd continuously consumes 100% CPU. After the winbind service is restarted, it works fine for some time, but then CPU usage climbs very high again.

Resolution

This issue has been resolved via an errata update. The fix is available in samba3x package versions 3.6.6-0.136.el5 and later.
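
To check whether a host already carries the fix, the installed version can be compared against the one named above. The following is a minimal sketch, not an official tool. The helper names ver_ge and check_fixed are made up for illustration, and the default package name samba3x is taken from this article (related subpackages may need the same check). The field-by-field comparison only approximates RPM's own version ordering.

```shell
#!/bin/bash
# First fixed version-release, as stated in the Resolution section.
FIXED="3.6.6-0.136.el5"

# ver_ge A B: succeed when version-release A is equal to or newer than B.
# Simplified comparison (assumption): split on "." and "-", compare numeric
# fields as numbers and everything else as text. RPM's real rules are richer.
ver_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, /[.-]/); nb = split(b, y, /[.-]/)
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            if (x[i] == y[i]) { continue }
            if (x[i] ~ /^[0-9]+$/ && y[i] ~ /^[0-9]+$/) {
                exit !((x[i] + 0) > (y[i] + 0))
            }
            exit !(x[i] > y[i])
        }
        exit 0
    }'
}

# check_fixed [PACKAGE]: report whether the installed package has the fix.
check_fixed() {
    local pkg=${1:-samba3x} installed
    if ! installed=$(rpm -q --qf '%{VERSION}-%{RELEASE}\n' "$pkg" 2>/dev/null); then
        echo "cannot query $pkg (not installed?)"
        return 2
    fi
    # Multilib hosts may list more than one build; look at the first.
    installed=$(printf '%s\n' "$installed" | head -n 1)
    if ver_ge "$installed" "$FIXED"; then
        echo "$pkg $installed: contains the fix"
    else
        echo "$pkg $installed: older than $FIXED, update required"
        return 1
    fi
}

# Example:
#   check_fixed samba3x
```

The comparison is done in awk rather than with "sort -V" because older coreutils releases may not provide that option.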

Root Cause

winbindd uses libtevent for its event loop, and libtevent previously called poll(). libtevent has been updated to use epoll instead, which fixes this issue.

Diagnostic Steps

  • When did winbind start running at 100% CPU?
    (After an upgrade to a new version?)
  • What are the steps to reproduce the behaviour?
  • How much time does it take to reproduce the behaviour?
  • When the daemon is running at 100% CPU, can it still do its normal tasks?
  • Does winbind recover from the problem and run normally after some time?
  • How many connections does the daemon have to handle?
    (How many users are trying to log in simultaneously?)
  • Is there automation involved hammering on the daemon?
    (Is there a script which logs in and performs a task?)

Debugging information required:

  1. A winbind debug log at log level 10. To set this up,
    edit /etc/samba/smb.conf and set the following parameters in the
    [global] section of the configuration:
     debug level = 10
     debug pid = true
     max log size = 0

  2. Attach strace to the daemon and collect strace logs for about 1 minute.
    Use the command: strace -ffxvto winbind.strace -p <winbindd.pid>
    Attach all resulting winbind.strace.* logs to the case as a compressed tarball.

  3. Attach gdb to the running process and collect a backtrace. A good backtrace requires the relevant debuginfo packages for winbind. The following knowledge base article explains how to install debuginfo packages:

    https://access.redhat.com/knowledge/solutions/9907

    After installing debuginfo, start gdb with the command:

    gdb -p <winbindd.pid> `which winbindd`

    At the gdb prompt, run the following command (short for "thread apply all bt"):

    thr a a bt

    Save the output to a log file and attach it to the case.

  4. The lsof output captured while the winbind process is running at 100% CPU. Run the following command:
    lsof -p <winbindd.pid>
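
The strace, gdb and lsof steps above can be scripted so that all three captures come from the same busy period. This is a rough sketch rather than a supported tool. The function names run_for and collect_winbindd_data, the output directory, and the log file names other than winbind.strace are hypothetical. It is assumed to run as root with the PID of the winbindd process that is consuming CPU.

```shell
#!/bin/bash
# run_for SECONDS COMMAND...: run COMMAND in the background and stop it with
# SIGTERM after SECONDS. Avoids timeout(1), which older systems may lack.
run_for() {
    local secs=$1 pid
    shift
    "$@" &
    pid=$!
    sleep "$secs"
    kill "$pid" 2>/dev/null
    wait "$pid" 2>/dev/null
    return 0
}

# collect_winbindd_data PID [OUTDIR]: gather lsof, strace and gdb output.
collect_winbindd_data() {
    local pid=$1 out=${2:-winbind-diag}
    mkdir -p "$out" || return 1

    # Open files first, while the process is still spinning.
    lsof -p "$pid" > "$out/winbind.lsof.log" 2>&1

    # About one minute of system calls; -ff writes one file per process.
    run_for 60 strace -ffxvto "$out/winbind.strace" -p "$pid"

    # Backtrace of every thread; runs only after strace has detached.
    gdb -batch -ex "thread apply all bt" -p "$pid" "$(which winbindd)" \
        > "$out/winbind.gdb.log" 2>&1

    # Bundle everything for attaching to the case.
    tar czf "$out.tar.gz" "$out"
}

# Example (find the busy PID with: ps -C winbindd -o pid,pcpu,args):
#   collect_winbindd_data <winbindd.pid>
```

strace and gdb both attach through ptrace, so the sketch runs them one after the other instead of in parallel.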

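Strace results show winbindd spinning in poll(). Every call returns within microseconds because fd 33 reports POLLERR|POLLHUP, yet the same descriptor is polled again immediately: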

26938 05:01:36 poll([{fd=31, events=POLLIN}, {fd=32, events=POLLIN}, {fd=33, events=POLLIN}], 3, -89477551) = 1 ([{fd=33, revents=POLLIN|POLLERR|POLLHUP}]) <0.000005>
26938 05:01:36 poll([{fd=31, events=POLLIN}, {fd=32, events=POLLIN}, {fd=33, events=POLLIN}], 3, -89477551) = 1 ([{fd=33, revents=POLLIN|POLLERR|POLLHUP}]) <0.000005>
26938 05:01:36 poll([{fd=31, events=POLLIN}, {fd=32, events=POLLIN}, {fd=33, events=POLLIN}], 3, -89477551) = 1 ([{fd=33, revents=POLLIN|POLLERR|POLLHUP}]) <0.000005>
26938 05:01:36 poll([{fd=31, events=POLLIN}, {fd=32, events=POLLIN}, {fd=33, events=POLLIN}], 3, -89477551) = 1 ([{fd=33, revents=POLLIN|POLLERR|POLLHUP}]) <0.000005>
26938 05:01:36 poll([{fd=31, events=POLLIN}, {fd=32, events=POLLIN}, {fd=33, events=POLLIN}], 3, -89477551) = 1 ([{fd=33, revents=POLLIN|POLLERR|POLLHUP}]) <0.000005>
26938 05:01:36 poll([{fd=31, events=POLLIN}, {fd=32, events=POLLIN}, {fd=33, events=POLLIN}], 3, -89477551) = 1 ([{fd=33, revents=POLLIN|POLLERR|POLLHUP}]) <0.000005>
26938 05:01:36 poll([{fd=31, events=POLLIN}, {fd=32, events=POLLIN}, {fd=33, events=POLLIN}], 3, -89477551) = 1 ([{fd=33, revents=POLLIN|POLLERR|POLLHUP}]) <0.000005>
26938 05:01:36 poll([{fd=31, events=POLLIN}, {fd=32, events=POLLIN}, {fd=33, events=POLLIN}], 3, -89477551) = 1 ([{fd=33, revents=POLLIN|POLLERR|POLLHUP}]) <0.000005>
26938 05:01:36 poll([{fd=31, events=POLLIN}, {fd=32, events=POLLIN}, {fd=33, events=POLLIN}], 3, -89477551) = 1 ([{fd=33, revents=POLLIN|POLLERR|POLLHUP}]) <0.000005>
26938 05:01:36 poll([{fd=31, events=POLLIN}, {fd=32, events=POLLIN}, {fd=33, events=POLLIN}], 3, -89477551) = 1 ([{fd=33, revents=POLLIN|POLLERR|POLLHUP}]) <0.000005>
26938 05:01:36 poll([{fd=31, events=POLLIN}, {fd=32, events=POLLIN}, {fd=33, events=POLLIN}], 3, -89477551) = 1 ([{fd=33, revents=POLLIN|POLLERR|POLLHUP}]) <0.000005>
26938 05:01:36 poll([{fd=31, events=POLLIN}, {fd=32, events=POLLIN}, {fd=33, events=POLLIN}], 3, -89477551) = 1 ([{fd=33, revents=POLLIN|POLLERR|POLLHUP}]) <0.000005>
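
A trace like the one above can be summarised instead of read line by line. The sketch below counts, per one-second timestamp, the poll() calls that returned with POLLHUP set, and prints the seconds that reach a threshold. The function name poll_storm and the default threshold of 1000 are arbitrary choices for illustration. It assumes strace was run with -t, so that each line carries an HH:MM:SS timestamp, with or without a leading PID column.

```shell
#!/bin/bash
# poll_storm [LIMIT] [FILE...]: list seconds with LIMIT or more poll() calls
# that returned POLLHUP. Reads standard input when no FILE is given.
poll_storm() {
    local limit=${1:-1000}
    if [ $# -gt 0 ]; then
        shift
    fi
    awk -v limit="$limit" '
        /poll\(/ && /revents=[A-Z|]*POLLHUP/ {
            # Timestamp is field 1 in per-process files, field 2 after a PID.
            ts = ($1 ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]/) ? $1 : $2
            hits[ts]++
        }
        END {
            for (ts in hits) {
                if (hits[ts] >= limit) { print ts, hits[ts] }
            }
        }
    ' "$@" | sort
}

# Example:
#   poll_storm 1000 winbind.strace.*
```

A healthy daemon blocks in poll() until there is work to do, so seconds with thousands of POLLHUP returns point to the busy loop described in this article.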

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
