Application or database performance is degraded and strace cites "EAGAIN (Resource temporarily unavailable)"

Solution Verified - Updated -

Environment

  • Red Hat Enterprise Linux

Issue

  • The application or database performance is degraded.
  • When tracing the process with strace -rttTf -p <PIDNUM>, the output shows system calls failing with the error EAGAIN (Resource temporarily unavailable).

Resolution

  • Please engage the respective application or database vendor for further recommendations on alleviating the semaphore contention.

Root Cause

  • EAGAIN is often returned when a resource is contended. However, EAGAIN does not by itself mean that the contention is caused by the operating system.

  • The errno(3) man page defines EAGAIN as follows:

    EAGAIN Resource temporarily unavailable (may be the same value as EWOULDBLOCK) (POSIX.1-2001).
    
  • These errors may indicate contention or synchronization issues within the workload itself as its threads or processes compete for protected resources. In this case, further investigation would need to be performed by the vendor of the application.

  • An example of this is multiple application threads competing for the same semaphore-protected resource. If the semaphore is contended and a thread waits for the full timeout period, the call fails with EAGAIN, and the thread may then retry the semaphore operation.
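
The timeout behaviour described above can be reproduced in isolation. The following is a minimal sketch, not taken from any vendor's code: it assumes a 64-bit Linux system whose C library exports semget, semtimedop and semctl, and it uses Python's ctypes module to call them directly. The function name timed_wait_errno and the structure classes are hypothetical helpers introduced for illustration. A private semaphore is created and, because nothing ever increments it, a timed decrement is expected to fail with EAGAIN once the timeout expires.

```python
import ctypes
import errno
import os

# Constants as defined in the Linux headers (assumed values for Linux only).
IPC_PRIVATE = 0
IPC_CREAT = 0o1000
IPC_RMID = 0


class Sembuf(ctypes.Structure):
    # Mirrors struct sembuf from <sys/sem.h>.
    _fields_ = [
        ("sem_num", ctypes.c_ushort),
        ("sem_op", ctypes.c_short),
        ("sem_flg", ctypes.c_short),
    ]


class Timespec(ctypes.Structure):
    # Mirrors struct timespec on 64-bit Linux (assumption).
    _fields_ = [("tv_sec", ctypes.c_long), ("tv_nsec", ctypes.c_long)]


def timed_wait_errno(timeout_ns=10_000_000):
    """Try to decrement an uncontended-but-zero semaphore with a timeout.

    Returns (return_value, errno) from semtimedop().
    """
    libc = ctypes.CDLL(None, use_errno=True)
    libc.semget.argtypes = [ctypes.c_int, ctypes.c_int, ctypes.c_int]
    libc.semtimedop.argtypes = [
        ctypes.c_int,
        ctypes.POINTER(Sembuf),
        ctypes.c_size_t,
        ctypes.POINTER(Timespec),
    ]
    libc.semctl.argtypes = [ctypes.c_int, ctypes.c_int, ctypes.c_int]

    # Linux initializes new semaphores to 0; portable code cannot rely on this.
    semid = libc.semget(IPC_PRIVATE, 1, IPC_CREAT | 0o600)
    if semid == -1:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    try:
        op = Sembuf(0, -1, 0)          # semaphore 0, decrement by 1, no flags
        ts = Timespec(0, timeout_ns)   # give up after timeout_ns nanoseconds
        rc = libc.semtimedop(semid, ctypes.byref(op), 1, ctypes.byref(ts))
        return rc, ctypes.get_errno()
    finally:
        libc.semctl(semid, 0, IPC_RMID)  # always remove the semaphore set


if __name__ == "__main__":
    rc, err = timed_wait_errno()
    print(rc, errno.errorcode.get(err), os.strerror(err))
```

In this sketch no other process holds the semaphore, so the EAGAIN result comes purely from the timeout expiring. This illustrates why the error on its own does not identify who or what is holding the resource in a real workload.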

Diagnostic Steps

  1. When tracing system calls with strace, the output shows semtimedop failing with a return value of -1 and errno set to EAGAIN (Resource temporarily unavailable):

    123456 16:05:32.480177 (+     0.000122) semtimedop(32778, [{145, -1, 0}], 1, {tv_sec=0, tv_nsec=10000000}) = -1 EAGAIN (Resource temporarily unavailable) <0.009853>
    123456 16:05:32.490121 (+     0.009950) semtimedop(32778, [{145, -1, 0}], 1, {tv_sec=0, tv_nsec=20000000}) = -1 EAGAIN (Resource temporarily unavailable) <0.019919>
    123456 16:05:32.510130 (+     0.020007) semtimedop(32778, [{145, -1, 0}], 1, {tv_sec=0, tv_nsec=30000000}) = -1 EAGAIN (Resource temporarily unavailable) <0.029841>
    123456 16:05:32.540094 (+     0.029965) semtimedop(32778, [{145, -1, 0}], 1, {tv_sec=0, tv_nsec=40000000}) = -1 EAGAIN (Resource temporarily unavailable) <0.039939>
    123456 16:05:32.580152 (+     0.040059) semtimedop(32778, [{145, -1, 0}], 1, {tv_sec=0, tv_nsec=50000000}) = -1 EAGAIN (Resource temporarily unavailable) <0.049796>
    123456 16:05:32.630044 (+     0.049889) semtimedop(32778, [{145, -1, 0}], 1, {tv_sec=0, tv_nsec=60000000}) = -1 EAGAIN (Resource temporarily unavailable) <0.059982>
    123456 16:05:32.690148 (+     0.060109) semtimedop(32778, [{145, -1, 0}], 1, {tv_sec=0, tv_nsec=70000000}) = -1 EAGAIN (Resource temporarily unavailable) <0.069917>
    123456 16:05:32.760191 (+     0.070040) semtimedop(32778, [{145, -1, 0}], 1, {tv_sec=0, tv_nsec=80000000}) = -1 EAGAIN (Resource temporarily unavailable) <0.079824>
    123456 16:05:32.840150 (+     0.079963) semtimedop(32778, [{145, -1, 0}], 1, {tv_sec=0, tv_nsec=90000000}) = -1 EAGAIN (Resource temporarily unavailable) <0.089919>
    123456 16:05:32.930212 (+     0.090058) semtimedop(32778, [{145, -1, 0}], 1, {tv_sec=0, tv_nsec=100000000}) = -1 EAGAIN (Resource temporarily unavailable) <0.099862>
    123456 16:05:33.030229 (+     0.100020) semtimedop(32778, [{145, -1, 0}], 1, {tv_sec=0, tv_nsec=110000000}) = -1 EAGAIN (Resource temporarily unavailable) <0.109725>
    123456 16:05:33.140112 (+     0.109878) semtimedop(32778, [{145, -1, 0}], 1, {tv_sec=0, tv_nsec=120000000}) = -1 EAGAIN (Resource temporarily unavailable) <0.119988>
    123456 16:05:33.260242 (+     0.120137) semtimedop(32778, [{145, -1, 0}], 1, {tv_sec=0, tv_nsec=130000000}) = -1 EAGAIN (Resource temporarily unavailable) <0.129836>
    
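  2. According to the semop(2) man page, semtimedop() fails with errno set to EAGAIN when the specified timeout expires before the semaphore operation can complete. In the trace above, the duration of each call (<0.009853>, <0.019919>, and so on) closely matches its tv_nsec timeout (10 ms, 20 ms, and so on), which is consistent with each call waiting for its full timeout.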

    SEMOP(2)                          Linux Programmer's Manual                          SEMOP(2)
    
    NAME
        semop, semtimedop - System V semaphore operations
    
    SYNOPSIS
        #include <sys/types.h>
        #include <sys/ipc.h>
        #include <sys/sem.h>
    
    
    semtimedop()
        semtimedop() behaves identically to semop() except that in those cases where the calling thread would sleep, 
        the duration of  that  sleep is  limited  by  the  amount  of elapsed time specified by the timespec structure
        whose address is passed in the timeout argument.  (This sleep interval will be rounded up to the system clock 
        granularity, and kernel scheduling delays mean that the interval may overrun by a small amount.) If the specified 
        time limit has been reached, semtimedop() fails with errno set to EAGAIN (and none of the operations in sops is 
        performed). If the timeout argument is NULL, then semtimedop() behaves exactly like semop().
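
When a trace is long, it can help to total the failed calls rather than read them line by line. The following is a minimal sketch that assumes the strace output was saved to a file or list of lines in the format shown in step 1 (with the -T option, so that each line ends with the call duration in angle brackets). The function name summarize_failures and the regular expressions are hypothetical and written for illustration only; they are not part of strace.

```python
import re
from collections import defaultdict

# A failed call looks like: name(args) = -1 ERRNAME (description) <duration>
FAILED_CALL = re.compile(r"(?P<name>\w+)\(.*\)\s+=\s+-1\s+(?P<err>E[A-Z0-9]+)\b")
# The -T option appends the time spent in the call, for example <0.009853>.
DURATION = re.compile(r"<(?P<secs>\d+\.\d+)>\s*$")

# Two lines copied from the trace in step 1.
SAMPLE = [
    "123456 16:05:32.480177 (+     0.000122) semtimedop(32778, [{145, -1, 0}], 1, "
    "{tv_sec=0, tv_nsec=10000000}) = -1 EAGAIN (Resource temporarily unavailable) <0.009853>",
    "123456 16:05:32.490121 (+     0.009950) semtimedop(32778, [{145, -1, 0}], 1, "
    "{tv_sec=0, tv_nsec=20000000}) = -1 EAGAIN (Resource temporarily unavailable) <0.019919>",
]


def summarize_failures(lines):
    """Count failed calls and sum their durations per (syscall, errno) pair."""
    summary = defaultdict(lambda: {"count": 0, "seconds": 0.0})
    for line in lines:
        call = FAILED_CALL.search(line)
        if call is None:
            continue  # successful call, signal line, or unrecognized format
        entry = summary[(call["name"], call["err"])]
        entry["count"] += 1
        duration = DURATION.search(line)
        if duration:
            entry["seconds"] += float(duration["secs"])
    return dict(summary)


if __name__ == "__main__":
    for (name, err), stats in summarize_failures(SAMPLE).items():
        print(f"{name} {err}: {stats['count']} calls, {stats['seconds']:.6f}s waiting")
```

A large total wait time for semtimedop with EAGAIN suggests that threads are spending much of their time waiting on the semaphore. Note that this sketch skips calls that strace splits across "unfinished" and "resumed" lines when following multiple threads, so the counts may be lower than the true number of failures.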
    

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
