mod_jk's recover_time is not applied after the first recovery attempt

Solution Verified

Environment

  • JBoss Enterprise Web Server (EWS)
    • Apache httpd
    • mod_jk 1.2.32 and earlier

Issue

  • We have set mod_jk's recover_time higher than its default of 60 seconds, while worker.maintain is still at its default of 60 seconds. However, the larger recover_time is honored only for the first recovery attempt; subsequent recovery attempts occur roughly every 60 seconds.

Resolution

  • Upgrade to a mod_jk release that includes the fix for this bug once one is available.
  • As a workaround, increase worker.maintain along with recover_time. A worker can enter recovery only when the worker maintenance routine runs, so a larger worker.maintain also spaces out the repeated recovery attempts.

Root Cause

  • https://issues.apache.org/bugzilla/show_bug.cgi?id=52334
  • The last error time (LE on the jkstatus page) is not updated after the initial error. The relevant code, from the load balancer worker's recovery check, is:

    if (w->s->state == JK_LB_STATE_ERROR) {
        elapsed = (int)difftime(now, w->s->error_time);
        if (elapsed <= p->recover_wait_time) {
            if (JK_IS_DEBUG_LEVEL(l))
                jk_log(l, JK_LOG_DEBUG,
                       "worker %s will recover in %d seconds",
                       w->name, p->recover_wait_time - elapsed);
        }
        else {
            if (JK_IS_DEBUG_LEVEL(l))
                jk_log(l, JK_LOG_DEBUG,
                       "worker %s is marked for recovery",
                       w->name);
            if (p->lbmethod != JK_LB_METHOD_BUSYNESS)
                w->s->lb_value = curmax;
            aw->s->reply_timeouts = 0;
            w->s->state = JK_LB_STATE_RECOVER;
            non_error++;
        }
    }
    

    Because error_time is never updated after the first pass, elapsed is always greater than recover_wait_time on every later pass, so the worker is placed back into recovery each time the worker maintenance routine runs.

Diagnostic Steps

  • Check the LE (last error time) value on the jkstatus page. Does it stay the same across all recovery attempts?
  • Change worker.maintain. Does the interval between repeated recovery attempts change to match?

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
