Various OpenStack commands fail due to timeout while connecting to Keystone for authentication

Solution Verified


Environment

  • Red Hat Enterprise Linux OpenStack Platform

Issue

  • Various OpenStack commands fail due to a timeout while connecting to Keystone for authentication. For example:
    • A stack remains in the DELETE_IN_PROGRESS state after an attempt is made to delete it. The following trace is seen in /var/log/heat/heat-engine.log:
2014-06-17 16:04:39.529 5069 ERROR heat.engine.resource [-] Delete WaitConditionHandle "monsrv_wait_handle"
2014-06-17 16:04:39.529 5069 TRACE heat.engine.resource Traceback (most recent call last):
2014-06-17 16:04:39.529 5069 TRACE heat.engine.resource   File "/usr/lib/python2.6/site-packages/heat/engine/resource.py", line 565, in delete
2014-06-17 16:04:39.529 5069 TRACE heat.engine.resource     handle_data = self.handle_delete()
2014-06-17 16:04:39.529 5069 TRACE heat.engine.resource   File "/usr/lib/python2.6/site-packages/heat/engine/signal_responder.py", line 61, in handle_delete
2014-06-17 16:04:39.529 5069 TRACE heat.engine.resource     self.keystone().delete_stack_user(self.resource_id)
2014-06-17 16:04:39.529 5069 TRACE heat.engine.resource   File "/usr/lib/python2.6/site-packages/heat/common/heat_keystoneclient.py", line 281, in delete_stack_user
2014-06-17 16:04:39.529 5069 TRACE heat.engine.resource     raise exception.Error(reason)
2014-06-17 16:04:39.529 5069 TRACE heat.engine.resource Error: Timed out trying to delete user
2014-06-17 16:04:39.529 5069 TRACE heat.engine.resource

Resolution

Purge expired keystone tokens periodically.

The following command can be run at any time to manually flush expired tokens from the keystone database:

# keystone-manage token_flush

It is recommended that administrators run this command periodically, via a daily cron job on all controllers, to ensure the token table does not grow excessively large. The steps to create the cron job differ depending on how OpenStack is deployed.

Single Controller Non HA Deployment

If the environment is deployed using Packstack, or with Director and a single controller, a cron job such as the following is sufficient:

# crontab -e -u keystone
01 * * * * /usr/bin/keystone-manage token_flush >/dev/null 2>&1

High Availability Deployment

A highly available deployment has a minimum of three controller/keystone nodes. It is recommended to schedule the cron job on each controller with a staggered delay; running it on every controller ensures the flush continues even if one controller goes down. For example:

Controller-1:

# crontab -e -u keystone
01 01 * * * /usr/bin/keystone-manage token_flush >/dev/null 2>&1

Controller-2:

# crontab -e -u keystone
01 08 * * * /usr/bin/keystone-manage token_flush >/dev/null 2>&1

Controller-3:

# crontab -e -u keystone
01 16 * * * /usr/bin/keystone-manage token_flush >/dev/null 2>&1
  • The RHEL-OSP6 Installer configures this cron job to run every minute on each controller. You can either keep that configuration or change it to the schedule above.
  • At the time of this writing, RHEL-OSP Director does not configure this cron job for OSP7 deployments, so it should be configured manually after deployment.
  • If there are a large number of expired keystone tokens, the files in /var/lib/mysql/* can grow in size.
  • The number of expired tokens can increase significantly if keystone receives many authentication requests. If you find the table growing quickly, schedule the cron job to run more frequently than recommended above (for example, twice a day or once an hour, depending on your environment). Make sure it runs on each controller with a staggered delay. For example:

If you decide it needs to run once an hour, configure it with a 20-minute stagger between controllers, as below.

Controller-1: At the 1st minute of every hour:  01 * * * * /usr/bin/keystone-manage token_flush >/dev/null 2>&1
Controller-2: At the 21st minute of every hour: 21 * * * * /usr/bin/keystone-manage token_flush >/dev/null 2>&1
Controller-3: At the 41st minute of every hour: 41 * * * * /usr/bin/keystone-manage token_flush >/dev/null 2>&1
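As a sketch, the staggered entries above can be generated and installed non-interactively on each controller; the helper function name and the offsets are illustrative, not part of any keystone tooling:

```shell
#!/bin/sh
# Illustrative helper (not part of keystone): build a staggered
# token_flush crontab entry for a given minute offset.
token_flush_entry() {
    # $1: minute offset for this controller (e.g. 1, 21, or 41)
    printf '%s * * * * /usr/bin/keystone-manage token_flush >/dev/null 2>&1\n' "$1"
}

# On a controller, as root, append the entry to the keystone user's
# crontab without opening an editor:
#   ( crontab -l -u keystone 2>/dev/null; token_flush_entry 21 ) | crontab -u keystone -
token_flush_entry 21
```

Feeding the combined output of `crontab -l` and the new entry back into `crontab -` avoids the interactive `crontab -e` step, which is convenient when configuring several controllers.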

Root Cause

  • Commands can time out when a large number of expired user tokens slows down queries against the keystone database.

Diagnostic Steps

  • Use the following query to review how many expired tokens are present in the keystone database (token.expires is a DATETIME column, so compare it against NOW() rather than CURTIME(), which returns only the time of day):
root@controller # mysql
mysql> use keystone;
mysql> select count(*) from token where token.expires < NOW();
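The interactive session above can also be run as a one-shot command. This sketch wraps the query string in a small helper (the function name is illustrative) so it can be passed to `mysql -e`; actually running it assumes root access to the controller's local MariaDB instance:

```shell
#!/bin/sh
# Illustrative helper: the expired-token count query as a string,
# so it can be reused non-interactively with `mysql -e`.
expired_token_query() {
    echo "SELECT COUNT(*) FROM token WHERE expires < NOW();"
}

# On a controller, as root:
#   mysql keystone -e "$(expired_token_query)"
expired_token_query
```

Running the count before and after a manual `keystone-manage token_flush` is a quick way to confirm the flush is actually removing rows.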

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.

2 Comments

According to everything I could find, it appears to me as though OpenStack itself is going down the path of simply executing the token_flush on every controller node every minute.

See: https://review.openstack.org/#/c/142420/ which appears to indicate the abandonment of a staggered flush strategy (or a load-based one). Seems like they have resolved to simply flush every minute on every controller:

# Puppet Name: token-flush
*/1 * * * * /usr/bin/keystone-manage token_flush >/dev/null 2>&1

Implemented by Director Automatically (OpenStack Version Kilo)

# HEADER: This file was autogenerated at 2016-03-28 19:57:44 +0000 by puppet.
# HEADER: While it can still be managed manually, it is definitely not recommended.
# HEADER: Note particularly that the comments starting with 'Puppet Name' should
# HEADER: not be deleted, as doing so could cause duplicate cron jobs.
# Puppet Name: keystone-manage token_flush

#
PATH=/bin:/usr/bin:/usr/sbin
SHELL=/bin/sh
1 0 * * * sleep `expr ${RANDOM} \% 3600`; keystone-manage token_flush >>/var/log/keystone/keystone-tokenflush.log 2>&1
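One note on the generated line: `${RANDOM}` is a bash feature, while the crontab sets `SHELL=/bin/sh` (on RHEL, /bin/sh is bash, so the line works there). A rough sketch of a portable equivalent, deriving the splay from the process ID instead of `$RANDOM` (an illustration, not Director's actual code):

```shell
#!/bin/sh
# Illustrative portable splay: derive a 0-3599 second delay without
# relying on bash's $RANDOM (here from the PID, which varies per run).
splay=$(( $$ % 3600 ))
echo "$splay"
# A crontab entry using it would look like, e.g.:
#   sleep "$splay"; keystone-manage token_flush >>/var/log/keystone/keystone-tokenflush.log 2>&1
```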