Why is rhevm-upgrade unable to automatically clear running tasks after receiving "The following tasks have been found running in the system" error?

Solution Verified

Environment

  • Red Hat Enterprise Virtualization (RHEV) 3.1

Issue

  • The following error was observed after running the rhevm-upgrade command to go from 3.1.2 to 3.1.3:
# rhevm-upgrade
Loaded plugins: product-id, rhnplugin
This system is receiving updates from RHN Classic or RHN Satellite.

Checking for updates... (This may take several minutes)
10 Updates available:
 * rhevm-3.1.0-50.el6ev.noarch
 * rhevm-backend-3.1.0-50.el6ev.noarch
 * rhevm-config-3.1.0-50.el6ev.noarch
 * rhevm-dbscripts-3.1.0-50.el6ev.noarch
 * rhevm-genericapi-3.1.0-50.el6ev.noarch
 * rhevm-notification-service-3.1.0-50.el6ev.noarch
 * rhevm-restapi-3.1.0-50.el6ev.noarch
 * rhevm-tools-common-3.1.0-50.el6ev.noarch
 * rhevm-userportal-3.1.0-50.el6ev.noarch
 * rhevm-webadmin-portal-3.1.0-50.el6ev.noarch

During the upgrade process, RHEV Manager  will not be accessible.
All existing running virtual machines will continue but you will not be able to
start or stop any new virtual machines during the process.

Would you like to proceed? (yes|no): yes
Stopping ovirt-engine service...                         [ DONE ]
Stopping DB related services...                          [ DONE ]
Cleaning async tasks...                                  [ DONE ]

Info: The following tasks have been found running in the system:

System Tasks:

                   command_type                    |                       entity_type
---------------------------------------------------+----------------------------------------------------------
 org.ovirt.engine.core.bll.RemoveVmTemplateCommand | org.ovirt.engine.core.common.businessentities.VmTemplate
(1 row)




[ Mar 19 13:28:07 ] Would you like to proceed and try to stop tasks automatically?
(Answering 'no' will stop the upgrade)? (yes|no): yes
  • Answering yes makes the upgrade script try to kill the zombie task. If it cannot, the script asks the same question again after about three minutes, and the only way out is to answer no, which stops the upgrade. Is it possible to fix this?

Resolution

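  • The upgrade command could not clear the entry for "businessentities.VmTemplate", most likely because no task was actually associated with it. The upgrade log shows zero rows in async_tasks but one row in business_entity_snapshot. This business entity snapshot was somehow left orphaned, so the automatic zombie-task cleanup has nothing to act on, and the row must be removed manually.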
  1. Locate the orphaned entities:

    # PGPASSFILE=/etc/ovirt-engine/.pgpass psql engine engine
    
    engine=> select * from business_entity_snapshot\x\g\x
    Expanded display is on.
    -[ RECORD 1 ]---+------------------------------------------------------------------------------------------
    id              | 42b891da-882c-11e2-8b50-005056af002f
    command_id      | 015871dc-7080-4bc3-8339-4f385b88683f
    command_type    | org.ovirt.engine.core.bll.RemoveVmTemplateCommand
    entity_id       | 85ccaafa-c224-4e18-b765-a5f60a68c432
    entity_type     | org.ovirt.engine.core.common.businessentities.VmTemplate
    entity_snapshot | {
                    :   "id" : [ "org.ovirt.engine.core.compat.Guid", {
                    :     "uuid" : "85ccaafa-c224-4e18-b765-a5f60a68c432"
                    :   } ],
                    :   "status" : [ "org.ovirt.engine.core.common.businessentities.VmTemplateStatus", "OK" ]
                    : }
    snapshot_class  | org.ovirt.engine.core.common.businessentities.BusinessEntitySnapshot$EntityStatusSnapshot
    snapshot_type   | 2
    insertion_order | 1
    started_at      | 2013-03-08 15:10:45.382928-05
    
  2. Stop the ovirt-engine service with the following command:

    # service ovirt-engine stop
    
  3. Delete the orphaned row from the database, substituting the id value found in step 1:

    # PGPASSFILE=/etc/ovirt-engine/.pgpass psql engine engine -c "delete from business_entity_snapshot where id = '42b891da-882c-11e2-8b50-005056af002f';"
    
  4. Start the service with the following command:

    # service ovirt-engine start
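
Step 1 lists every row in business_entity_snapshot. The check can be narrowed to rows that have no corresponding async task. The following Python sketch uses an in-memory SQLite database as a stand-in for the engine PostgreSQL database, with a deliberately simplified schema. It assumes that async_tasks has a command_id column matching business_entity_snapshot.command_id. That column is an assumption: verify the real column names in your engine database (for example with \d async_tasks in psql) before adapting the query.

```python
import sqlite3

# Simplified, hypothetical schema: only the columns needed for the example.
# The real engine tables have more columns; async_tasks.command_id is an
# assumption to check against your database before reusing the query.
SCHEMA = """
CREATE TABLE async_tasks (
    task_id    TEXT PRIMARY KEY,
    command_id TEXT
);
CREATE TABLE business_entity_snapshot (
    id           TEXT PRIMARY KEY,
    command_id   TEXT,
    command_type TEXT,
    entity_type  TEXT
);
"""

# Snapshot rows whose command_id matches no row in async_tasks.
ORPHAN_QUERY = """
SELECT s.id, s.command_id, s.command_type, s.entity_type
FROM business_entity_snapshot AS s
LEFT JOIN async_tasks AS t ON t.command_id = s.command_id
WHERE t.task_id IS NULL
ORDER BY s.id
"""


def find_orphaned_snapshots(conn):
    """Return (id, command_id, command_type, entity_type) tuples for orphans."""
    return conn.execute(ORPHAN_QUERY).fetchall()


def build_example_db():
    """Recreate the situation from the log: 0 async tasks, 1 snapshot row."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT INTO business_entity_snapshot VALUES (?, ?, ?, ?)",
        (
            "42b891da-882c-11e2-8b50-005056af002f",
            "015871dc-7080-4bc3-8339-4f385b88683f",
            "org.ovirt.engine.core.bll.RemoveVmTemplateCommand",
            "org.ovirt.engine.core.common.businessentities.VmTemplate",
        ),
    )
    return conn


if __name__ == "__main__":
    conn = build_example_db()
    for row in find_orphaned_snapshots(conn):
        print(row[0], row[2])
```

If the assumed column exists, the SELECT in ORPHAN_QUERY is plain SQL and should also run in psql against the engine database. Every id it returns is a candidate for the delete in step 3, but review each row before removing it.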
    

Note: For RHEV-M 3.1.4 and later, the /usr/share/ovirt-engine/scripts/taskcleaner/taskcleaner.sh utility can be used to clear tasks by task_id or command_id.

Diagnostic Steps

  • Checking the ovirt-engine-upgrade log:
# cat /var/log/ovirt-engine/ovirt-engine-upgrade_2013_03_19_13_41_49.log
---
2013-03-19 13:43:06::DEBUG::common_utils::339::root:: Executing command --> '/usr/share/ovirt-engine/scripts/taskcleaner/taskcleaner.sh -u engine -s localhost -p 5432 -d engine -z'
2013-03-19 13:43:06::DEBUG::common_utils::377::root:: output = (No rows)

2013-03-19 13:43:06::DEBUG::common_utils::378::root:: stderr =
2013-03-19 13:43:06::DEBUG::common_utils::379::root:: retcode = 0
2013-03-19 13:43:06::DEBUG::common_utils::339::root:: Executing command --> '/usr/share/ovirt-engine/scripts/taskcleaner/taskcleaner.sh -u engine -s localhost -p 5432 -d engine -z -R -C -J -q'
2013-03-19 13:43:06::DEBUG::common_utils::377::root:: output = -[ RECORD 1 ]-----------+-
deleteasynctaskszombies |


2013-03-19 13:43:06::DEBUG::common_utils::378::root:: stderr =
2013-03-19 13:43:06::DEBUG::common_utils::379::root:: retcode = 0
2013-03-19 13:43:06::DEBUG::rhevm-upgrade::1080::root:: Checking active system tasks
2013-03-19 13:43:06::DEBUG::common_utils::339::root:: Executing command --> '/usr/bin/psql -q -P tuples_only=on -P format=unaligned -h localhost -p 5432 -U engine -d engine -c select count(action_type) from async_tasks;'
2013-03-19 13:43:06::DEBUG::common_utils::377::root:: output = 0

2013-03-19 13:43:06::DEBUG::common_utils::378::root:: stderr =
2013-03-19 13:43:06::DEBUG::common_utils::379::root:: retcode = 0
2013-03-19 13:43:06::DEBUG::common_utils::339::root:: Executing command --> '/usr/bin/psql -q -P tuples_only=on -P format=unaligned -h localhost -p 5432 -U engine -d engine -c select count(*) from business_entity_snapshot;'
2013-03-19 13:43:06::DEBUG::common_utils::377::root:: output = 1

2013-03-19 13:43:06::DEBUG::common_utils::378::root:: stderr =
2013-03-19 13:43:06::DEBUG::common_utils::379::root:: retcode = 0
2013-03-19 13:43:06::DEBUG::common_utils::398::root:: running sql query 'select command_type, entity_type from business_entity_snapshot;' on db server: 'localhost'.
2013-03-19 13:43:06::DEBUG::common_utils::339::root:: Executing command --> '/usr/bin/psql -h localhost -p 5432 -U engine -d engine -c select command_type, entity_type from business_entity_snapshot;'
2013-03-19 13:43:06::DEBUG::common_utils::377::root:: output =                    command_type                    |                       entity_type                      
---------------------------------------------------+----------------------------------------------------------
 org.ovirt.engine.core.bll.RemoveVmTemplateCommand | org.ovirt.engine.core.common.businessentities.VmTemplate
(1 row)


2013-03-19 13:43:06::DEBUG::common_utils::378::root:: stderr =
2013-03-19 13:43:06::DEBUG::common_utils::379::root:: retcode = 0
2013-03-19 13:43:06::DEBUG::common_utils::909::root:: asking user:
Info: The following tasks have been found running in the system:

System Tasks:

                   command_type                    |                       entity_type
---------------------------------------------------+----------------------------------------------------------
 org.ovirt.engine.core.bll.RemoveVmTemplateCommand | org.ovirt.engine.core.common.businessentities.VmTemplate
(1 row)




[ Mar 19 13:43:06 ] Would you like to proceed and try to stop tasks automatically?
(Answering 'no' will stop the upgrade)? (yes|no):
2013-03-19 13:46:25::DEBUG::common_utils::913::root:: user answered: yes
2013-03-19 13:46:25::DEBUG::common_utils::339::root:: Executing command --> '/usr/bin/engine-config -g AsyncTaskZombieTaskLifeInMinutes'
2013-03-19 13:46:27::DEBUG::common_utils::377::root:: output = AsyncTaskZombieTaskLifeInMinutes: 3000 version: general

2013-03-19 13:46:27::DEBUG::common_utils::378::root:: stderr =
2013-03-19 13:46:27::DEBUG::common_utils::379::root:: retcode = 0
2013-03-19 13:46:27::DEBUG::common_utils::828::root:: updating vdc option AsyncTaskZombieTaskLifeInMinutes to: 0
2013-03-19 13:46:27::DEBUG::common_utils::339::root:: Executing command --> '/usr/bin/engine-config -s AsyncTaskZombieTaskLifeInMinutes=0 --cver=general -p /usr/share/ovirt-engine/conf/engine-config-install.properties'
2013-03-19 13:46:28::DEBUG::common_utils::377::root:: output =
2013-03-19 13:46:28::DEBUG::common_utils::378::root:: stderr =
2013-03-19 13:46:28::DEBUG::common_utils::379::root:: retcode = 0
2013-03-19 13:46:28::DEBUG::common_utils::828::root:: updating vdc option EngineMode to: MAINTENANCE
2013-03-19 13:46:28::DEBUG::common_utils::339::root:: Executing command --> '/usr/bin/engine-config -s EngineMode=MAINTENANCE --cver=general -p /usr/share/ovirt-engine/conf/engine-config-install.properties'
2013-03-19 13:46:28::DEBUG::common_utils::377::root:: output =
2013-03-19 13:46:28::DEBUG::common_utils::378::root:: stderr =
2013-03-19 13:46:28::DEBUG::common_utils::379::root:: retcode = 0
2013-03-19 13:46:28::DEBUG::rhevm-upgrade::651::root:: starting ovirt-engine service.
2013-03-19 13:46:28::DEBUG::common_utils::339::root:: Executing command --> '/sbin/service ovirt-engine start'
2013-03-19 13:46:29::DEBUG::common_utils::377::root:: output = Starting engine-service: [  OK  ]

2013-03-19 13:46:29::DEBUG::common_utils::378::root:: stderr =
2013-03-19 13:46:29::DEBUG::common_utils::379::root:: retcode = 0
2013-03-19 13:49:29::DEBUG::common_utils::339::root:: Executing command --> '/usr/bin/psql -q -P tuples_only=on -P format=unaligned -h localhost -p 5432 -U engine -d engine -c select count(action_type) from async_tasks;'
2013-03-19 13:49:29::DEBUG::common_utils::377::root:: output = 0

2013-03-19 13:49:29::DEBUG::common_utils::378::root:: stderr =
2013-03-19 13:49:29::DEBUG::common_utils::379::root:: retcode = 0
2013-03-19 13:49:29::DEBUG::common_utils::339::root:: Executing command --> '/usr/bin/psql -q -P tuples_only=on -P format=unaligned -h localhost -p 5432 -U engine -d engine -c select count(*) from business_entity_snapshot;'
2013-03-19 13:49:29::DEBUG::common_utils::377::root:: output = 1

2013-03-19 13:49:29::DEBUG::common_utils::378::root:: stderr =
2013-03-19 13:49:29::DEBUG::common_utils::379::root:: retcode = 0
2013-03-19 13:49:29::DEBUG::common_utils::398::root:: running sql query 'select command_type, entity_type from business_entity_snapshot;' on db server: 'localhost'.
2013-03-19 13:49:29::DEBUG::common_utils::339::root:: Executing command --> '/usr/bin/psql -h localhost -p 5432 -U engine -d engine -c select command_type, entity_type from business_entity_snapshot;'
2013-03-19 13:49:29::DEBUG::common_utils::377::root:: output =                    command_type                    |                       entity_type                      
---------------------------------------------------+----------------------------------------------------------
 org.ovirt.engine.core.bll.RemoveVmTemplateCommand | org.ovirt.engine.core.common.businessentities.VmTemplate
(1 row)


2013-03-19 13:49:29::DEBUG::common_utils::378::root:: stderr =
2013-03-19 13:49:29::DEBUG::common_utils::379::root:: retcode = 0
2013-03-19 13:49:29::DEBUG::rhevm-upgrade::1137::root:: Still waiting for system tasks to be cleared.
2013-03-19 13:49:29::DEBUG::common_utils::909::root:: asking user:
Info: The following tasks have been found running in the system:

System Tasks:

                   command_type                    |                       entity_type
---------------------------------------------------+----------------------------------------------------------
 org.ovirt.engine.core.bll.RemoveVmTemplateCommand | org.ovirt.engine.core.common.businessentities.VmTemplate
(1 row)




[ Mar 19 13:49:29 ] Would you like to proceed and try to stop tasks automatically?
(Answering 'no' will stop the upgrade)? (yes|no):
2013-03-19 13:50:09::DEBUG::common_utils::913::root:: user answered: yes
2013-03-19 13:50:09::DEBUG::rhevm-upgrade::1156::root:: Retrying to clear system tasks. System will try to clear tasks during the next 3 minutes.

[...]
[ Mar 19 13:53:09 ] Would you like to proceed and try to stop tasks automatically?
(Answering 'no' will stop the upgrade)? (yes|no):
2013-03-19 13:53:30::DEBUG::common_utils::913::root:: user answered: no
[...]
Please make sure that there are no running system tasks before you continue. Please contact GSS for assistance. Stopping upgrade.
[...]
2013-03-19 13:53:32::DEBUG::rhevm-upgrade::1171::root:: Restoring engine from maintenance mode
2013-03-19 13:53:32::DEBUG::common_utils::828::root:: updating vdc option EngineMode to: ACTIVE
2013-03-19 13:53:32::DEBUG::common_utils::339::root:: Executing command --> '/usr/bin/engine-config -s EngineMode=ACTIVE --cver=general -p /usr/share/ovirt-engine/conf/engine-config-install.properties'
2013-03-19 13:53:33::DEBUG::common_utils::377::root:: output =
2013-03-19 13:53:33::DEBUG::common_utils::378::root:: stderr =
2013-03-19 13:53:33::DEBUG::common_utils::379::root:: retcode = 0
2013-03-19 13:53:33::DEBUG::common_utils::339::root:: Executing command --> '/usr/bin/engine-config -g AsyncTaskZombieTaskLifeInMinutes'
2013-03-19 13:53:33::DEBUG::common_utils::377::root:: output = AsyncTaskZombieTaskLifeInMinutes: 0 version: general

2013-03-19 13:53:33::DEBUG::common_utils::378::root:: stderr =
2013-03-19 13:53:33::DEBUG::common_utils::379::root:: retcode = 0
2013-03-19 13:53:33::DEBUG::common_utils::828::root:: updating vdc option AsyncTaskZombieTaskLifeInMinutes to: 3000
2013-03-19 13:53:33::DEBUG::common_utils::339::root:: Executing command --> '/usr/bin/engine-config -s AsyncTaskZombieTaskLifeInMinutes=3000 --cver=general -p /usr/share/ovirt-engine/conf/engine-config-install.properties'
2013-03-19 13:53:34::DEBUG::common_utils::377::root:: output =
2013-03-19 13:53:34::DEBUG::common_utils::378::root:: stderr =
2013-03-19 13:53:34::DEBUG::common_utils::379::root:: retcode = 0
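  • rhevm-upgrade first runs /usr/share/ovirt-engine/scripts/taskcleaner/taskcleaner.sh to clear async tasks that may be zombies. The first invocation (with -z) returns no rows. The second (with -z -R -C -J -q) returns a single record for deleteasynctaskszombies with an empty value, which appears to be the function's return value rather than a task. The script then counts the rows in async_tasks (0) and in business_entity_snapshot (1) and lists the business_entity_snapshot entry, which has no matching async task. When the user answers yes, it sets the zombie task timeout (AsyncTaskZombieTaskLifeInMinutes) to 0 so that any running task is treated as a zombie. It also switches EngineMode to MAINTENANCE and starts ovirt-engine. Because there is no actual task behind the snapshot row, nothing is cleared and the same row is reported at every retry. When the user finally answers no, the script restores EngineMode=ACTIVE and AsyncTaskZombieTaskLifeInMinutes=3000.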
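
The retry behavior described above can be summarized as a small model. This is not the rhevm-upgrade source. It is a hypothetical Python sketch in which each table is reduced to a collection of command_id values. It assumes that ending a zombie task also clears the snapshot rows recorded for the same command. Under those assumptions, the model shows why the prompt keeps reappearing: an orphaned snapshot row has no task for the zero-minute timeout to act on.

```python
from dataclasses import dataclass, field


@dataclass
class EngineState:
    # Hypothetical stand-ins for the two tables the upgrade script counts.
    async_task_commands: set = field(default_factory=set)  # command_id per task
    snapshot_commands: list = field(default_factory=list)  # command_id per snapshot row


def tasks_running(state):
    """Mirror of the logged check: count(async_tasks) or count(business_entity_snapshot) > 0."""
    return len(state.async_task_commands) > 0 or len(state.snapshot_commands) > 0


def force_zombie_cleanup(state):
    """Model of AsyncTaskZombieTaskLifeInMinutes=0 (assumption): every existing
    task is ended, and ending a command clears the snapshot rows that belong
    to it. Snapshot rows with no owning task are left untouched."""
    ended = set(state.async_task_commands)
    state.async_task_commands.clear()
    state.snapshot_commands = [c for c in state.snapshot_commands if c not in ended]


def try_upgrade(state, retries=2):
    """Return True if the upgrade may proceed, False if it must stop."""
    for _ in range(retries):
        if not tasks_running(state):
            return True
        force_zombie_cleanup(state)
    return not tasks_running(state)
```

In this model, a genuine zombie task (a task plus its snapshot row) is cleared on the first retry. The state from this article, with one snapshot row and no task, makes try_upgrade return False no matter how many retries are allowed.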

  • The following lines from the log show that business_entity_snapshot still contained one row after the automatic retry:

"2013-03-19 13:49:29::DEBUG::common_utils::339::root:: Executing command --> '/usr/bin/psql -q -P tuples_only=on -P format=unaligned -h localhost -p 5432 -U engine -d engine -c select count(*) from business_entity_snapshot;'
2013-03-19 13:49:29::DEBUG::common_utils::377::root:: output = 1"
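
When the upgrade log is long, these counts can be extracted mechanically. The following is a minimal Python sketch. It assumes the "Executing command --> '...'" / "output = ..." line layout shown in the excerpt above; other versions of the upgrade script may log differently, and the function name snapshot_counts is hypothetical.

```python
import re

# Match the two DEBUG line shapes seen in ovirt-engine-upgrade_*.log.
CMD_RE = re.compile(r"Executing command --> '(?P<cmd>.*)'$")
OUT_RE = re.compile(r"::root:: output = (?P<out>.*)$")


def snapshot_counts(log_text):
    """Yield the output of each 'count(*) from business_entity_snapshot' query, in order."""
    pending = False
    for line in log_text.splitlines():
        cmd = CMD_RE.search(line)
        if cmd:
            # Remember whether the output that follows belongs to the snapshot count.
            pending = "count(*) from business_entity_snapshot" in cmd.group("cmd")
            continue
        out = OUT_RE.search(line)
        if out and pending:
            yield int(out.group("out").strip())
            pending = False
```

Each yielded value corresponds to one check made by the upgrade script. If the last value is still non-zero, the orphaned row survived the automatic cleanup.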
  • Querying that table directly in psql shows the leftover row:
engine=> select * from business_entity_snapshot\x\g\x
Expanded display is on.
-[ RECORD 1 ]---+------------------------------------------------------------------------------------------
id              | 42b891da-882c-11e2-8b50-005056af002f
command_id      | 015871dc-7080-4bc3-8339-4f385b88683f
command_type    | org.ovirt.engine.core.bll.RemoveVmTemplateCommand
entity_id       | 85ccaafa-c224-4e18-b765-a5f60a68c432
entity_type     | org.ovirt.engine.core.common.businessentities.VmTemplate
entity_snapshot | {
                :   "id" : [ "org.ovirt.engine.core.compat.Guid", {
                :     "uuid" : "85ccaafa-c224-4e18-b765-a5f60a68c432"
                :   } ],
                :   "status" : [ "org.ovirt.engine.core.common.businessentities.VmTemplateStatus", "OK" ]
                : }
snapshot_class  | org.ovirt.engine.core.common.businessentities.BusinessEntitySnapshot$EntityStatusSnapshot
snapshot_type   | 2
insertion_order | 1
started_at      | 2013-03-08 15:10:45.382928-05

Expanded display is off.
  • Once the row's id is known, the row can be deleted with the following SQL command:
  delete from business_entity_snapshot where id = '42b891da-882c-11e2-8b50-005056af002f';
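
Because this is a manual change to the engine database, it is prudent to make the delete conditional: run it in a transaction and keep it only if exactly one row was removed. In psql, the equivalent is to issue BEGIN; then the DELETE, check that psql reports DELETE 1, and then COMMIT (or ROLLBACK). The following Python sketch shows the same pattern with SQLite as a stand-in for the engine database. The helper name delete_snapshot_row is hypothetical.

```python
import sqlite3
import uuid


def delete_snapshot_row(conn, snapshot_id):
    """Delete one business_entity_snapshot row by id; roll back unless exactly one row matched."""
    # Reject anything that is not a well-formed UUID before touching the database.
    snapshot_id = str(uuid.UUID(snapshot_id))
    with conn:  # sqlite3: commits on success, rolls back if an exception escapes
        cur = conn.execute(
            "DELETE FROM business_entity_snapshot WHERE id = ?", (snapshot_id,)
        )
        if cur.rowcount != 1:
            raise RuntimeError(f"expected to delete 1 row, matched {cur.rowcount}")
    return snapshot_id
```

Passing the id as a bound parameter rather than pasting it into the SQL string also guards against typos that would otherwise turn into a malformed or overly broad statement.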

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
