RHEV host reports blocked multipathd tasks and switches in and out of an "Unassigned" state.
Issue
- Lost access to storage on a host.
- The host was reported as being in an "Unassigned" state in the RHEV Manager GUI.
- The engine logs show the host repeatedly going into an "Unassigned" state, then a "Non Operational" state, then back to "Unassigned" (a sketch for extracting these transitions follows the excerpt):
2013-06-03 15:05:00,057 INFO [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (QuartzScheduler_Worker-88) [de06f10] START, SetVdsStatusVDSCommand(HostName = Host-06, HostId = 8a5ffdfe-b8b8-11e2-b4a4-00155d01cd16, status=Unassigned, nonOperationalReason=NONE), log id: 70539b47
2013-06-03 15:08:16,937 INFO [org.ovirt.engine.core.bll.SetNonOperationalVdsCommand] (QuartzScheduler_Worker-53) [6315f7fe] Running command: SetNonOperationalVdsCommand internal: true. Entities affected : ID: 8a5ffdfe-b8b8-11e2-b4a4-00155d01cd16 Type: VDS
2013-06-03 15:10:00,125 INFO [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (QuartzScheduler_Worker-38) [17ccb658] START, SetVdsStatusVDSCommand(HostName = Host-06, HostId = 8a5ffdfe-b8b8-11e2-b4a4-00155d01cd16, status=Unassigned, nonOperationalReason=NONE), log id: 2768d4ef
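The cycle is easiest to see by pulling every SetVdsStatusVDSCommand status change for the affected host out of the engine log. A minimal sketch, assuming the default engine log location and the host name from the excerpt above (adjust LOG and HOST to match the environment):

import re

# Assumed defaults; adjust for the affected environment.
LOG = "/var/log/ovirt-engine/engine.log"
HOST = "Host-06"

pattern = re.compile(r"SetVdsStatusVDSCommand\(HostName = %s,.*?status=(\w+)"
                     % re.escape(HOST))

with open(LOG) as f:
    for line in f:
        m = pattern.search(line)
        if m:
            # The first two whitespace-separated fields are the timestamp.
            stamp = " ".join(line.split()[:2])
            print("%s  ->  %s" % (stamp, m.group(1)))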
- The vdsm logs on the host show "Timed out during operation: cannot acquire state change lock" errors occurring in a variety of places (a sketch for tallying them follows the excerpts):
Thread-398543::ERROR::2013-06-03 19:04:22,001::libvirtconnection::106::vds::(wrapper) Unknown libvirterror: ecode: 68 edom: 10
Traceback (most recent call last):
File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 83, in wrapper
File "/usr/lib64/python2.6/site-packages/libvirt.py", line 1050, in migrateSetMaxDowntime
libvirtError: Timed out during operation: cannot acquire state change lock
Thread-44619::ERROR::2013-06-03 19:05:00,001::libvirtconnection::106::vds::(wrapper) Unknown libvirterror: ecode: 68 edom: 10
Traceback (most recent call last):
File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 83, in wrapper
File "/usr/lib64/python2.6/site-packages/libvirt.py", line 1820, in blockStats
libvirtError: Timed out during operation: cannot acquire state change lock
Thread-44619::ERROR::2013-06-03 19:05:00,002::utils::416::vm.Vm::(collect) vmId=`f0974a38-89e8-4e41-a596-9a6c1acd46d6`::Stats function failed: <AdvancedStatsFunction _sampleDisk at 0x13682d0>
Traceback (most recent call last):
File "/usr/lib64/python2.6/site-packages/vdsm/utils.py", line 412, in collect
File "/usr/lib64/python2.6/site-packages/vdsm/utils.py", line 287, in __call__
File "/usr/share/vdsm/libvirtvm.py", line 147, in _sampleDisk
File "/usr/share/vdsm/libvirtvm.py", line 525, in f
TimeoutError: Timed out during operation: cannot acquire state change lock
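libvirt error code 68 in error domain 10 is VIR_ERR_OPERATION_TIMEOUT from the QEMU driver: the vdsm statistics and migration threads time out waiting for a per-VM job lock held by another libvirt call that is itself stuck on the unresponsive storage ("Unknown libvirterror" is only vdsm's wrapper failing to map the code). A minimal sketch for counting these timeouts per VM, assuming the default vdsm log path on the host:

import re

# Assumed default vdsm log path on the host.
LOG = "/var/log/vdsm/vdsm.log"

vm_re = re.compile(r"vmId=`([0-9a-f-]+)`")
counts = {}

with open(LOG) as f:
    for line in f:
        if "cannot acquire state change lock" in line:
            m = vm_re.search(line)
            key = m.group(1) if m else "(no vmId in line)"
            counts[key] = counts.get(key, 0) + 1

for key in sorted(counts, key=counts.get, reverse=True):
    print("%5d  %s" % (counts[key], key))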
- The /var/log/messages file on the host shows several sanlock messages (the lease-expiry arithmetic is worked through after the excerpt):
Jun 3 18:58:23 localhost sanlock[15027]: 2013-06-03 18:58:23+0000 1539761 [8907]: s12 delta_renew read rv -202 offset 0 /dev/05be6f36-e294-4fe8-8fbb-669ca0d2a6e4/ids
Jun 3 18:58:23 localhost sanlock[15027]: 2013-06-03 18:58:23+0000 1539761 [8907]: s12 renewal error -202 delta_length 10 last_success 1539720
Jun 3 18:58:33 localhost sanlock[15027]: 2013-06-03 18:58:33+0000 1539771 [15027]: s11 check_our_lease warning 60 last_success 1539711
... [similar sanlock renewal errors and lease warnings repeat] ...
Jun 3 18:58:44 localhost sanlock[15027]: 2013-06-03 18:58:44+0000 1539782 [15027]: s10 check_our_lease warning 60 last_success 1539722
Jun 3 18:58:45 localhost wdmd[14963]: test warning now 1539783 ping 1539773 close 0 renewal 1539711 expire 1539791 client 15027 sanlock_99c38ee9-138a-4cf4-b9fe-08441b5dc3f4:3
Jun 3 18:58:45 localhost wdmd[14963]: /dev/watchdog closed unclean
Jun 3 18:58:45 localhost kernel: hpwdt: Unexpected close, not stopping watchdog!
Jun 3 19:00:25 localhost sanlock[15027]: 2013-06-03 19:00:25+0000 1539883 [8906]: s11 delta_renew write error -202
Jun 3 19:00:25 localhost sanlock[15027]: 2013-06-03 19:00:25+0000 1539883 [8906]: s11 renewal error -202 delta_length 141 last_success 1539711
Jun 3 19:00:34 localhost sanlock[15027]: 2013-06-03 19:00:34+0000 1539892 [8903]: ba5013d9 close_task_aio 0 0x7f82300008c0 busy
Jun 3 19:00:35 localhost sanlock[15027]: 2013-06-03 19:00:35+0000 1539893 [8906]: 99c38ee9 close_task_aio 0 0x7f822c0008c0 busy
Jun 3 19:00:35 localhost sanlock[15027]: 2013-06-03 19:00:35+0000 1539893 [8906]: 99c38ee9 close_task_aio 1 0x7f822c000910 busy
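These entries document the storage outage directly: sanlock must periodically renew its delta lease on each storage domain's ids volume, and "renewal error -202" means the renewal I/O timed out. The wdmd line above shows the deadline arithmetic: the last successful renewal was at counter 1539711 and the lease expires at 1539791, 80 seconds later, after which wdmd stops petting the watchdog ("/dev/watchdog closed unclean") and the host is reset. A minimal sketch that reads the same warnings back out of the log and reports how close each lockspace came to expiry; the log path and the 80-second window are taken from the excerpt above:

import re

# Log format, per the excerpt:
# <counter> [<pid>]: <lockspace> check_our_lease warning <age> last_success <counter>
pat = re.compile(r"(\d+) \[\d+\]: (\S+) check_our_lease warning \d+ "
                 r"last_success (\d+)")

with open("/var/log/messages") as f:
    for line in f:
        m = pat.search(line)
        if m:
            now, lockspace, last = int(m.group(1)), m.group(2), int(m.group(3))
            # Per the wdmd entry above, the lease expires 80 seconds after
            # the last successful renewal (renewal 1539711, expire 1539791).
            print("%s: %ds since last renewal, %ds until watchdog expiry" %
                  (lockspace, now - last, max(0, last + 80 - now)))

For the s11 entry above this prints 60 seconds since renewal and 20 seconds until expiry, consistent with wdmd closing the watchdog device moments later.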
- Followed by several hung multipathd tasks (a sketch for listing such tasks follows the trace):
Jun 3 19:01:42 localhost kernel: INFO: task multipathd:12223 blocked for more than 120 seconds.
Jun 3 19:01:42 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 3 19:01:42 localhost kernel: multipathd D 000000000000000c 0 12223 1 0x00000080
Jun 3 19:01:42 localhost kernel: ffff880924cd3968 0000000000000082 0000000000000000 ffff880c07482c90
Jun 3 19:01:42 localhost kernel: ffff880924cd38e8 ffffffffa040bc1e 0000000000000202 0000000000000046
Jun 3 19:01:42 localhost kernel: ffff880bf0b2baf8 ffff880924cd3fd8 000000000000fb88 ffff880bf0b2baf8
Jun 3 19:01:42 localhost kernel: Call Trace:
Jun 3 19:01:42 localhost kernel: [<ffffffffa040bc1e>] ? bfa_ioim_start+0x4e/0x60 [bfa]
Jun 3 19:01:42 localhost kernel: [<ffffffff8150ea05>] schedule_timeout+0x215/0x2e0
Jun 3 19:01:42 localhost kernel: [<ffffffff8108107d>] ? del_timer+0x7d/0xe0
Jun 3 19:01:42 localhost kernel: [<ffffffff8150e683>] wait_for_common+0x123/0x180
Jun 3 19:01:42 localhost kernel: [<ffffffff81063310>] ? default_wake_function+0x0/0x20
Jun 3 19:01:42 localhost kernel: [<ffffffff8150e79d>] wait_for_completion+0x1d/0x20
Jun 3 19:01:42 localhost kernel: [<ffffffff8126296c>] blk_execute_rq+0x8c/0xf0
Jun 3 19:01:42 localhost kernel: [<ffffffff8121b7e9>] ? security_capable+0x29/0x30
Jun 3 19:01:42 localhost kernel: [<ffffffff8107b9ba>] ? capable+0x2a/0x60
Jun 3 19:01:42 localhost kernel: [<ffffffff81266a9c>] ? blk_verify_command+0x2c/0x80
Jun 3 19:01:42 localhost kernel: [<ffffffff81266d05>] sg_io+0x215/0x3d0
Jun 3 19:01:42 localhost kernel: [<ffffffff81267670>] scsi_cmd_ioctl+0x400/0x470
Jun 3 19:01:42 localhost kernel: [<ffffffff810aa3ee>] ? futex_wake+0x10e/0x120
Jun 3 19:01:42 localhost kernel: [<ffffffff810ac350>] ? do_futex+0x100/0xb60
Jun 3 19:01:42 localhost kernel: [<ffffffff81267731>] scsi_cmd_blk_ioctl+0x51/0x70
Jun 3 19:01:42 localhost kernel: [<ffffffffa03a764f>] sd_ioctl+0xaf/0x110 [sd_mod]
Jun 3 19:01:42 localhost kernel: [<ffffffff81264487>] __blkdev_driver_ioctl+0x67/0x80
Jun 3 19:01:42 localhost kernel: [<ffffffff8126490d>] blkdev_ioctl+0x1ed/0x6e0
Jun 3 19:01:42 localhost kernel: [<ffffffff811bb7ec>] block_ioctl+0x3c/0x40
Jun 3 19:01:42 localhost kernel: [<ffffffff81194fb2>] vfs_ioctl+0x22/0xa0
Jun 3 19:01:42 localhost kernel: [<ffffffff81195154>] do_vfs_ioctl+0x84/0x580
Jun 3 19:01:42 localhost kernel: [<ffffffff811956d1>] sys_ioctl+0x81/0xa0
Jun 3 19:01:42 localhost kernel: [<ffffffff810dc645>] ? __audit_syscall_exit+0x265/0x290
Jun 3 19:01:42 localhost kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
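The trace shows where the task is stuck: multipathd's path checker sent a SCSI command through the SG_IO ioctl (block_ioctl -> sd_ioctl -> scsi_cmd_ioctl -> sg_io) and is sleeping uninterruptibly in blk_execute_rq, waiting for a completion that the Brocade FC HBA (the bfa frame) never delivers, so the kernel's 120-second hung-task watchdog flags it. Such tasks sit in state "D"; a minimal sketch for listing them straight from /proc (standard procfs layout, nothing RHEV-specific):

import os

# List tasks in uninterruptible sleep (state "D"), the condition the
# hung-task watchdog reports above.
for pid in os.listdir("/proc"):
    if not pid.isdigit():
        continue
    try:
        with open("/proc/%s/stat" % pid) as f:
            data = f.read()
    except IOError:
        continue  # task exited while we were scanning
    # /proc/<pid>/stat is "<pid> (<comm>) <state> ...", and <comm> may
    # itself contain spaces, so split around the closing parenthesis.
    comm = data[data.index("(") + 1:data.rindex(")")]
    state = data[data.rindex(")") + 2]
    if state == "D":
        print("%s %s" % (pid, comm))

Blocked multipathd checkers like these are a downstream symptom of the lost storage paths rather than the root cause.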
Environment
- Red Hat Enterprise Virtualization (RHEV) 3.1.1
- Red Hat Enterprise Virtualization Hypervisor (RHEV-H) version 20130501.0.el6_4