RHEV host reports blocked multipathd tasks and switches in and out of an "Unassigned" state.

Issue

  • A host lost access to its storage.
  • The host was reported as being in an "Unassigned" state in the RHEV Manager GUI.
  • The engine logs show the host repeatedly cycling from an "Unassigned" state to a "Non Operational" state and back to "Unassigned":
2013-06-03 15:05:00,057 INFO  [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (QuartzScheduler_Worker-88) [de06f10] START, SetVdsStatusVDSCommand(HostName = Host-06, HostId = 8a5ffdfe-b8b8-11e2-b4a4-00155d01cd16, status=Unassigned, nonOperationalReason=NONE), log id: 70539b47

2013-06-03 15:08:16,937 INFO  [org.ovirt.engine.core.bll.SetNonOperationalVdsCommand] (QuartzScheduler_Worker-53) [6315f7fe] Running command: SetNonOperationalVdsCommand internal: true. Entities affected :  ID: 8a5ffdfe-b8b8-11e2-b4a4-00155d01cd16 Type: VDS

2013-06-03 15:10:00,125 INFO  [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (QuartzScheduler_Worker-38) [17ccb658] START, SetVdsStatusVDSCommand(HostName = Host-06, HostId = 8a5ffdfe-b8b8-11e2-b4a4-00155d01cd16, status=Unassigned, nonOperationalReason=NONE), log id: 2768d4ef
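
These state transitions can be located on the Manager itself. A minimal search, assuming the default engine log path on a RHEV 3.1 Manager (the location can differ between versions):

# grep -E 'SetVdsStatusVDSCommand|SetNonOperationalVdsCommand' /var/log/ovirt-engine/engine.log
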
  • The vdsm logs on the host show "Timed out during operation: cannot acquire state change lock" errors raised from a variety of call paths:
Thread-398543::ERROR::2013-06-03 19:04:22,001::libvirtconnection::106::vds::(wrapper) Unknown libvirterror: ecode: 68 edom: 10
Traceback (most recent call last):
  File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 83, in wrapper
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 1050, in migrateSetMaxDowntime
libvirtError: Timed out during operation: cannot acquire state change lock

Thread-44619::ERROR::2013-06-03 19:05:00,001::libvirtconnection::106::vds::(wrapper) Unknown libvirterror: ecode: 68 edom: 10
Traceback (most recent call last):
  File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 83, in wrapper
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 1820, in blockStats
libvirtError: Timed out during operation: cannot acquire state change lock
Thread-44619::ERROR::2013-06-03 19:05:00,002::utils::416::vm.Vm::(collect) vmId=`f0974a38-89e8-4e41-a596-9a6c1acd46d6`::Stats function failed: <AdvancedStatsFunction _sampleDisk at 0x13682d0>
Traceback (most recent call last):
  File "/usr/lib64/python2.6/site-packages/vdsm/utils.py", line 412, in collect
  File "/usr/lib64/python2.6/site-packages/vdsm/utils.py", line 287, in __call__
  File "/usr/share/vdsm/libvirtvm.py", line 147, in _sampleDisk
  File "/usr/share/vdsm/libvirtvm.py", line 525, in f
TimeoutError: Timed out during operation: cannot acquire state change lock
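
The extent of these timeouts can be gauged directly on the host. A quick check, assuming the default vdsm log location:

# grep -c 'cannot acquire state change lock' /var/log/vdsm/vdsm.log

A count that keeps climbing while storage is unreachable is consistent with libvirt calls queuing behind an operation that is itself stuck on disk I/O, rather than an isolated race.
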
  • The /var/log/messages file on the host shows repeated sanlock lease renewal errors and watchdog warnings:
Jun  3 18:58:23 localhost sanlock[15027]: 2013-06-03 18:58:23+0000 1539761 [8907]: s12 delta_renew read rv -202 offset 0 /dev/05be6f36-e294-4fe8-8fbb-669ca0d2a6e4/ids
Jun  3 18:58:23 localhost sanlock[15027]: 2013-06-03 18:58:23+0000 1539761 [8907]: s12 renewal error -202 delta_length 10 last_success 1539720

Jun  3 18:58:33 localhost sanlock[15027]: 2013-06-03 18:58:33+0000 1539771 [15027]: s11 check_our_lease warning 60 last_success 1539711
....... etc.,etc., .......
Jun  3 18:58:44 localhost sanlock[15027]: 2013-06-03 18:58:44+0000 1539782 [15027]: s10 check_our_lease warning 60 last_success 1539722
Jun  3 18:58:45 localhost wdmd[14963]: test warning now 1539783 ping 1539773 close 0 renewal 1539711 expire 1539791 client 15027 sanlock_99c38ee9-138a-4cf4-b9fe-08441b5dc3f4:3

Jun  3 18:58:45 localhost wdmd[14963]: /dev/watchdog closed unclean
Jun  3 18:58:45 localhost kernel: hpwdt: Unexpected close, not stopping watchdog!

Jun  3 19:00:25 localhost sanlock[15027]: 2013-06-03 19:00:25+0000 1539883 [8906]: s11 delta_renew write error -202
Jun  3 19:00:25 localhost sanlock[15027]: 2013-06-03 19:00:25+0000 1539883 [8906]: s11 renewal error -202 delta_length 141 last_success 1539711
Jun  3 19:00:34 localhost sanlock[15027]: 2013-06-03 19:00:34+0000 1539892 [8903]: ba5013d9 close_task_aio 0 0x7f82300008c0 busy
Jun  3 19:00:35 localhost sanlock[15027]: 2013-06-03 19:00:35+0000 1539893 [8906]: 99c38ee9 close_task_aio 0 0x7f822c0008c0 busy
Jun  3 19:00:35 localhost sanlock[15027]: 2013-06-03 19:00:35+0000 1539893 [8906]: 99c38ee9 close_task_aio 1 0x7f822c000910 busy
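
The delta_renew errors (rv -202 indicates a timed-out I/O within sanlock) against the storage domain's ids lease volume line up with the lost storage access. These entries can be pulled from the host's logs with, for example:

# grep sanlock /var/log/messages | grep -E 'renewal error|delta_renew|check_our_lease'
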
  • Followed by hung task messages for several multipathd tasks:
Jun  3 19:01:42 localhost kernel: INFO: task multipathd:12223 blocked for more than 120 seconds.
Jun  3 19:01:42 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun  3 19:01:42 localhost kernel: multipathd    D 000000000000000c     0 12223      1 0x00000080
Jun  3 19:01:42 localhost kernel: ffff880924cd3968 0000000000000082 0000000000000000 ffff880c07482c90
Jun  3 19:01:42 localhost kernel: ffff880924cd38e8 ffffffffa040bc1e 0000000000000202 0000000000000046
Jun  3 19:01:42 localhost kernel: ffff880bf0b2baf8 ffff880924cd3fd8 000000000000fb88 ffff880bf0b2baf8
Jun  3 19:01:42 localhost kernel: Call Trace:
Jun  3 19:01:42 localhost kernel: [<ffffffffa040bc1e>] ? bfa_ioim_start+0x4e/0x60 [bfa]
Jun  3 19:01:42 localhost kernel: [<ffffffff8150ea05>] schedule_timeout+0x215/0x2e0
Jun  3 19:01:42 localhost kernel: [<ffffffff8108107d>] ? del_timer+0x7d/0xe0
Jun  3 19:01:42 localhost kernel: [<ffffffff8150e683>] wait_for_common+0x123/0x180
Jun  3 19:01:42 localhost kernel: [<ffffffff81063310>] ? default_wake_function+0x0/0x20
Jun  3 19:01:42 localhost kernel: [<ffffffff8150e79d>] wait_for_completion+0x1d/0x20
Jun  3 19:01:42 localhost kernel: [<ffffffff8126296c>] blk_execute_rq+0x8c/0xf0
Jun  3 19:01:42 localhost kernel: [<ffffffff8121b7e9>] ? security_capable+0x29/0x30
Jun  3 19:01:42 localhost kernel: [<ffffffff8107b9ba>] ? capable+0x2a/0x60
Jun  3 19:01:42 localhost kernel: [<ffffffff81266a9c>] ? blk_verify_command+0x2c/0x80
Jun  3 19:01:42 localhost kernel: [<ffffffff81266d05>] sg_io+0x215/0x3d0
Jun  3 19:01:42 localhost kernel: [<ffffffff81267670>] scsi_cmd_ioctl+0x400/0x470
Jun  3 19:01:42 localhost kernel: [<ffffffff810aa3ee>] ? futex_wake+0x10e/0x120
Jun  3 19:01:42 localhost kernel: [<ffffffff810ac350>] ? do_futex+0x100/0xb60
Jun  3 19:01:42 localhost kernel: [<ffffffff81267731>] scsi_cmd_blk_ioctl+0x51/0x70
Jun  3 19:01:42 localhost kernel: [<ffffffffa03a764f>] sd_ioctl+0xaf/0x110 [sd_mod]
Jun  3 19:01:42 localhost kernel: [<ffffffff81264487>] __blkdev_driver_ioctl+0x67/0x80
Jun  3 19:01:42 localhost kernel: [<ffffffff8126490d>] blkdev_ioctl+0x1ed/0x6e0
Jun  3 19:01:42 localhost kernel: [<ffffffff811bb7ec>] block_ioctl+0x3c/0x40
Jun  3 19:01:42 localhost kernel: [<ffffffff81194fb2>] vfs_ioctl+0x22/0xa0
Jun  3 19:01:42 localhost kernel: [<ffffffff81195154>] do_vfs_ioctl+0x84/0x580
Jun  3 19:01:42 localhost kernel: [<ffffffff811956d1>] sys_ioctl+0x81/0xa0
Jun  3 19:01:42 localhost kernel: [<ffffffff810dc645>] ? __audit_syscall_exit+0x265/0x290
Jun  3 19:01:42 localhost kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
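
The call trace shows multipathd blocked in an SG_IO ioctl against a SCSI device, most likely its path checker I/O that cannot complete while the paths to the storage are unresponsive. The reports, and the current path states, can be reviewed with commands along these lines (the second will itself hang if multipathd is still blocked):

# grep 'blocked for more than 120 seconds' /var/log/messages
# multipathd -k'show paths'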

Environment

  • Red Hat Enterprise Virtualization (RHEV) 3.1.1
  • Red Hat Enterprise Virtualization Hypervisor (RHEV-H) version 20130501.0.el6_4
