Cannot automatically restart VM on another host in IBM BladeCenter


Hello!

I'm testing RHEV-M 3.1 on IBM BladeCenter-E. There are two HS23 (Type 7875) and one HS22 (Type 7870) blades.

The Red Hat hypervisor is installed on the blades, and RHEV-M is installed on a separate machine running Red Hat Enterprise Linux 6.4. The Data Center is configured to use external iSCSI storage.

Everything works fine: the power management tests (bladecenter fencing type) succeed for all blades, and VM live migration works without problems between any two blades (I use a VM running Windows Server 2008 R2).

Now I am trying to test HA by pulling the blade server with a running VM out of the chassis, expecting the VM to restart automatically on another blade. But that doesn't happen.

The VM goes into the "status unknown" state, and the blade host goes into the "non-responsive" state. Nothing happens after that.

The VM is configured as highly available, with high priority for migration. Migration is allowed for all VMs from any host to any other.

What am I doing wrong?

How can I fix this?

Responses

Have you followed the event using the ovirt log on the RHEV-M host?

[root@rhvmgr01 ~]# tail -f /var/log/ovirt-engine/engine.log

Also - the Events pane has an "advanced view" option that might shed some light on the issue.
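
If the full log is too noisy to follow, a small filter can help. This is only a sketch: the function name fence_events is made up for illustration, and the class names in the pattern are simply the ones that show up in the log excerpt posted further down in this thread, so other RHEV-M versions may log under different names.

```shell
# Show only the power-management (fencing) entries from an engine log.
# The default path is the one used in the tail command above; the class
# names are taken from the log excerpt in this thread (an assumption for
# other RHEV-M versions).
fence_events() {
    local log="${1:-/var/log/ovirt-engine/engine.log}"
    grep -E 'FencingExecutor|FenceVdsVDSCommand|VdsNotRespondingTreatmentCommand' "$log"
}
```

Run fence_events on the RHEV-M host right after pulling the blade, or pass a copy of the log as the first argument.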

Hi James!

It seems RHEV-M was not able to fence the missing blade.

Adding the option missing_as_off=1 to the power management options did the trick.

Thank you very much for your help!
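
For anyone who wants to check the same thing by hand, the fence agent can be run directly from a hypervisor host. The sketch below is an assumption-laden example, not the exact call RHEV-M makes: the IP, user and slot are the ones from the log later in this thread, FENCE_PASSWORD is a placeholder variable you have to set yourself, and the flags should be confirmed against `fence_bladecenter -h` on your host, since they can differ between fence-agents versions.

```shell
# Sketch: ask the management module for a blade's power status by hand.
# Values mirror the engine.log excerpt in this thread; adjust for your chassis.
MGMT_IP="192.168.12.9"
FENCE_USER="USERID"
SLOT="3"

fence_status() {
    # -x                 use SSH (the "secure=true" option in RHEV-M)
    # --missing-as-off   report an absent blade as OFF instead of failing
    # FENCE_PASSWORD is a placeholder; the function aborts if it is unset.
    fence_bladecenter -a "$MGMT_IP" -l "$FENCE_USER" \
        -p "${FENCE_PASSWORD:?set FENCE_PASSWORD first}" \
        -n "$SLOT" -x -o status --missing-as-off
}
```

With the blade pulled out, the status call should report the blade as off rather than returning an error if the option is taking effect.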

Great to hear you resolved the issue and THANK YOU for updating the thread to help others.

Hi James!

I now have an IBM Flex System Enterprise chassis with two x240 blades and one CMM. There is also an FSM on a third blade.

Everything is configured the same way as for the BladeCenter-E in my initial post, and everything works fine.

But now RHEV-M is not able to fence a missing blade. In the oVirt engine log I see the following message:

"The fence-agent script reported the following error: Connection timed out".

In the CMM event log I see successful SSH logins and logoffs by the fencing script. I can log in to the CMM's command line over SSH and execute any command without problems. The CMM command line is almost identical to the AMM's CLI.

What could be the reason for this? Is there a patch available?

Thank you in advance!

Below is the complete event log:

2013-04-18 18:28:47,516 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand] (QuartzScheduler_Worker-47) XML RPC error in command GetStatsVDS ( HostName = node3.test.local ), the error was: java.util.concurrent.TimeoutException, TimeoutException:
2013-04-18 18:28:47,516 ERROR [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (QuartzScheduler_Worker-47) vds::refreshVdsStats Failed getVdsStats,  vds = 4eea063a-a6ba-11e2-8a9c-000c291a641f : node3.test.local, error = VDSNetworkException: VDSNetworkException:
2013-04-18 18:28:47,527 WARN  [org.ovirt.engine.core.vdsbroker.VdsManager] (QuartzScheduler_Worker-47) ResourceManager::refreshVdsRunTimeInfo::Failed to refresh VDS , vds = 4eea063a-a6ba-11e2-8a9c-000c291a641f : node3.test.local, VDS Network Error, continuing.
VDSNetworkException:
2013-04-18 18:28:52,549 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand] (QuartzScheduler_Worker-80) XML RPC error in command GetCapabilitiesVDS ( HostName = node3.test.local ), the error was: java.util.concurrent.ExecutionException: java.lang.reflect.InvocationTargetException, NoRouteToHostException: No route to host
2013-04-18 18:28:52,549 WARN  [org.ovirt.engine.core.vdsbroker.VdsManager] (QuartzScheduler_Worker-80) ResourceManager::refreshVdsRunTimeInfo::Failed to refresh VDS , vds = 4eea063a-a6ba-11e2-8a9c-000c291a641f : node3.test.local, VDS Network Error, continuing.
VDSNetworkException:
2013-04-18 18:28:55,550 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand] (QuartzScheduler_Worker-63) XML RPC error in command GetCapabilitiesVDS ( HostName = node3.test.local ), the error was: java.util.concurrent.ExecutionException: java.lang.reflect.InvocationTargetException, NoRouteToHostException: No route to host
2013-04-18 18:28:55,550 ERROR [org.ovirt.engine.core.vdsbroker.VdsManager] (QuartzScheduler_Worker-63) VDS::handleNetworkException Server failed to respond,  vds_id = 4eea063a-a6ba-11e2-8a9c-000c291a641f, vds_name = node3.test.local, error = VDSNetworkException:
2013-04-18 18:28:55,579 INFO  [org.ovirt.engine.core.bll.VdsEventListener] (pool-3-thread-46) ResourceManager::vdsNotResponding entered for Host 4eea063a-a6ba-11e2-8a9c-000c291a641f, 192.168.12.13
2013-04-18 18:28:55,656 INFO  [org.ovirt.engine.core.bll.FencingExecutor] (pool-3-thread-46) Executing <Status> Power Management command, Proxy Host:node1.test.local, Agent:bladecenter, Target Host:node3.test.local, Management IP:192.168.12.9, User:USERID, Options:port,slot=3,secure=true,missing_as_off=1
2013-04-18 18:28:55,669 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.FenceVdsVDSCommand] (pool-3-thread-46) START, FenceVdsVDSCommand(HostName = node1.test.local, HostId = ef102384-a77c-11e2-b09b-000c291a641f, targetVdsId = 4eea063a-a6ba-11e2-8a9c-000c291a641f, action = Status, ip = 192.168.12.9, port = , type = bladecenter, user = USERID, password = ******, options = 'port,slot=3,secure=true,missing_as_off=1'), log id: 4080f4b6
2013-04-18 18:29:00,621 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand] (QuartzScheduler_Worker-90) XML RPC error in command GetCapabilitiesVDS ( HostName = node3.test.local ), the error was: java.util.concurrent.ExecutionException: java.lang.reflect.InvocationTargetException, NoRouteToHostException: No route to host
2013-04-18 18:29:03,624 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand] (QuartzScheduler_Worker-95) XML RPC error in command GetCapabilitiesVDS ( HostName = node3.test.local ), the error was: java.util.concurrent.ExecutionException: java.lang.reflect.InvocationTargetException, NoRouteToHostException: No route to host
2013-04-18 18:29:08,662 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand] (QuartzScheduler_Worker-13) XML RPC error in command GetCapabilitiesVDS ( HostName = node3.test.local ), the error was: java.util.concurrent.ExecutionException: java.lang.reflect.InvocationTargetException, NoRouteToHostException: No route to host
2013-04-18 18:29:11,663 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand] (QuartzScheduler_Worker-24) XML RPC error in command GetCapabilitiesVDS ( HostName = node3.test.local ), the error was: java.util.concurrent.ExecutionException: java.lang.reflect.InvocationTargetException, NoRouteToHostException: No route to host
2013-04-18 18:29:13,056 INFO  [org.ovirt.engine.core.bll.VdsLoadBalancer] (QuartzScheduler_Worker-89) VdsLoadBalancer: Starting load balance for cluster: RHCluster, algorithm: EvenlyDistribute.
2013-04-18 18:29:13,056 INFO  [org.ovirt.engine.core.bll.VdsLoadBalancer] (QuartzScheduler_Worker-89) VdsLoadBalancer: high util: 75, low util: 0, duration: 2, threashold: 80
2013-04-18 18:29:13,078 INFO  [org.ovirt.engine.core.bll.VdsLoadBalancingAlgorithm] (QuartzScheduler_Worker-89) VdsLoadBalancer: number of relevant vdss (no migration, no pending): 2.
2013-04-18 18:29:13,078 INFO  [org.ovirt.engine.core.bll.VdsCpuVdsLoadBalancingAlgorithm] (QuartzScheduler_Worker-89) VdsLoadBalancer: number of over utilized vdss found: 0.
2013-04-18 18:29:13,078 INFO  [org.ovirt.engine.core.bll.VdsCpuVdsLoadBalancingAlgorithm] (QuartzScheduler_Worker-89) VdsLoadBalancer: max cpu limit: 60, number of ready to migration vdss: 2
2013-04-18 18:29:15,075 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.FenceVdsVDSCommand] (pool-3-thread-46) FINISH, FenceVdsVDSCommand, return: Test Failed, Host Status is: unknown. The fence-agent script reported the following error: Connection timed out
, log id: 4080f4b6
2013-04-18 18:29:15,076 INFO  [org.ovirt.engine.core.vdsbroker.SetVmStatusVDSCommand] (pool-3-thread-46) START, SetVmStatusVDSCommand( vmId = 406fadfc-3289-4e68-9b30-f502f74a4571, status = Unknown), log id: 2412ae06
2013-04-18 18:29:15,084 INFO  [org.ovirt.engine.core.vdsbroker.SetVmStatusVDSCommand] (pool-3-thread-46) FINISH, SetVmStatusVDSCommand, log id: 2412ae06
2013-04-18 18:29:15,109 ERROR [org.ovirt.engine.core.bll.VdsNotRespondingTreatmentCommand] (pool-3-thread-46) Failed to run Fence script on vds:node3.test.local, VMs moved to UnKnown instead.
2013-04-18 18:29:15,109 WARN  [org.ovirt.engine.core.bll.VdsNotRespondingTreatmentCommand] (pool-3-thread-46) CanDoAction of action VdsNotRespondingTreatment failed. Reasons:VDS_FAILED_FENCE_VIA_PROXY_CONNECTION
2013-04-18 18:29:16,742 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand] (QuartzScheduler_Worker-52) [a6159e5] XML RPC error in command GetCapabilitiesVDS ( HostName = node3.test.local ), the error was: java.util.concurrent.ExecutionException: java.lang.reflect.InvocationTargetException, NoRouteToHostException: No route to host
 

For this one, it seems you may need to open a support ticket. When you added the new blade, you entered the fencing information (my shop uses Dell iDRAC). Did you test an SSH connection from your RHEV Manager host to the blade CMM, or did you test from the hypervisor itself? A simple ping from the Manager host to the CMM might show what is going on. Also, did you validate that the CMM has the correct network information?

I assume your Blade infrastructure has a separate management subnet.  Perhaps the subnetting on the CMM itself is not correct?  We use DHCP for ours (with DNS updates) to mitigate these types of issues.
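
The checks suggested above could be scripted roughly like this. It is a minimal sketch with made-up function names (port_open, check_mgmt); the IP is the management address from the posted log, and port 22 assumes the agent talks to the CMM over SSH. Note that in the posted log the fence command is sent through the proxy host node1.test.local, so it is worth running the checks from that hypervisor as well as from the Manager host.

```shell
# Sketch: basic reachability checks toward the management module (CMM/AMM).
# 192.168.12.9 is the management IP from the log excerpt in this thread.
MGMT_IP="192.168.12.9"

# Returns 0 if a TCP connection to host:port opens within 5 seconds.
port_open() {
    timeout 5 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Ping first, then test the SSH port, printing one line per result.
check_mgmt() {
    local host="${1:-$MGMT_IP}"
    if ping -c 3 -W 2 "$host" >/dev/null 2>&1; then
        echo "ping: ok"
    else
        echo "ping: FAILED"
    fi
    if port_open "$host" 22; then
        echo "ssh port: open"
    else
        echo "ssh port: CLOSED or filtered"
    fi
}
```

Calling check_mgmt with no argument tests the address above; pass another address to test a different management module. If both checks pass from the proxy host and the agent still times out, that points away from basic networking and toward the agent's dialogue with the CMM, which is a good detail to include in a support ticket.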
