All hosts are stuck in "non responsive" status. How can I fix it?
We are using RHEV with 5 hosts (4 hosts for RHEV, 1 host for RHEV Manager). Our RHEV architecture looks like this:
- Cluster1
--- host1
--- host2 (SPM, connected to Data Center 1)
- Cluster2
--- host3
--- host4 (SPM, connected to Data Center 2)
- host5 : RHEV-Manager
Every host except host5 is running RHEL 5.4 and uses NetApp storage (host5 runs on Windows 2003 Server).
Hypervisor 5.5 is installed and HP iLO is used as the fencing method.
Now, all hosts (host1 ~ host4) are shown in RHEV-M with "Non Responsive" status, and I am unable to activate them.
When I check the Data Centers tab, cluster1 and cluster2 are also in 'Not Operational' status.
(Also, all storage domains show "Inactive" status in the Storage tab.)
I have tried:
- changing host1 and host2 (cluster1) to maintenance mode and rebooting them (physical reboot);
- reinstalling host1 (host2 cannot be reinstalled because it has 15 running VMs; actually, I don't know why they are still shown as running);
- a multipath check (it looks OK on every host):
$> multipath -ll
360a98000572d434f5234574772344d37 dm-1 NETAPP,LUN
[size=500G][features=0][hwhandler=0][rw]
_ round-robin 0 [prio=8][active]
_ 2:0:0:1 sdb 8:16 [active][ready]
_ 3:0:0:1 sdf 8:80 [active][ready]
_ round-robin 0 [prio=2][enabled]
_ 2:0:1:1 sdd 8:48 [active][ready]
_ 3:0:1:1 sdh 8:112 [active][ready]
360a98000572d43344f6f574771724b4b dm-0 NETAPP,LUN
[size=500G][features=0][hwhandler=0][rw]
_ round-robin 0 [prio=8][active]
_ 2:0:1:0 sdc 8:32 [active][ready]
_ 3:0:1:0 sdg 8:96 [active][ready]
_ round-robin 0 [prio=2][enabled]
_ 2:0:0:0 sda 8:0 [active][ready]
_ 3:0:0:0 sde 8:64 [active][ready]
What should I check to fix this? Or which logs do I need to examine to find the exact cause?
Thank you in advance.
Responses
This usually requires manually modifying the RHEV Manager SQL database to set all the VMs and hosts down, then trying to bring everything up from scratch. We don't document the procedure because there is a lot of scope for error and corrupting the database entirely. The relevant logs are collected in the Log Collector program.
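Before going down that path, it is usually worth checking the VDSM agent on each non-responsive host, since that is what RHEV-M talks to. The commands below are only a rough sketch: the vdsmd service name, the /var/log/vdsm/vdsm.log path and port 54321 are what later RHEV/oVirt releases use, so confirm they apply to your 2.x hosts, and replace <rhev_manager_address> with your own manager address.
# Is the VDSM agent running on the host?
$> service vdsmd status
# Recent VDSM activity -- storage and connectivity errors usually show up here.
$> tail -n 100 /var/log/vdsm/vdsm.log
# Basic connectivity back to the manager, and is the VDSM port listening?
$> ping -c 3 <rhev_manager_address>
$> netstat -tlnp | grep 54321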
To move forward, I suggest you provision a new RHEV setup using the latest RHEV 3. On your existing setup, use the RHEV Manager to identify which guest disks correspond to which LVM LVs on the hypervisors. Do a block-level copy of the disks using dd and store them on temporary storage.
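As a rough illustration of that copy-out step on an existing hypervisor (all names in angle brackets are placeholders you must map yourself): on block storage domains VDSM normally names the volume group after the storage domain UUID and each guest disk LV after its image UUID, but verify the mapping against what RHEV Manager shows before copying anything.
# List volume groups -- on FC/iSCSI storage domains the VG name is usually the storage domain UUID.
$> vgs
# List the LVs in that VG; each guest disk is an LV named after its image UUID
# (plus VDSM's internal LVs such as metadata, ids, leases, inbox, outbox).
$> lvs <storage_domain_uuid>
# Activate the LV for the disk you want to copy, then do the block-level copy.
$> lvchange -ay /dev/<storage_domain_uuid>/<image_uuid>
$> dd if=/dev/<storage_domain_uuid>/<image_uuid> of=/mnt/tempstorage/<vm_name>_disk1.img bs=1M
# Deactivate the LV again when finished.
$> lvchange -an /dev/<storage_domain_uuid>/<image_uuid>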
On the new RHEV 3 setup, create new guests with new blank disks, then dd the disk contents back from temporary storage to the empty guest.
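The reverse direction, again only a sketch with placeholder names, would run on the SPM host of the new RHEV 3 setup once the blank disk (at least as large as the original) has been created; the new disk's image ID can usually be read from the guest's Disks sub-tab in the RHEV 3 portal.
# Find and activate the LV behind the newly created blank disk.
$> lvs <new_storage_domain_uuid>
$> lvchange -ay /dev/<new_storage_domain_uuid>/<new_image_uuid>
# Write the saved image back into the blank disk, block for block.
$> dd if=/mnt/tempstorage/<vm_name>_disk1.img of=/dev/<new_storage_domain_uuid>/<new_image_uuid> bs=1M
$> lvchange -an /dev/<new_storage_domain_uuid>/<new_image_uuid>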
You may wish to keep up with product lifecycle announcements. RHEV 2 has been End of Life and out of support for over two years. We proactively communicate lifecycle changes to all accounts with the relevant entitlements.
