Newly upgraded RHEV-H 3.5 hosts cannot access storage domain

They say what doesn't kill you makes you stronger. What is it with me and upgrades?

This time, it's my own environment. I just upgraded my RHEV environment from 3.4 to 3.5. The RHEV-M upgrade went smoothly. I have two hosts, rheva and rhevb. Both were running rhevh-6.5-20141017.0 before the upgrade. After the 3.5 upgrade, I upgraded the hosts to rhevh-6.6-20150128.0. For each host, I migrated VMs away, put it into maintenance mode, and upgraded it. The host rebooted and... nothing. It didn't come back online and eventually generated this error:

Host rheva.infrasupport.local cannot access the Storage Domain(s) attached to the Data Center InfrasupportDataCenter. Setting Host state to Non-Operational.

I have two networks in my RHEV environment: the default rhevm network and another one named storage. The problem is that the eth2 NIC in each host, which is supposed to connect to the storage network, doesn't come back online. If I run ifup eth2 from an ssh session to the host, it finds its storage and RHEV-M eventually tells me the host is up. But I have to run ifup eth2 by hand; I can't find any automation that brings eth2 up. On the host, I noticed ifcfg-eth2 says "ONBOOT=no". I edited that to "ONBOOT=yes" and persisted it. It didn't matter - after a reboot it was back to "ONBOOT=no". Go figure. Both hosts - rheva and rhevb - behaved this way.
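
For reference, here's roughly what I did on the host to try to make that stick - commands reconstructed from memory, and I'm assuming I'm remembering the RHEV-H persist command correctly:

    # Flip ONBOOT in the interface config
    sed -i 's/^ONBOOT=no/ONBOOT=yes/' /etc/sysconfig/network-scripts/ifcfg-eth2

    # RHEV-H keeps most of the filesystem non-persistent, so mark the file persistent
    persist /etc/sysconfig/network-scripts/ifcfg-eth2

    # Bring the NIC up by hand - this part works every time
    ifup eth2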

In the RHEV-M GUI, with rhevb in maintenance mode, I selected the host and did "Setup Host Networks". I assigned the storage network to NIC eth2 with an IP address of 10.10.11.22 and netmask 255.255.255.0, then saved the changes. Sure enough, that's what I see in ifcfg-eth2 on the host, and I also see that IP address in the RHEV-M GUI. But after rebooting host rhevb, I still have to ssh in and manually run ifup eth2 before it finds its storage.
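
For what it's worth, here's roughly what ifcfg-eth2 looks like on rhevb after saving that change - reconstructed from memory, so the exact set of fields VDSM writes may differ, but the ONBOOT line is the one that keeps reverting to "no" after a reboot:

    DEVICE=eth2
    ONBOOT=yes
    BOOTPROTO=none
    IPADDR=10.10.11.22
    NETMASK=255.255.255.0
    NM_CONTROLLED=no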

After fighting with host rhevb, I did host rheva. On that one I skipped the Setup Host Networks step and just ran ifup eth2. The RHEV-M GUI shows no IP address at all for eth2 on that host.
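
So for now the workaround after every reboot is the same on both hosts. The NFS server address below is just a placeholder, not my real one:

    ifup eth2
    ip addr show eth2              # confirm the address came up (10.10.11.22/24 on rhevb)
    showmount -e <nfs-server-ip>   # <nfs-server-ip> = the NFS server on the storage subnet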

What's up with that? Why won't eth2 come up on its own on either RHEV-H host?

A little background: I had a disaster here in late December. I'm using an aging ProLiant ML730 as an NFS server for my storage domain. It suffered a massive failure and I lost everything - a genuine disaster. I took care of the hardware problem, built a brand new RHEV 3.4 environment, and recovered everything from backups. The hosts were still sitting there, so I connected them to my new RHEV environment from SSH sessions; I did not rebuild the hosts from scratch from CD. While I was at it, I set up the storage network in its own subnet, in case I come into some money and decide to physically isolate it one of these days. So something may have gotten messed up along the way, but I sure don't know what.

As a Red Hat partner, I'm not able to log a support case for my own stuff, so any advice here would be appreciated.

thanks

Greg Scott
