Why do we need to assign a host for the storage domain?

When creating a storage domain, a host needs to be assigned to the SD. I am curious about the relationship between the host and the SD. What's the rule for choosing the host? Does every host need to be bound to an SD?
If the SD is bound to host1, can a VM created on host2 also use this SD?

Responses

RHEV has two interconnected hierarchies: storage and hosts.

You start at the datacenter level: a datacenter contains storage domains on the storage side and clusters on the host side. Every cluster contains hosts, and RHEV takes care of connecting the right clusters to the right storage domains, all within the same datacenter.


The reason you cannot initially create a storage domain without a host is simple: RHEV-M doesn't touch the storage itself; it orders hosts to perform storage operations. So without a host, RHEV-M has nobody to order to create the new storage domain. Once you have a storage domain in a datacenter, every host in the clusters of this datacenter will be connected to it.
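
For REST API users, here is a minimal sketch in Python of what such a creation request looks like, assuming an NFS data domain. The URL, credentials, and names are placeholders, and the XML layout follows the RHEV 3.0 REST API guide, so verify it against your version:

    import requests

    RHEVM = "https://rhevm.example.com/api"    # placeholder RHEV-M URL
    AUTH = ("admin@internal", "password")      # placeholder credentials

    # The <host> element names the host that RHEV-M will order to format
    # and initialize the new domain - this is why a host is required.
    sd_xml = """
    <storage_domain>
      <name>data1</name>
      <type>data</type>
      <host><name>host1</name></host>
      <storage>
        <type>nfs</type>
        <address>nfs.example.com</address>
        <path>/exports/data</path>
      </storage>
    </storage_domain>
    """

    resp = requests.post(RHEVM + "/storagedomains", data=sd_xml,
                         headers={"Content-Type": "application/xml"},
                         auth=AUTH, verify=False)
    print(resp.status_code, resp.reason)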


Thanks, this is very helpful.


After creating the storage domain, can the host bound to the storage domain be removed or powered off? If not, and I have to remove this host, what should I do? Do I need a new host to replace the old one for the storage domain?


I can have multiple storage domains; is it possible to bind multiple storage domains to one host?

Once you create an SD, all the hosts in the datacenter's clusters get connected to it. If you need to remove one of these hosts, simply put it into maintenance mode and then click "Remove". If there are other hosts in the DC, they will keep working with the SD.
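
If you prefer the REST API, the same two steps look roughly like this. It's a sketch with placeholder URL, credentials, and host id, using the host deactivate action and DELETE request described in the REST API guide:

    import requests

    RHEVM = "https://rhevm.example.com/api"    # placeholder RHEV-M URL
    AUTH = ("admin@internal", "password")      # placeholder credentials
    host_id = "HOST-UUID"                      # placeholder host id

    # Step 1: put the host into maintenance mode.
    requests.post(RHEVM + "/hosts/" + host_id + "/deactivate",
                  data="<action/>",
                  headers={"Content-Type": "application/xml"},
                  auth=AUTH, verify=False)

    # Step 2: once the host reports maintenance status, remove it.
    requests.delete(RHEVM + "/hosts/" + host_id, auth=AUTH, verify=False)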

That's great. So when creating an SD, I can just pick any host from the cluster, right? Then why doesn't RHEV-M choose a host instead of requiring the admin to choose one explicitly? (From the REST API, I have to know which host is in the cluster when creating an SD.)

We used to just pick a host at random, but we added the option to do so manually (a host gets picked automatically, but you can change the selection) because SD creation has some overhead (the host formats the LUN, generates metadata, etc.), and sometimes you want to manually pick a host that is not busy with heavy IO duties at that moment.


Normally, any operational host within the right DC is fine for SD creation, so I wouldn't worry too much about it.

OK, that's reasonable. Then is selecting the host manually for SD creation optional or mandatory? I mean, does it still work to have RHEV-M select a host automatically?

Yes, the host selection menu starts with a host already selected; you do not have to change it.

In fact, I am using the REST API; do I have to specify a host for SD creation?

According to http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Virtualization/3.0/html-single/REST_API_Guide/index.html#chap-REST_API_Guide-Storage_Domains it looks like you do need to specify a host. If you don't want to choose one yourself, just pick the first host in the relevant DC.
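
For example, "pick the first host" can be done programmatically. This is a sketch with placeholder URL and credentials; the element names follow the API schema in the guide above, so treat them as assumptions to verify (filtering by cluster/DC is left out for brevity):

    import requests
    import xml.etree.ElementTree as ET

    RHEVM = "https://rhevm.example.com/api"    # placeholder RHEV-M URL
    AUTH = ("admin@internal", "password")      # placeholder credentials

    # List all hosts and take the first one whose status is "up".
    hosts = ET.fromstring(requests.get(RHEVM + "/hosts",
                                       auth=AUTH, verify=False).content)
    first_up = next((h for h in hosts.findall("host")
                     if h.findtext("status/state") == "up"), None)
    if first_up is not None:
        print(first_up.get("id"), first_up.findtext("name"))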

Dan, thank you very much. It's very clear now.

I created one SD (named 'image') against host1, and it worked well. Then for some reason I shut down host1 and added a host2, but the SD 'image' could not come up automatically.


My question is whether the SD can keep running against host2 when host1 is down. (I did not set host1 to maintenance and did not remove it.) And what can I do to bring up (or recover) the SD?


It looks like when the SD is down, the VMs on it cannot be removed, even by force. Then what should be done with a VM when the SD cannot be recovered?

> Then for some reason I shut down host1 and added a host2, but the SD 'image' could not come up automatically.


Can host2 access the storage for the SD without any problems?


> My question is whether the SD can keep running against host2 when host1 is down.


When host1, which is the SPM, goes down while host2 is up, RHEV-M will fence host1 and fail the SPM role over to host2 to keep the Data Center and storage up. Do you have fencing configured for both hosts?
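
You can also check this from the REST API. Here is a sketch with placeholder URL and credentials; the power_management element is what I'd expect from the RHEV 3.x host schema, so verify against your version's guide:

    import requests
    import xml.etree.ElementTree as ET

    RHEVM = "https://rhevm.example.com/api"    # placeholder RHEV-M URL
    AUTH = ("admin@internal", "password")      # placeholder credentials

    # Report whether power management (fencing) is enabled on each host.
    hosts = ET.fromstring(requests.get(RHEVM + "/hosts",
                                       auth=AUTH, verify=False).content)
    for h in hosts.findall("host"):
        print(h.findtext("name"), "- fencing enabled:",
              h.findtext("power_management/enabled"))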


When the SD is down, you can bring it up if there is a host in the cluster which has access to the SD, and then remove the VMs one by one.


If you have wiped out the LUN or NFS share used for the SD without first removing the SD from RHEV-M, you can "Destroy" the SD so that all VMs in it are removed from the database. (This action cannot be undone.)


Sadique, thanks for your reply.


Yes, host2 can access the SD. The SD's path is NFS; I can manually mount it from host2.


I did not set up fencing (the power management?) for either host1 or host2. And currently I see host1 is still in the SPM state. If there is no fencing, is it possible to switch the host automatically?


In my case, how can I bring up the SD? I don't want to destroy this SD since there is data inside.

If you have shut down host1 and there is no fencing, you can right-click on host1 and click "Confirm Host has been rebooted". This will fail the SPM role over to host2, and the storage domain will get activated.

Yes, you are right. When I clicked 'Confirm Host has been rebooted' for host1, host2 took over the SPM role and the SD came up.

Then several questions:
1. How do I do 'Confirm Host has been rebooted' through the REST API?
2. What is SPM short for?
3. How can I know the status of an SD? When my SD is in the down state, I cannot get this information from the REST API.
4. How can I know the SPM host for an SD from the REST API? If the SPM host has a problem, I need to do 'Confirm Host has been rebooted' against the SPM host to release its control of the SD.

The right solution for you is to have proper fencing set up and configured for all hypervisors in the data center.


To explain in more detail:


SPM stands for "Storage Pool Manager". Changes to the storage (like creating a new LV when adding a disk to a VM, removing an LV when deleting a disk, extending LVs, etc.) are done by only one host in a Data Center. This is to prevent corruption of the LVM metadata. RHEV-M selects a host randomly and designates it as the SPM to do this task.


When the host which has the SPM role goes down, RHEV-M fences this host using the configured power management and moves the role to another hypervisor. RHEV-M must fence the host before moving the role, to make sure there won't be two hosts in the DC with the SPM role. If you don't have fencing configured, RHEV-M will not move the role to any other host, in order to protect the storage LVM metadata from getting corrupted and causing unrecoverable outages.


"Confim host has been rebooted" is a work around to save people who have no fencing configured. They will "Reboot" the Down host which had the spm role (this makes sure that the host doesn't have the processes to become the SPM no more running), then tell the rhev-m that the host has been rebooted and it's safe to move the spm role to another host by clicking "Confirm Host has been rebooted".


So this is a process that requires human intervention and must be done manually to recover the environment.


The SPM host is not per SD; it is per Data Center. There is only one SPM host for all SDs attached to a single data center. To find the SPM host for a data center, go to the Hosts tab and look at the SPM role (last column) against each host.
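
For your question 4, one way to find it from the REST API is a sketch like this; the storage_manager element is what I'd expect from the RHEV 3.x host schema, so verify it against your version's guide:

    import requests
    import xml.etree.ElementTree as ET

    RHEVM = "https://rhevm.example.com/api"    # placeholder RHEV-M URL
    AUTH = ("admin@internal", "password")      # placeholder credentials

    # Scan the hosts and report the one currently holding the SPM role.
    hosts = ET.fromstring(requests.get(RHEVM + "/hosts",
                                       auth=AUTH, verify=False).content)
    for h in hosts.findall("host"):
        if h.findtext("storage_manager") == "true":
            print("SPM host:", h.findtext("name"))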

Sadique, thanks very much for your explanation.

It looks like the fence action has to be initiated from another host. That means every host should be able to access the RAC of the other hosts, since RHEV-M randomly chooses one host to check the power status of another host, right?

Since setting up fencing is a little complex, back to my previous issue where host1 is the SPM but failed: will removing the SPM host1 trigger the SPM role to switch to host2? I have to figure out an approach to recover the SD from the REST API when host1 fails, for the scenario where fencing does not work (since 'Confirm Host has been rebooted' is not supported in the REST API).

> It looks like the fence action has to be initiated from another host. That means every host should be able to access the RAC of the other hosts, since RHEV-M randomly chooses one host to check the power status of another host, right?


Correct.


If the host which has the SPM role has gone down, you cannot remove it without moving the SPM role to a different host via "Confirm host has been rebooted".

The reason "Confirm host has been rebooted" is not in the API is that it involves a lot of manual work: physically inspecting what happened to the SPM host, rebooting it or shutting it down, and only then clicking "Confirm host has been rebooted" to fail the service over.


Manual inspection and action are necessary to avoid having two hosts with the SPM role, which would corrupt the storage. For example, if RHEV-M shows the SPM host as down (Non-Responsive), it may be due to a broken network link between RHEV-M and the SPM host, with the SPM processes still running on that host. If you just do "Confirm host has been rebooted" without shutting down or rebooting the SPM host, the service would fail over to another host, and we would then have two hosts with SPM processes running and simultaneously making changes to the storage, which may corrupt it.


As always, you can file an RFE to have this feature added to the API by opening a case with Red Hat, if you prefer to do the task via the API after shutting down or rebooting the SPM host manually.

OK, I see. I'll consider it. Thanks.