Why do we need to assign a host for the storage domain?
When creating a storage domain, a host needs to be assigned to the SD, so I am curious about the relationship between the host and the SD. What is the rule for choosing the host? Does every host need to be bound to an SD?
If the SD is bound to host1, can a VM created on host2 also use this SD?
Responses
RHEV has two interconnected hierarchies: storage and hosts.
You start at the datacenter level: a datacenter contains storage domains for storage, and clusters for hosts. Every cluster contains hosts, and RHEV takes care of connecting the right clusters to the right storage domains, all within the same datacenter.
The reason you cannot initially create a storage domain without a host is simple: RHEV-M doesn't touch the storage itself, it orders hosts to perform storage operations. So without a host, RHEV-M has nobody to order to create a new storage domain. Once you have a storage domain in a datacenter, every host in the clusters of that datacenter will be connected to it.
Once you create an SD, all the hosts in the clusters are connected to it. If you need to remove one of these hosts, simply put it in maintenance mode and then click "Remove". If there are other hosts in the DC, they will keep working with the SD.
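For reference, the same maintenance-then-remove flow can be driven through the REST API. A rough sketch (endpoint shapes as in the 3.0 REST API Guide; `{host:id}` is a placeholder for the host's UUID):

```
# Put the host into maintenance mode:
POST /api/hosts/{host:id}/deactivate
Content-Type: application/xml

<action/>

# Once the host's status shows "maintenance", remove it:
DELETE /api/hosts/{host:id}
```

As in the UI, the delete will be refused while the host is still active, so the deactivate step is not optional.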
We used to just pick a host at random, but we added the option to choose one manually (a host still gets picked automatically, but you can change the selection) because SD creation has some overhead (the host formats the LUN, generates metadata, etc.), and sometimes you want to pick a host that is not busy with heavy I/O duties at that moment.
Normally, any operational host within the right DC is fine for SD creation, so I wouldn't worry too much about it.
According to http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Virtualization/3.0/html-single/REST_API_Guide/index.html#chap-REST_API_Guide-Storage_Domains it looks like you do need to specify a host. If you don't want to choose, just pick the first host in the relevant DC.
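As a sketch of what that request looks like, the 3.0 REST API takes a POST to /api/storagedomains with the chosen host named in the body (the host name and NFS export below are made up for illustration):

```xml
<!-- POST /api/storagedomains — host1 and the NFS export are hypothetical -->
<storage_domain>
  <name>data1</name>
  <type>data</type>
  <host>
    <name>host1</name>
  </host>
  <storage>
    <type>nfs</type>
    <address>nfs.example.com</address>
    <path>/exports/data1</path>
  </storage>
</storage_domain>
```

The host element is what tells RHEV-M which host to order to format the storage and write the initial metadata.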
Then for some reason I shut down host1 and created a host2, but the SD 'image' could not come up automatically.
Can host2 access the storage for sd without any problems?
> My question is whether the SD can be kept up via host2 when host1 is down?
When host1, which is the SPM, goes down while host2 is up, RHEV-M will fence host1 and fail the SPM role over to host2 to keep the Data Center and storage up. Do you have fencing configured for both hosts?
When the SD is down, you can bring it up if there is a host in the cluster that has access to the SD, and then remove the VMs one by one.
If you have wiped out the LUN or NFS share used for the SD without first removing the SD from RHEV-M, you can "Destroy" the SD so that all the VMs in it are removed from the database. (This action cannot be undone.)
The right solution for you is to have proper fencing set up and configured for all hypervisors in the data center.
To explain in more detail:
SPM stands for "Storage Pool Manager". Changes to the storage (like creating a new LV when adding a disk to a VM, removing an LV when deleting a disk, extending LVs, etc.) are performed by only one host in a Data Center. This prevents corruption of the LVM metadata. RHEV-M selects a host randomly and designates it as the SPM for this task.
When the host that holds the SPM role goes down, RHEV-M fences it using the configured power management and moves the role to another hypervisor. RHEV-M must fence the host before moving the role, to make sure there are never two hosts in the DC with the SPM role. If you don't have fencing configured, RHEV-M will not move the role to any other host, in order to protect the LVM metadata on the storage from corruption and unrecoverable outages.
"Confirm host has been rebooted" is a workaround for people who have no fencing configured. They reboot the Down host that held the SPM role (this makes sure the host no longer has any SPM processes running), then tell RHEV-M that the host has been rebooted and that it is safe to move the SPM role to another host by clicking "Confirm Host has been rebooted".
So this is a recovery process that requires human intervention and must be done manually.
The SPM host is not per SD, it is per Data Center: there is only one SPM host for all the SDs attached to a single data center. To find the SPM host for a data center, go to the Hosts tab and look at the SPM role (last column) for each host.
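If you prefer the API over the Hosts tab, each host's representation in the 3.x REST API carries the SPM role as well. An abbreviated, illustrative response (host names are made up):

```xml
<!-- GET /api/hosts — abbreviated; the SPM host reports storage_manager true -->
<hosts>
  <host>
    <name>host1</name>
    <storage_manager>true</storage_manager>
  </host>
  <host>
    <name>host2</name>
    <storage_manager>false</storage_manager>
  </host>
</hosts>
```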
Looks like the fence action has to be initiated from another host, which means all the hosts need access to the RAC of the other hosts, since RHEV-M randomly chooses one host to check the power status of another host, right?
Correct.
If the host that has the SPM role has gone down, you cannot remove it without first moving the SPM role to a different host via "Confirm host has been rebooted".
The reason "Confirm host has been rebooted" is not in the API is that it involves a lot of manual work: physically inspecting what happened to the SPM host, rebooting it or shutting it down, and only then clicking "Confirm host has been rebooted" to fail the role over.
Manual inspection and action are necessary to avoid having two hosts with the SPM role, which would corrupt the storage. E.g., if RHEV-M shows the SPM host as down (Non-Responsive), it may just be a missing network link between RHEV-M and the SPM host, with the SPM processes still running on that host. If you click "Confirm host has been rebooted" without actually shutting down or rebooting the SPM host, the role would fail over to another host, and you would have two hosts with SPM processes running and simultaneously making changes to the storage, which may corrupt it.
As always, you can file an RFE to have this feature added to the API by opening a case with Red Hat, if you prefer to do the task via the API after shutting down or rebooting the SPM host manually.