Multipath issues with iSCSI storage

Setup:
- 1 RHEV-M (i7, 8 GB RAM, 2 x GBit NIC)
- 2 Hypervisors (Conroe, 8 core, 24 GB RAM, 6 x Gbit NIC)
- Infortrend iSCSI storage 4 TB (4 disks),  4 x Gbit iSCSI channel
- Cisco Gbit switch (does not do trunking according to 802.3ad; at least our network guys did not succeed in configuring it)
- Since the switch doesn't support trunking, we run four subnets, one for each iSCSI channel.
- Each channel has a 1 TB LUN configured.

Problem:
When we configure a storage domain and add all 4 LUNs to the same (master) storage domain, the second hypervisor (not the SPM) loses its connection to the storage domain after 5 minutes. (Could not connect to storage domain attached to it)
I see strange multipath errors on the hypervisor console.
This behavior is reproducible (did a fresh setup for reproduction).
As far as I know it is not related to network configuration issues; we defined 4 logical storage networks in the cluster.
I can manually connect to the LUNs from each hypervisor. (iscsiadm)

- Solution:
Configuring the 4 LUNs in 4 separate domains works fine, and the second hypervisor doesn't lose its connection.

- Question:
It would be nice to see our 4 TB of storage in one domain instead of 4 domains holding 1 TB each.
Did I miss something when configuring the storage domains, or is this a known issue?
 

Responses

Sounds like you didn't configure the SAN properly. What you need to do is create a set of LUNs (one or more - up to you how many) and assign them to be accessible through all four iSCSI portals (the ethernet ports the storage box has).

 

Once you have that, simply discover and log in to every IP address when creating the storage domain. This should add up to 4 paths to the same LUN or LUNs.

 

 

Hope that makes sense; if not, I can try to elaborate.
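
For readers who want to try the same discover-and-login sequence by hand on a hypervisor, here is a minimal sketch that only builds and prints the corresponding iscsiadm command lines, one discovery and one login per portal. The portal addresses and the target IQN are made-up placeholders, not values from this thread, and the sketch assumes a single target name shared by all portals; in practice the target name comes from the discovery output and may differ per channel.

```python
# Hypothetical portal addresses, one per iSCSI channel/subnet (not from the thread).
PORTALS = ["10.0.1.10", "10.0.2.10", "10.0.3.10", "10.0.4.10"]

# Hypothetical target IQN; the real one is reported by the discovery step.
TARGET_IQN = "iqn.2002-10.com.infortrend:example-target"


def discovery_cmd(portal, port=3260):
    """Build the sendtargets discovery command for one portal."""
    return ["iscsiadm", "-m", "discovery", "-t", "sendtargets",
            "-p", f"{portal}:{port}"]


def login_cmd(target, portal, port=3260):
    """Build the login command for one target through one portal."""
    return ["iscsiadm", "-m", "node", "-T", target,
            "-p", f"{portal}:{port}", "--login"]


def plan(portals, target):
    """Return the full command list: discover, then log in, for every portal."""
    cmds = []
    for portal in portals:
        cmds.append(discovery_cmd(portal))
        cmds.append(login_cmd(target, portal))
    return cmds


if __name__ == "__main__":
    # Print the commands instead of running them, so nothing touches the SAN.
    for cmd in plan(PORTALS, TARGET_IQN):
        print(" ".join(cmd))
```

Each successful login through a different portal should give the hypervisor one more path to the same LUNs, which is what the multipath layer needs.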

Thx for the fast reply.

 

If I got you right, all 4 LUNs have to be accessible through all four channels.

This would mean that logging in to the first portal shows the 4 LUNs, and logging in to the second (third, fourth) portal would show the same LUNs again, right?

 

If so, when I select the LUNs for the master storage domain, I see the 4 LUNs four times. Which checkboxes do I check? All 16, or just each LUN once?

 

I will try that as soon as I have access to the storage.

 

Kind Regards Daniel

 

If I got you right, all 4 LUNs have to be accessible through all four channels.

 
Correct
 

This would mean that logging in to the first portal shows the 4 LUNs, and logging in to the second (third, fourth) portal would show the same LUNs again, right?

 
Exactly
 

If so, when I select the LUNs for the master storage domain, I see the 4 LUNs four times. Which checkboxes do I check? All 16, or just each LUN once?

 
When you open the "NewDomain" dialog and choose the Data/iSCSI storage domain type, you should see "Discover Targets", which is where you enter the portal IPs. This should produce all the possible IPs, as returned by the sendtargets command, with a "Login" button next to each. Once you log in to every portal, you'll see the available LUNs. Simply check the LUNs under every path (IP) that you want to use in this storage domain.
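
To make the checkbox count concrete, here is a small sketch that models each checkbox as a (portal, LUN) pair and groups the pairs by LUN. The portal addresses and LUN names are hypothetical placeholders. It compares the layout recommended above (every LUN reachable through every portal) with the original setup (one LUN per channel), and it assumes the host recognizes entries for the same LUN as one device with several paths.

```python
from collections import defaultdict

# Hypothetical portal addresses and LUN identifiers (not from the thread).
PORTALS = ["10.0.1.10", "10.0.2.10", "10.0.3.10", "10.0.4.10"]
LUN_IDS = ["lun-0", "lun-1", "lun-2", "lun-3"]


def paths_by_lun(entries):
    """Group (portal, lun) pairs by LUN, collecting the portals that reach it."""
    grouped = defaultdict(set)
    for portal, lun in entries:
        grouped[lun].add(portal)
    return dict(grouped)


def summarize(entries):
    """Return {lun: number_of_paths}, sorted by LUN name."""
    grouped = paths_by_lun(entries)
    return {lun: len(portals) for lun, portals in sorted(grouped.items())}


# Recommended layout: every LUN mapped to every portal -> 16 checkboxes.
all_paths = [(portal, lun) for portal in PORTALS for lun in LUN_IDS]

# Original layout: one LUN per channel -> 4 checkboxes, one path each.
single_paths = list(zip(PORTALS, LUN_IDS))

if __name__ == "__main__":
    print(len(all_paths), summarize(all_paths))
    print(len(single_paths), summarize(single_paths))
```

Under these assumptions, checking all 16 entries yields 4 LUNs with 4 paths each, while the one-LUN-per-channel layout yields 4 LUNs with a single path each, so there is nothing for multipath to fail over to.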

Hi Dan,

 

Thx a lot for the fast and competent answer.

Indeed for doing multipath at least we need multiple paths ;)

 

Did as you said, adding all 4x4 LUNs when creating master domain.

Seems to be running now.

Here are some screenshots:

One question about the last screenshot: I actually see only three items, although I expected four. (The master domain does say 4 TB of space, though.)

 

 

Before Login:

After Login:

Editing Domain:

Somehow the screenshots did not make it to the discussion.

Sorry for that.

How can I add images to this site ?

I have the PNGs but did not manage to add them here.

It can be problematic; we are working to make it better. If they don't contain any important data, I would suggest using a free image hosting service and providing a link here.

If not, we can try to think of another solution.

Here are the screenshots (hopefully):

 

While creating I saw all 4 LUNs on all 4 channels.

I checked all 16 when creating the domain; see the first screenshot:

 

http://www.imagebanana.com/view/doftmo10/Creating.png

 

That went well: the iSCSI share is up and running with 3.7 TB.

 

However, when editing the iSCSI share, I see only 3 LUNs per channel (see the second screenshot), but the storage domain holds 3.7 TB, which is rather confusing.

 

http://www.imagebanana.com/view/opj7sv21/Running.png

 

So I do not know whether we really have a properly configured storage domain.

 

Cheers Daniel

Interesting! I have asked a colleague who has a large setup handy to try and reproduce this issue, give me a few days to see what is going on

Thank you for investigating..

If you need any further information (logs, traces, pcaps) I will provide them to you.

Some more information about this issue (storage is really not working properly)

 

Since we had to physically move hypervisor0, I set it to maintenance mode, shut it down, moved it, and booted it.

 

Console: multipath errors, two of them. (That rings a bell: are these the 2 LUNs missing in the screenshot above?)

The manager doesn't reactivate hypervisor0; it moved to the non-responsive state.

 

Same behavior as in the first post.

I had better not put the second hypervisor into maintenance mode, or our cluster is gone.

 

Bad day.

 

This really sounds like there is a SAN issue, but that would require some extra investigation. Are you able to open a support case for this? I'm asking because such issues usually require lots of logs to be sent, which is beyond the scope of the UG issues here, and you probably wouldn't want to expose private information in a public forum.

 

My colleague is a bit busy, but he promised to try to reproduce your issue this week if he has the time. If he cannot, the support guys will definitely be able to do so in their own labs.

I am not so convinced that it is a SAN issue.

I ran a SPICE-connected cloud desktop trial based on Fedora, with one LUN per desktop VM. It worked like a charm.

Maybe I will try an oVirt-based install to investigate further.

I will keep you informed; we will open a support case if the problem persists.

Thanks for your fast answers anyway.

I have run into a similar problem at a customer install of RHEVM with an IBM DS3512 iSCSI/SAS subsystem. I could never figure out how to configure both target portal addresses in the storage domain. The way the DS3500 works is that there are two controllers (A/B), and they are active/active with four 1 Gb iSCSI host ports per controller. Each port pair across the controllers is on a different subnet (A1/B1, A2/B2, A3/B3, A4/B4). Because the customer only has 2 GigE ports available to use, I was using the A1/B1 pair.

 

When I created the storage domain, I had the LUNs all owned by controller A and discovered that address.  RHEVM never let me put in the second address.  I guess I know now from above I should have been able to go in and edit the Storage Domain to discover the second address.  But I swear I tried that and it didn't work.

 

I will have to get back down to that customer and try that and see if I can get it to work.

 

James

When you create the storage domain (SD), the flow is:

1. enter target IP

2. discover

3. login

4. pick LUN

 

If at this point, instead of clicking "OK", you repeat these steps with the next IP, you will add a path. Wash, rinse, repeat.

Dan, thanks for the comment.  Am I able to go into the storage domain and edit and add a new Target Portal address?  I think I tried that and it would not let me select or login to any other LUNs.

 

I'll try it again when I get a chance and take some screen shots.

 

Regards,

James

I think at the moment it's only doable when adding a new domain. Please file a feature request, through a support case, to be able to edit the iSCSI connections of an existing domain.