Looking for a good HA Clustering tutorial

I am trying to set up Red Hat HA on a pair of web servers. My goal is to have a single LUN presented to the two servers, one acting as the primary and the other as the failover.

 

I've been reading over the RHEL 6.0 Cluster Administration doc. The issue is that I've never set up a cluster before, and I'm having some difficulties.

 

The doc is well written, but it seems to assume that the reader has done clustering before, which I have not. I think I need a good tutorial, ideally a video, that will walk me through setting up a high availability clustered environment. Can someone point me to a good resource?

 

Thanks.

 

Daryl

Responses

I like this link from the latest Red Hat Summit:
http://www.redhat.com/summit/2011/presentations/summit/in_the_weeds/wednesday/cameron_hohberger_w_420_clustering_rhel6.pdf

Thomas Cameron has several good clustering related links on his personal website:
http://people.redhat.com/tcameron/

In general, for production web services, one doesn't bother to set up a fail-over HA cluster. It's more normal to set up a scalable/parallel cluster, then use a front-end device/service to direct requests across available web services. This gives you both load-scaling and protection against failures of a given instance.

 

Even if it's just a learning exercise, you might also want to explore setting up a scalable cluster, as you'll likely see both types "in the wild".

Gentlemen,

 

Thank you for the updates; I really appreciate them.

 

I like the tutorial from Thomas Cameron, but unfortunately he doesn't really explain what he is doing.  It’s easy to follow along, but I'm still confused by a few things, and I am getting the same results that I was getting when I was just reading through the admin docs. 

 

The area that I'm failing on is the resources. For whatever reason, the resources never start, and I'm not able to find any errors.  I know that I must be doing something wrong leading up to the resources, but I'm not exactly sure what because everything else seems to work just fine.

 

Are there more resources (docs) that I can review that explain what I'm actually doing at each step?

 

Thanks.

 

Daryl

Hi Daryl,

Figuring out why a resource is failing can definitely be a difficult exercise.  I have a few recommendations that might make it easier on you:

 

1) Increase the verbosity of rgmanager.  Have a look at #3 here:

 

    What are the debug options that can be enabled for RHEL5 clustering?

    https://access.redhat.com/kb/docs/DOC-53506

 

This will give more verbosity from rgmanager and the resource agents.

 

2) If that doesn't give you any indication as to why it failed, then you can try using rg_test. This tool provides a way to start, stop, or check the status of a resource or service outside the control of rgmanager. Because you are doing this independently of rgmanager's safety/status checks, always make sure you've stopped the service before attempting it. For example:

 

  # clusvcadm -d <service>

 

Now, you can try starting the service with rg_test to see what happens.  This is beneficial because it gives the full debug output, as well as the output of any commands that are being run within the agent.

 

  # rg_test test /etc/cluster/cluster.conf start service <service>

 

Or to start individual resources, you give the resource type instead of "service". For example:

 

  # rg_test test /etc/cluster/cluster.conf start ip 192.168.2.5

  # rg_test test /etc/cluster/cluster.conf start fs data_fs

 

Always make sure to stop the resources or services when you are done, so that rgmanager's actions don't conflict with resources that were started independently. You can do this with rg_test, or just by disabling the service:

 

  # rg_test test /etc/cluster/cluster.conf stop ip 192.168.2.5

  # rg_test test /etc/cluster/cluster.conf stop fs data_fs

  # clusvcadm -d <service>
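
The sequence above (disable, start with rg_test, stop again) can be sketched as a small wrapper. This is only a rough sketch: the names debug_service, run, and DRY_RUN are made up for illustration and are not part of the cluster tools, and it uses nothing beyond the clusvcadm and rg_test invocations already shown. By default it only prints the commands it would run.

```shell
# Sketch: walk one service through rg_test by hand.
# debug_service, run, and DRY_RUN are hypothetical names, not cluster tools.

CONF=${CONF:-/etc/cluster/cluster.conf}
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands; 0 = really run them (as root)

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

debug_service() {
    local service rc
    service=$1
    if [ -z "$service" ]; then
        echo "usage: debug_service <service>" >&2
        return 2
    fi
    # Disable the service first so rgmanager is no longer managing it.
    run clusvcadm -d "$service" || return 1
    # Start it outside rgmanager; rg_test prints the full debug output.
    run rg_test test "$CONF" start service "$service"
    rc=$?
    # Stop it again so nothing is left running behind rgmanager's back.
    run rg_test test "$CONF" stop service "$service"
    return "$rc"
}
```

With the default DRY_RUN=1, calling "debug_service <service>" just lists the three commands in order. Setting DRY_RUN=0 on a cluster node runs them for real and returns the exit status of the rg_test start step.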

 

3) Alternatively, you can post your /etc/cluster/cluster.conf here, as well as the snippet of /var/log/messages showing where you attempted to start the service (ideally after implementing the debug logging in recommendation #1). 

 

4) The resource agents are all written in bash, so if you're handy with scripting you can take a look at the agents themselves in /usr/share/cluster/*.  Sifting through them to find out what is wrong can be time-consuming, but as always the source code is the best documentation.
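
For recommendations 3 and 4, two small helpers along these lines may save some typing. Treat them as a sketch: the function names are made up for illustration, and the first one assumes that rgmanager's log entries contain the string "rgmanager", which may not hold for every logging setup.

```shell
# Sketch: helpers for collecting the log snippet and searching the agents.
# Both function names are hypothetical; the paths are the ones mentioned above.

# Print the last N lines of a log file that mention rgmanager.
# Assumption: rgmanager entries contain the literal string "rgmanager".
rgmanager_log_tail() {
    local logfile count
    logfile=${1:-/var/log/messages}
    count=${2:-50}
    grep 'rgmanager' "$logfile" | tail -n "$count"
}

# List the files in the agent directory that mention a given keyword,
# to narrow down which agent script is worth reading.
agents_mentioning() {
    local keyword dir
    keyword=$1
    dir=${2:-/usr/share/cluster}
    # -l prints matching file names only; -s hides errors for subdirectories.
    grep -ls -- "$keyword" "$dir"/*
}
```

For example, "rgmanager_log_tail /var/log/messages 100" gives a snippet suitable for posting, and "agents_mentioning mount" shows which files under /usr/share/cluster contain the word "mount".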

 

Hope that helps!

 

Regards,

John Ruemker, RHCA

Red Hat Software Maintenance Engineer

Online User Groups Moderator

 

 

P.S. I've made a note that we may need better documentation on the resource agents, as well as troubleshooting a problem with cluster services not starting.  I'll see if we can get something published along these lines. 

Just realized you mentioned RHEL 6, not RHEL 5.  The correct article is:

 

  What are the debug options that can be enabled for Red Hat Enterprise Linux Server 6 (with the High Availability Add on)?

  https://access.redhat.com/kb/docs/DOC-53585

 

Regards,

John Ruemker, RHCA

Red Hat Software Maintenance Engineer

Online User Groups Moderator

Also, you might want to check out the Cluster Admin guide if you haven't already seen it:

 

  http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html-single/Cluster_Administration/index.html

 

Regards,

John Ruemker, RHCA

Red Hat Software Maintenance Engineer

Online User Groups Moderator