RGManager will not start Samba server
Hi,
Basically I am trying to set up an HA file share using DRBD in the background for replication, and then using the cluster to move the IP, the mount (for the DRBD device) and the Samba share on that mount. The IP and mount fail over fine without any issues, but when the service moves over to the other node it says started and is considered fully running, yet there is no smb process running and no mention of rgmanager even attempting to start it or having any problems with it.
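For reference, this is how I have been checking for the Samba daemons on the node the service has moved to (just basic process checks; let me know if there is a better way to verify this):
# ps -ef | grep [s]mbd
# ps -ef | grep [n]mbd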
I am not sure if I am asking in the right place, but after a few days of Google searches I cannot find a solution anywhere. I am also quite new to Linux and am looking into replacing some of our Windows servers with Red Hat.
I am running two nodes (RHEL trials) on an ESXi host as a test environment.
Am I going about this solution all wrong, or am I missing something?
Here is my cluster.conf:
<?xml version="1.0"?>
<cluster config_version="33" name="RHELClus2">
  <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
  <clusternodes>
    <clusternode name="RHClusPri" nodeid="1" votes="1">
      <fence/>
    </clusternode>
    <clusternode name="RHClusSec" nodeid="2" votes="1">
      <fence/>
    </clusternode>
  </clusternodes>
  <cman expected_votes="1" two_node="1"/>
  <fencedevices/>
  <rm log_level="7">
    <failoverdomains/>
    <resources>
      <ip address="192.168.16.26" sleeptime="10"/>
      <fs device="/dev/drbd1" fsid="17062" mountpoint="/mnt/drbd1" name="DRBD"/>
      <script file="/usr/local/etc/drbd.d/makepridrbd.sh" name="MkDRBDDriPri"/>
      <smb name="Samba" workgroup="DOMAIN.LOCAL"/>
    </resources>
    <service autostart="1" exclusive="0" name="Cluster" recovery="relocate">
      <ip ref="192.168.16.26">
        <script ref="MkDRBDDriPri">
          <fs ref="DRBD">
            <smb ref="Samba"/>
          </fs>
        </script>
      </ip>
    </service>
  </rm>
</cluster>
I have also tried it with an smb.conf path set in the resource and still get the same fault. I am using luci for the web management.
Here are my /var/log/messages entries from when the service comes online:
Apr 28 19:25:27 RHClusPri modcluster: Starting service: Cluster on node
Apr 28 19:25:27 RHClusPri rgmanager[1651]: Starting disabled service service:Cluster
Apr 28 19:25:27 RHClusPri rgmanager[18169]: Adding IPv4 address 192.168.16.26/27 to eth0
Apr 28 19:25:29 RHClusPri avahi-daemon[1487]: Registering new address record for 192.168.16.26 on eth0.IPv4.
Apr 28 19:25:31 RHClusPri rgmanager[18265]: Executing /usr/local/etc/drbd.d/makepridrbd.sh start
Apr 28 19:25:31 RHClusPri kernel: block drbd1: role( Secondary -> Primary )
Apr 28 19:25:31 RHClusPri rgmanager[18389]: mounting /dev/drbd1 on /mnt/drbd1
Apr 28 19:25:31 RHClusPri rgmanager[18412]: mount /dev/drbd1 /mnt/drbd1
Apr 28 19:25:31 RHClusPri kernel: kjournald starting. Commit interval 5 seconds
Apr 28 19:25:31 RHClusPri kernel: EXT3 FS on drbd1, internal journal
Apr 28 19:25:31 RHClusPri kernel: EXT3-fs: mounted filesystem with ordered data mode.
Apr 28 19:25:31 RHClusPri rgmanager[1651]: Service service:Cluster started
Any help would be appreciated, or a pointer in the right direction for where to find additional help.
Thanks
Responses
Hi Simon,
That is definitely strange. Looking at your configuration, I don't see anything immediately obvious that would cause this type of issue. We should certainly see a message from the smb resource when rgmanager attempts to start it, and the fact that we don't indicates it's not getting started for some reason.
A couple questions:
* Does "cman_tool status" report the same Config Version that is listed in /etc/cluster/cluster.conf as config_version?
* Are you seeing any warnings or messages from rgmanager/clurgmgrd in /var/log/messages when the daemon first starts?
* Have you made any modifications to any of the resource agents in /usr/share/cluster? We don't recommend doing so, but I ask because I've seen instances in the past where syntax errors in these agents have caused behavior similar to this.
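For that first question, here is one way to compare the in-memory and on-disk versions (assuming the default /etc/cluster/cluster.conf location):
# cman_tool status | grep "Config Version"
# grep config_version /etc/cluster/cluster.conf
The two numbers should match.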
If you would like to get some more verbose output while starting the service, you can try to start it with rg_test. Note: you need to disable your service before using rg_test, since it bypasses status checks and the check to determine whether the service is running on other nodes. If you'd like to give this a try and provide us the output, here is the procedure:
a) Stop the service using Conga or with:
# clusvcadm -d Cluster
b) Start the service using rg_test like so:
# rg_test test /etc/cluster/cluster.conf start service Cluster
Did that give any more information about why it's not starting? If not, try:
c) Start the individual smb resource with rg_test like so:
# rg_test test /etc/cluster/cluster.conf start smb Samba
Feel free to provide the output here. Once you have completed this test, you'll want to shut everything down again so that it can once again be controlled by the cluster:
# rg_test test /etc/cluster/cluster.conf stop service Cluster
You're now free to start it back up with Conga or clusvcadm.
Hopefully these steps will shed some light on the problem. Let us know if you have any questions.
Regards,
John Ruemker, RHCA
Red Hat Technical Account Manager
Online User Groups Moderator
P.S. I should note that running a cluster without a valid fence device is unsupported. That said, it sounds like you're just giving the product a trial, so for testing purposes that is fine. However, if you ever move this cluster into production, you'll want to investigate getting a proper fence device set up (which doesn't exist for VMware guests).
If you are seeing cman_tool report a Config Version different from what is in cluster.conf, then it's possible the version in use at the moment did not have the reference to the smb resource, and thus rgmanager is not attempting to start it when you start the service. Usually this version mismatch happens when you manually update cluster.conf but don't apply/propagate those changes to the cluster.
The following article describes the procedure for applying changes to cluster.conf to the cluster in RHEL 5:
https://access.redhat.com/kb/docs/DOC-5951
As you can see, you'll need to use ccs_tool update before cman_tool version.
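In short, the RHEL 5 procedure looks roughly like this (a sketch; substitute whatever config_version you actually bump the file to):
# vi /etc/cluster/cluster.conf          <- increment config_version, e.g. 33 -> 34
# ccs_tool update /etc/cluster/cluster.conf
# cman_tool version -r 34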
As far as the validation error goes, I can't figure out what the problem is. Go ahead and try the above procedure to update it using ccs_tool update, and let us know if you still get this error.
Thanks,
John Ruemker, RHCA
Red Hat Technical Account Manager
Online User Groups Moderator
I think I mistakenly assumed you were talking about RHEL 5, whereas now I see you are on RHEL 6 (at least I think so). In that case, you were right and you only need to run 'cman_tool version -r'. The command I gave you is not valid in RHEL 6.
With that said, this revelation has led me to the source of your issue (I believe). Your cluster.conf has this resource:
<smb name="Samba" workgroup="DOMAIN.LOCAL"/>
smb is not a valid resource type in RHEL 6; it is now called "samba". Try replacing the references to smb with samba, then run 'cman_tool version -r', and see if the resource starts properly.
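Roughly, the change would look like this (a sketch based on your posted cluster.conf; the RHEL 6 samba agent may accept slightly different parameters than the old smb agent, so check /usr/share/cluster/samba.sh for the attributes it supports):
<samba name="Samba" workgroup="DOMAIN.LOCAL"/>
...
<samba ref="Samba"/>
Then bump config_version in the file and propagate with 'cman_tool version -r'.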
Regards,
John Ruemker, RHCA
Red Hat Technical Account Manager
Online User Groups Moderator
Conga is supported on RHEL 6. I'm not sure why the resource wouldn't be showing up. I'll see if I can take a look at that code today and find out why it's failing.
In the meantime, you should be able to continue using Conga to manage the service; you just can't edit the service (since Conga doesn't recognize samba, it will remove that resource any time you edit the service).
I'll let you know what I find. If anyone else here has ideas on why this is failing, feel free to chime in.
Regards,
John Ruemker, RHCA
Red Hat Technical Account Manager
Online User Groups Moderator