Why does disabling or enabling a clustered nfs service in RHEL 5 or 6 take a long time when there are a large number of IP addresses on the system?


Environment

  • Red Hat Enterprise Linux (RHEL) 5 or 6 with the High Availability Add On
  • rgmanager
  • cluster service in /etc/cluster/cluster.conf containing nfsclient and fs/clusterfs resources

Issue

  • Why does disabling or enabling a clustered nfs service in RHEL 5 or 6 take a long time when there are a large number of IP addresses on the system?
  • Disabling and enabling NFS services on the same node is slow (~2.5 minutes per volume).

Resolution

The lengthy start/stop operations are expected when there are a large number of IP addresses on the system. If the time these operations take must be reduced, decreasing the number of IPs will help. For instance, if there are IPs assigned on the system that are not in use, remove them. Or, if there are many separate services each with its own IP resource, consider consolidating them to reduce the total number of IPs.
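Since the stop-side delay scales with the number of configured addresses (roughly 3 seconds each, as described under Root Cause), a quick estimate can show whether the IP count explains the slow operations. A minimal sketch, with an example count substituted for the real `ip -o addr list | wc -l` output:

```shell
# Rough stop-time estimate based on the number of configured addresses.
# On a live system, substitute the real count:
#   num_ips=$(ip -o addr list | wc -l)
num_ips=40                       # example value (assumption)
est=$(( num_ips * 3 ))           # ~3-second wait per address
echo "~${est} seconds to stop"   # 40 addresses -> ~120 seconds
```

If the estimate is close to the observed stop time, reducing the address count (or consolidating services) is the most direct remedy.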

Root Cause

When a cluster service containing NFS export/client resources fails over, there must be a mechanism to notify clients that they must reclaim their locks. Otherwise, when the service starts on the new node, those clients will assume locks are still in place when they actually are not.

rgmanager handles this failover of locks in the fs/clusterfs resource agents. When the file system resource is stopping, the agent takes a copy of /var/lib/nfs/statd/sm* and stores it on the file system. It then creates a temporary copy of those directories and uses rpc.statd to broadcast a notification to all clients that will cause them to reclaim their locks. In order to be sure that all clients have been notified, the agent must repeat this broadcast procedure over every available IP address on the host (obtained from 'ip -o addr list'). Due to complications in rpc.statd that can cause it to take a few seconds to complete its work, the resource agent in question has a 3-second sleep built-in while it waits for rpc.statd to finish. If there are a large number of IP addresses, this 3-second sleep can add up to a significant amount of time.
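The per-address notification described above is the source of the delay. A simplified sketch of the loop's shape (an assumption for illustration, not the actual resource agent code; the notification step itself is shown only as a comment):

```shell
# Sketch of the agent's per-address notification loop (assumption).
# On a live system the address list comes from:
#   ip -o addr list
total=0
for addr in 10.0.0.1 10.0.0.2 10.0.0.3; do   # stand-in addresses
    : # rpc.statd client-notification broadcast for $addr happens here
    total=$(( total + 3 ))                    # built-in 3-second wait per address
done
echo "minimum wait: ${total} seconds"        # 3 addresses -> 9 seconds
```

With dozens of addresses, the fixed 3-second wait per iteration dominates the stop time, which is why the delay grows linearly with the number of IPs rather than with the number of NFS clients.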

In clusters where there is a large number of NFS services, each with its own IP, this can result in several minutes of wait time when stopping.

When the fs/clusterfs resource starts back up with the service (possibly on another node), it restores the backup of /var/lib/nfs/statd/sm* that it stored on the file system into the local directory, so that statd continues tracking the same clients. Like the stop procedure, this can take some time and contribute to the relocation or failover taking longer than expected.

This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.
