fs.sh processes are causing an excessively high load on my RHEL 5 or 6 High Availability cluster nodes
Environment
- Red Hat Enterprise Linux 5 (with the High Availability or Resilient Storage Add-Ons)
  - One or more fs resources
- Red Hat Enterprise Linux 6 (with the High Availability or Resilient Storage Add-Ons)
  - One or more fs, clusterfs, or netfs resources
Issue
- Why is rgmanager or clurgmgrd issuing so much I/O?
- My cluster nodes are unresponsive or responding poorly.
- ps output shows fs.sh processes running at almost 100% CPU.
- The cluster is using high I/O for the clurgmgrd daemon.
Resolution
- Set the quick_status="1" option on each of the file-system related resources. For example:
<fs name="data1" device="/dev/vg/data1" mountpoint="/data1" fsid="1234" force_unmount="1" self_fence="1" quick_status="1"/>
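Before editing, it may help to list the file-system resources that do not yet set the option. The helper below is a hypothetical, read-only sketch: the function name list_missing_quick_status is illustrative, it assumes each fs, clusterfs, or netfs element is written on a single line, and it skips <fs ref="..."/> references to resources defined elsewhere in the file.

```shell
#!/bin/sh
# Hypothetical helper: print the file-system resource lines in a cluster.conf
# that do not yet set quick_status="1". Read-only; the file is not modified.
# Assumption: each <fs>, <clusterfs>, or <netfs> element sits on one line.
# Pipeline stages:
#   1. keep opening tags of the three file-system resource types,
#   2. drop references (<fs ref="..."/>) to resources defined elsewhere,
#   3. drop resources that already carry quick_status="1".
list_missing_quick_status() {
    conf="${1:-/etc/cluster/cluster.conf}"
    grep -E '<(fs|clusterfs|netfs)[[:space:]]' "$conf" |
        grep -v '[[:space:]]ref=' |
        grep -v 'quick_status="1"'
}

# Usage: list_missing_quick_status /etc/cluster/cluster.conf
```

Every line it prints is a candidate for the quick_status="1" attribute; an empty result means all file-system resources already use quick status checks.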
Root Cause
In a High Availability cluster, the file-system related resources (fs, clusterfs, netfs) all perform multiple levels of status checks at different intervals. The more frequent checks only test whether the file system is mounted, whereas the higher-level checks, which run less often, also perform a read test (ls against the mount point) and/or a write test (touch a temporary file in the mount point). When cluster services contain a large number of these resources, rgmanager may issue considerably more I/O to the managed devices than expected.
In RHEL 5's fs resource and in RHEL 6's fs, clusterfs, and netfs resources, there is an option called quick_status that bypasses the read and write tests on all status checks and performs only the mount test. Its description in /usr/share/cluster/fs.sh is as follows:
<parameter name="quick_status">
<longdesc lang="en">
Use quick status checks. When set to 0 (the default), this
agent behaves normally. When set to 1, this agent will not
log errors incurred or perform the file system accessibility
check (e.g. it will not try to read from/write to the file
system). You should only set this to 1 if you have lots of
file systems on your cluster or you are seeing very high load
spikes as a direct result of this agent.
</longdesc>
<shortdesc lang="en">
Quick/brief status checks.
</shortdesc>
<content type="boolean"/>
</parameter>
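The difference between the check levels can be approximated in shell. This is a rough illustration, not the logic of /usr/share/cluster/fs.sh: the function check_fs_status, its level names (mount, read, write), and the MOUNTS_FILE override (added only so the sketch can be tested) are all hypothetical. It shows why quick_status="1" reduces I/O: the function returns right after the mount test, so the ls and touch steps never run.

```shell
#!/bin/sh
# Rough, hypothetical approximation of the tiered status checks; this is
# NOT the code in /usr/share/cluster/fs.sh.
#   $1 = mount point, $2 = level (mount | read | write), $3 = quick_status (0|1)
# MOUNTS_FILE is a hypothetical override for testing; default is /proc/mounts.
check_fs_status() {
    mp="$1"
    level="$2"
    quick="$3"
    mounts="${MOUNTS_FILE:-/proc/mounts}"

    # Mount test (every level): field 2 of each mounts entry is the mount point.
    awk -v mp="$mp" '$2 == mp { found = 1 } END { exit !found }' "$mounts" || return 1

    # quick_status="1": stop after the mount test, skipping the read/write tests.
    [ "$quick" = "1" ] && return 0

    # Read test (read and write levels): list the mount point.
    case "$level" in
        read|write) ls "$mp" >/dev/null 2>&1 || return 1 ;;
    esac

    # Write test (write level only): create and remove a temporary file.
    if [ "$level" = "write" ]; then
        tmpf="$mp/.status_check.$$"
        touch "$tmpf" 2>/dev/null || return 1
        rm -f "$tmpf"
    fi
    return 0
}

# Usage: check_fs_status /data1 write 0 && echo "status OK"
```

Multiply the read and write steps by a few hundred resources and the extra load reported in the Issue section becomes plausible; with quick_status="1" each check reduces to a scan of the mount table.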
Diagnostic Steps
- Check the top output to see which processes are consuming the CPU:
top - 12:35:53 up 12 days, 5:09, 2 users, load average: 31.67, 31.17, 31.83
Tasks: 1724 total, 22 running, 1702 sleeping, 0 stopped, 0 zombie
Cpu(s): 1.8%us, 39.3%sy, 0.0%ni, 58.9%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 264271100k total, 229803556k used, 34467544k free, 24150860k buffers
Swap: 12289716k total, 892660k used, 11397056k free, 192625392k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1179 root 24 -1 11076 1464 964 R 79.1 0.0 0:02.62 fs.sh
1204 root 24 -1 11076 1464 964 R 74.3 0.0 0:02.51 fs.sh
1253 root 23 -1 11076 1464 964 R 68.4 0.0 0:02.36 fs.sh
1255 root 24 -1 11076 1468 964 R 64.8 0.0 0:02.36 fs.sh
1284 root 24 -1 11076 1468 964 R 60.4 0.0 0:02.20 fs.sh
1285 root 24 -1 11076 1464 964 R 60.1 0.0 0:02.19 fs.sh
1313 root 21 -1 11076 1464 964 R 61.7 0.0 0:02.06 fs.sh
1376 root 22 -1 11076 1468 964 R 50.2 0.0 0:01.83 fs.sh
1488 root 20 -1 11076 1464 964 R 47.5 0.0 0:01.73 fs.sh
1523 root 22 -1 11076 1464 964 R 45.9 0.0 0:01.67 fs.sh
1545 root 24 -1 11076 1464 964 R 44.8 0.0 0:01.63 fs.sh
1565 root 20 -1 11076 1464 964 R 44.2 0.0 0:01.61 fs.sh
1654 root 19 -1 11076 1464 964 R 33.2 0.0 0:01.21 fs.sh
1701 root 20 -1 11076 1468 964 R 31.8 0.0 0:01.16 fs.sh
1703 oracle38 18 0 10480 904 740 R 25.0 0.0 0:00.91 ps
1715 root 19 -1 11076 1468 964 R 23.6 0.0 0:00.86 fs.sh
1716 root 19 -1 11076 1464 964 R 20.6 0.0 0:00.75 fs.sh
1719 root 18 -1 11076 1464 964 R 16.2 0.0 0:00.59 fs.sh
- Use grep to count how many file-system resources are defined in cluster.conf, and compare that with the number of mounted file systems:
# grep "fs device" /etc/cluster/cluster.conf | wc -l
163
# cat /proc/mounts | wc -l
79
- Bugzilla 250718 has been opened for this issue.
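The first two diagnostic steps can be made repeatable with two small helpers; both function names are hypothetical. summarize_fs_sh totals the fs.sh entries from ps output; note that ps reports %CPU averaged over each process's lifetime, so its figures will differ from the instantaneous values top shows. count_fs_resources counts definitions per resource type regardless of attribute order, since grep "fs device" only matches resources whose device attribute directly follows the element name. It assumes an opening tag never spans several lines.

```shell
#!/bin/sh
# Hypothetical diagnostic helpers; names and output format are illustrative.

# Count fs.sh processes and total their %CPU from "pid pcpu comm" lines
# on stdin, e.g.:  ps -eo pid=,pcpu=,comm= | summarize_fs_sh
summarize_fs_sh() {
    awk '$3 == "fs.sh" { n++; cpu += $2 }
         END { printf "%d fs.sh processes, %.1f%% CPU total\n", n, cpu }'
}

# Count file-system resource definitions per type in cluster.conf,
# independent of attribute order. <type ref="..."/> references to
# resources defined elsewhere are excluded so nothing is counted twice.
count_fs_resources() {
    conf="${1:-/etc/cluster/cluster.conf}"
    for rtype in fs clusterfs netfs; do
        # grep -o prints every matching opening tag on its own line.
        n=$(grep -oE "<${rtype}[[:space:]][^>]*>" "$conf" | grep -cv '[[:space:]]ref=')
        printf '%s %d\n' "$rtype" "$n"
    done
}

# Usage on a cluster node:
#   ps -eo pid=,pcpu=,comm= | summarize_fs_sh
#   count_fs_resources /etc/cluster/cluster.conf
```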
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.