Diagnostics tools for OpenShift Enterprise systems administrators
An OpenShift Enterprise installation includes many options and variables, and therefore many opportunities for mistakes. To minimize the time spent on troubleshooting, we have developed tools for detecting and diagnosing many common problems and misconfigurations in an OpenShift Enterprise installation.
oo-diagnostics
The first tool to turn to when you suspect system problems is oo-diagnostics. As the name suggests, it runs a series of tests to detect and diagnose problems. You can run it on any host in your installation; it reports indications of problems and, where known, how to address them. It takes no action on its own: the results are intended for an administrator or support engineer to review and act on.
Example output
[root@node1 ~]# ./oo-diagnostics
WARN: test_vhost_servernames
The VirtualHost defined in /etc/httpd/conf.d/ssl.conf has the ServerName
vm-186-69-3-10.oseref.redhat.com and will respond with a 404 to all requests at
https://vm-186-69-3-10.oseref.redhat.com/
Please remove it by running this command:
sed -i '/VirtualHost/,/VirtualHost/ d' /etc/httpd/conf.d/ssl.conf
FAIL: test_broken_httpd_version
httpd-2.2.22-14.ep6.el6 is installed. This version includes serious known issues that
impact OpenShift operations. Please upgrade or downgrade httpd accordingly.
For details see: https://bugzilla.redhat.com/show_bug.cgi?id=893884
FAIL: run_script
oo-accept-node had errors:
--BEGIN OUTPUT--
FAIL: /etc/openshift/node.conf: PUBLIC_HOSTNAME localhost.localdomain should be public, not localhost
FAIL: /etc/openshift/node.conf: PUBLIC_HOSTNAME localhost.localdomain resolves to 127.0.0.1; expected 192.168.59.166|10.4.59.186
2 ERRORS
--END oo-accept-node OUTPUT--
1 WARNINGS
2 ERRORS
oo-diagnostics has several options (add "-h" to see usage) and also adapts its testing to the system that it finds.
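When the output is long, it can help to reduce it to the names of the tests that warned or failed. The following is a minimal sketch, assuming the output has the same shape as the sample above: each result starts with "WARN: " or "FAIL: " followed by the test name, and output embedded from another script sits between a "--BEGIN OUTPUT--" line and an "--END ... OUTPUT--" line. The function name summarize_diagnostics and the file path in the usage comment are invented for this illustration; they are not part of oo-diagnostics.

```shell
# Hypothetical helper (not shipped with OpenShift): list the tests that
# reported WARN or FAIL in a saved copy of oo-diagnostics output.
# Assumes the output format shown in the sample above.
summarize_diagnostics() {
    awk '
        /^--BEGIN OUTPUT--/ { nested = 1; next }  # embedded script output starts
        /^--END .*OUTPUT--/ { nested = 0; next }  # embedded script output ends
        nested { next }                           # ignore FAIL lines from the embedded script
        /^(WARN|FAIL): / { print $1, $2 }         # result level and test name
    ' "$1"
}

# Usage (stdout and stderr are both captured, since the source does not
# say which stream the tool writes to):
#   ./oo-diagnostics > /tmp/oo-diagnostics.out 2>&1
#   summarize_diagnostics /tmp/oo-diagnostics.out
```

Skipping the embedded block keeps the count of listed tests in line with the WARNINGS and ERRORS totals that oo-diagnostics prints at the end of the sample.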
Usage and disclaimer
It is important to understand the mandate of oo-diagnostics. Its goals are to:
- Guide administrators and support engineers in useful directions for troubleshooting common problems
- Detect and diagnose common conditions that administrators encounter because of mistakes or unexpected install conditions, especially those whose effects surface far from the cause and are therefore hard to diagnose
- Be useful for multiple versions of OpenShift (including Origin and Enterprise) and either broker or node hosts
We advise running this tool after every host installation (whether problems are apparent or not), at every reboot, before and after every upgrade (update oo-diagnostics first), and of course whenever problems are apparent.
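One way to make the before-and-after-upgrade runs useful is to keep each run's output in a labelled file and compare the two. The sketch below is only an illustration: the function save_diag_run, the variables DIAG_CMD and DIAG_LOG_DIR, and the default directory are all invented here, and the tool's exit status is deliberately ignored because this article does not document it.

```shell
# Hypothetical helper: run the diagnostics command and keep its output in a
# labelled file so that two runs can be compared later.
# DIAG_CMD and DIAG_LOG_DIR are names made up for this sketch.
DIAG_CMD=${DIAG_CMD:-oo-diagnostics}
DIAG_LOG_DIR=${DIAG_LOG_DIR:-/var/tmp/oo-diagnostics-runs}

save_diag_run() {
    label=$1
    mkdir -p "$DIAG_LOG_DIR" || return 1
    # Capture stdout and stderr together; do not depend on the exit status.
    "$DIAG_CMD" > "$DIAG_LOG_DIR/$label.out" 2>&1 || true
    # Print the path of the saved file for the caller.
    echo "$DIAG_LOG_DIR/$label.out"
}

# Usage:
#   before=$(save_diag_run pre-upgrade)
#   ... perform the upgrade ...
#   after=$(save_diag_run post-upgrade)
#   diff -u "$before" "$after"
```

Any difference between the two files still needs to be read and judged by a person; the comparison only points out what changed.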
It is also important to understand that, by its nature, oo-diagnostics cannot always diagnose perfectly. False positives and false negatives are always possible: it is aggressive about detecting problems and may issue warnings you can ignore, yet it cannot find every possible problem. Most tests were added in response to problems encountered in actual installations, so it can be an extremely helpful guide, but be sure that you understand its output before either acting on it or dismissing it. Bug reports and pull requests are very welcome, especially those that help clarify diagnostic output.
How to download
oo-diagnostics is included in OpenShift Enterprise 1.2 and should be available on any broker or node host.
Prior to version 1.2, the tool was available (unsupported) from the OpenShift Extras codebase in GitHub.
For OpenShift Enterprise 1.1:
https://github.com/openshift/openshift-extras/blob/enterprise-1.1/admin/oo-diagnostics
(Click on the "raw" tab for a link to download the script. Be sure to read the header comments.)
For OpenShift Enterprise 1.0:
https://github.com/openshift/openshift-extras/blob/enterprise-1.0/admin/oo-diagnostics
"Accept" scripts
These scripts are distributed with the product and are intended to serve as inputs to a monitoring solution; in other words, they should be very reliable indicators of core functionality. (The name actually refers to their use in continuous integration: the host or system must be "accepted" as a prerequisite for valid testing.)
On a broker host, oo-accept-broker focuses primarily on the broker functionality itself, while oo-accept-systems (introduced in OpenShift Enterprise 1.1) checks consistency between the broker host and all available node hosts.
These are installed as part of the openshift-origin-broker-util RPM on a broker host. You can simply run them at the command line (add "-h" to view usage):
[root@broker ~]# oo-accept-broker -h
[root@broker ~]# oo-accept-systems -h
For node hosts, oo-accept-node is installed as part of the openshift-origin-node-util RPM. You can simply run it at the command line (add "-h" to view usage):
[root@node1 ~]# oo-accept-node -h
More information about using these scripts can be found in the Administrative Guide in the OpenShift Enterprise documentation. oo-diagnostics consults these scripts (if available) as part of its testing, so they need not be run separately during troubleshooting. It is still a good idea to run them regularly as part of a monitoring solution; oo-diagnostics is less suitable for that purpose, because its output is intended for human reading and evaluation.
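As a rough sketch of how an accept script might feed a monitoring system, the wrapper below runs a command, counts the lines beginning with "FAIL:" (the form seen in the oo-accept-node output quoted earlier), and turns the result into a status line and return code. The function name check_accept is invented for this example, and the return codes 0, 2 and 3 follow the common Nagios-style plugin convention (OK, CRITICAL, UNKNOWN); that convention is an assumption of this sketch, not something the accept scripts define.

```shell
# Hypothetical monitoring wrapper around an "accept" script.
# Assumption: each failure is reported on a line starting with "FAIL:".
# Return codes (Nagios-style, chosen for this sketch): 0 OK, 2 CRITICAL, 3 UNKNOWN.
check_accept() {
    if ! command -v "$1" > /dev/null 2>&1; then
        echo "UNKNOWN: $1 not found"
        return 3
    fi
    # Run the script with its arguments; its own exit status is not relied upon.
    output=$("$@" 2>&1) || true
    # grep -c prints 0 and exits non-zero when nothing matches, hence "|| true".
    fails=$(printf '%s\n' "$output" | grep -c '^FAIL:') || true
    if [ "$fails" -gt 0 ]; then
        echo "CRITICAL: $1 reported $fails failure(s)"
        printf '%s\n' "$output" | grep '^FAIL:'
        return 2
    fi
    echo "OK: $1 reported no failures"
    return 0
}

# Usage on a node host:
#   check_accept oo-accept-node
# Usage on a broker host:
#   check_accept oo-accept-broker
#   check_accept oo-accept-systems
```

Treating a missing script as UNKNOWN rather than OK avoids reporting a healthy host when the check could not run at all.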
