Troubleshooting Guide
Troubleshooting OpenShift Enterprise
Red Hat OpenShift Documentation Team
Abstract
- Configuration of standard Linux components and corresponding log files
- Configuration of OpenShift Enterprise components and corresponding log files
- Recognizing common system problems
- Error messages that may occur when creating applications
Chapter 1. Introduction to OpenShift Enterprise
1.1. What's New in Current Release
Chapter 2. Log Files and Validation Scripts
2.1. Configuration and Log Files for Standard Linux Components
2.1.1. General Information
- The /var/log/messages file serves as a good starting point to investigate issues that might not be logged anywhere else.
- The /var/log/httpd/access_log file shows whether your web request was received by the host.
- The /var/log/httpd/error_log file can be helpful in troubleshooting certain problems on broker and node hosts.
- The /var/log/audit/audit.log file is useful for finding problems that might be caused by SELinux violations.
- The /var/log/secure file logs user and SSH interactions. Because users can SSH into their gears, and all Git requests also authenticate using SSH, this file is useful for checking interaction with gears on node hosts.
2.1.2. Networking
The best place for Linux operators to begin troubleshooting DNS problems on broker, node, or client hosts is the /etc/resolv.conf file. On client hosts running other operating systems, look in the appropriate network configuration file. Node hosts should list the nameserver of your OpenShift Enterprise installation in the /etc/resolv.conf file as the first nameserver. On other hosts, the nameserver listed in the /etc/resolv.conf file should point to your OpenShift Enterprise installation, either receiving updates from it, or delegating the domain to the nameserver of your installation.
Use the dig command to verify that an application's hostname resolves correctly:

# dig hostname

The application hostname is a CNAME for the node host DNS record. However, for a scaled application, this command only shows which node host contains the HAProxy gear; other gears could reside on different node hosts.
If you are running a BIND server on the broker (or supporting) host, the configuration information is contained in the /var/named/dynamic directory. The zone file naming syntax is domain.com.db.zone; so if the domain of your OpenShift Enterprise installation is example.com, the zone file name would be example.com.db.zone. However, not all changes appear in the zone file; recent changes can be contained in a binary journal file. Use a zone transfer query to view all current records for the domain, including those still in the journal:

# dig domain axfr
For broker and node hosts, DHCP is currently only supported if the host IPs are pinned, meaning they do not change during lease renewal. This also applies to nameservers, in that they should also not change if pinned.
Check the /etc/dhcp/dhclient-network-interface.conf file to verify whether the nameservers provided by the DHCP service are being overridden when a new lease is obtained. If the /etc/resolv.conf file is overwritten with incorrect values, check your configuration in the dhclient-network-interface.conf file.
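If nameserver settings are being lost on lease renewal, one common fix is to override the DHCP-supplied nameservers in the dhclient configuration. A minimal sketch using standard dhclient.conf(5) syntax, where the interface name eth0 and the address 10.0.0.1 are example values standing in for your own interface and OpenShift Enterprise nameserver:

```
# /etc/dhcp/dhclient-eth0.conf (interface name and IP are examples)
# Always use the OpenShift Enterprise nameserver, ignoring DHCP-provided ones:
supersede domain-name-servers 10.0.0.1;
```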
2.1.3. SELinux
Procedure 2.1. To Troubleshoot SELinux Issues:
- As root, run the following command to set SELinux to permissive mode:
# setenforce 0
- Retry the failing action. If the action succeeds then the issue is SELinux related.
- Run the following command to set SELinux back to enforcing mode:
# setenforce 1
- Check the
/var/log/audit/audit.log
file for any SELinux violations.
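Step 4's search of the audit log can be wrapped in a small helper for repeated use. A minimal sketch, where the function name is illustrative and the default path is the standard audit log location:

```shell
# Print recent SELinux denial (AVC) records from an audit log.
show_denials() {
    log="${1:-/var/log/audit/audit.log}"   # default: standard audit log path
    grep 'avc: *denied' "$log" | tail -n 20
}
```

Pass an alternate file as the first argument when reviewing a copied log.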
2.1.4. Control Groups on Node Hosts
If the cgconfig service is running correctly on a node host, you see the following:

- The /etc/cgconfig.conf file exists with the SELinux label system_u:object_r:cgconfig_etc_t:s0.
- The /etc/cgconfig.conf file joins the cpu, cpuacct, memory, freezer, and net_cls subsystems in the /cgroup/all directory.
- The /cgroup directory exists, with the SELinux label system_u:object_r:cgroup_t:s0.
- The cgconfig service is running.
- The /etc/cgrules.conf file exists with the SELinux label system_u:object_r:cgrules_etc_t:s0.
- The cgred service is running.
- A line exists for each gear in the /etc/cgrules.conf file.
- A directory exists for each gear in the /cgroup/all/openshift directory.
- All processes with the gear UUID are listed in the gear's cgroup.procs file, located in the /cgroup/all/openshift/gear_UUID directory.
Important
If these files have been edited manually, the SELinux user in their labels may be unconfined_u and not system_u. For example, the SELinux label in /etc/cgconfig.conf would be unconfined_u:object_r:cgconfig_etc_t:s0.
2.1.5. Pluggable Authentication Modules
PAM uses the nproc value to control the number of processes a given account can create. The default value for gears is set in the /etc/openshift/resource_limits.conf file on the node host:
limits_nproc=2048
When a gear is created, an 84-gear_UUID.conf file is created on the node host, in the /etc/security/limits.d directory, where gear_UUID is the UNIX account name for the gear. This file contains a rule set that defines the limits for that UNIX account; the first field of each line in the file is the gear UUID. The nproc limit for an individual gear is increased by changing the value in the 84-gear_UUID.conf file:
# PAM process limits for guest
# see limits.conf(5) for details
#Each line describes a limit for a user in the form:
#
#<domain> <type> <item> <value>
32ec916eeaa04032b1481af5037a6dfb hard nproc 250
In this example, 250 is the hard nproc limit for the gear.
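Because the limits file uses the standard limits.conf(5) `<domain> <type> <item> <value>` layout, the effective limit for a gear can be read back programmatically. A minimal sketch, where the function name is illustrative:

```shell
# Print the hard nproc limit recorded for a gear in a PAM limits file.
gear_nproc_limit() {
    # $1 = limits file (e.g. under /etc/security/limits.d), $2 = gear UUID
    awk -v uuid="$2" '$1 == uuid && $2 == "hard" && $3 == "nproc" { print $4 }' "$1"
}
```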
2.1.6. Disk Quotas
Verify that the filesystem containing the /var/lib/openshift directory has the usrquota option enabled in the /etc/fstab file, and has been mounted with that option. Remount the filesystem if necessary using the command shown below, and check the output of the repquota command.
# mount -o remount filesystem
# repquota -a
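Whether the usrquota option is present can also be checked directly against the fstab entry. A minimal sketch, where the function name is illustrative; it inspects a given fstab-format file rather than the live mount table:

```shell
# Succeed if the fstab entry for a mount point includes the usrquota option.
has_usrquota() {
    # $1 = fstab-format file, $2 = mount point
    awk -v mnt="$2" '$2 == mnt { print $4 }' "$1" | grep -q usrquota
}
```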
2.1.7. iptables
Run the iptables -L command to list the active firewall rules:

# iptables -L

Sample outputs from the iptables -L command for both a broker host and a node host are shown below.
Broker host:

Chain INPUT (policy ACCEPT)
target     prot opt source               destination
ACCEPT     all  --  anywhere             anywhere            state RELATED,ESTABLISHED
ACCEPT     icmp --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere
ACCEPT     tcp  --  anywhere             anywhere            state NEW tcp dpt:ssh
ACCEPT     tcp  --  anywhere             anywhere            state NEW tcp dpt:domain
ACCEPT     udp  --  anywhere             anywhere            state NEW udp dpt:domain
ACCEPT     tcp  --  anywhere             anywhere            state NEW tcp dpt:https
ACCEPT     tcp  --  anywhere             anywhere            state NEW tcp dpt:http
ACCEPT     tcp  --  anywhere             anywhere            state NEW tcp dpt:61613
REJECT     all  --  anywhere             anywhere            reject-with icmp-host-prohibited

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination
REJECT     all  --  anywhere             anywhere            reject-with icmp-host-prohibited

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination
Node host:

Chain INPUT (policy ACCEPT)
target     prot opt source               destination
ACCEPT     all  --  anywhere             anywhere            state RELATED,ESTABLISHED
ACCEPT     icmp --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere
ACCEPT     tcp  --  anywhere             anywhere            state NEW tcp dpt:ssh
ACCEPT     tcp  --  anywhere             anywhere            state NEW tcp dpt:https
ACCEPT     tcp  --  anywhere             anywhere            state NEW tcp dpt:http
ACCEPT     tcp  --  anywhere             anywhere            state NEW tcp dpts:35531:65535
REJECT     all  --  anywhere             anywhere            reject-with icmp-host-prohibited

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination
REJECT     all  --  anywhere             anywhere            reject-with icmp-host-prohibited

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination
2.2. Configuration and Log Files for OpenShift Components
2.2.1. General Configuration
/etc/openshift
directory contains the most important configuration files for OpenShift Enterprise. These configuration files correspond to the type of installation; for example, a broker host, node host, or a client host. Check the corresponding configuration file to verify that the settings are suitable for your system.
2.2.2. Broker Host Failures
In the /var/log/openshift/broker/httpd/ directory, check the access_log and error_log files when user interactions with the broker host are failing. Verify that the request was authenticated and forwarded to the broker application.

The broker Rails application logs its activity in the /var/log/openshift/broker/production.log file.

User actions are logged in the /var/log/openshift/broker/user_action.log file. This log file includes gears created and deleted by a user. However, the logs do not include gear UUIDs.
2.2.3. MCollective
Run the following command on the broker host to verify MCollective communication with the node hosts:

# oo-mco ping
broker.mydomain.com time=134.85 ms
node.mydomain.com time=541.41 ms
node1.mydomain.com time=572.76 ms
---- ping statistics ----
3 replies max: 572.76 min: 134.85 avg: 416.34
All configured node hosts should be represented in the output. If you do not see a node host as expected, verify that the network and clock settings are configured correctly for that node host.
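When a host is missing, it can help to compare the ping output against the list of hosts you expect. A minimal sketch, where the function name and the expected-hosts file are illustrative conveniences, not part of OpenShift Enterprise:

```shell
# Print expected hostnames that are absent from captured `oo-mco ping` output.
missing_nodes() {
    # $1 = file of expected hostnames, one per line
    # $2 = file containing the output of `oo-mco ping`
    while read -r host; do
        grep -q "^$host " "$2" || echo "$host"
    done < "$1"
}
```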
Note
If the oo-mco ping command does not run successfully, it could be that openshift-origin-util-scl is not properly installed on your machine, or that the oo-mco command is missing. Install the openshift-origin-util-scl package in order to run the command.
MCollective activity is logged in the following files:

- /var/log/openshift/node/ruby193-mcollective.log on node hosts
- /var/log/openshift/broker/ruby193-mcollective-client.log on broker hosts

Gear operations on node hosts are also logged in the /var/log/openshift/node/platform.log and /var/log/openshift/node/platform-trace.log files.
To verify that an application's DNS record resolves correctly, use the dig or host command with the application's hostname.
2.2.4. Gears
Each gear has a home directory in the /var/lib/openshift directory on that gear's node host, named with the gear's UUID. This directory contains the following information:
- Gears themselves
- Web server configuration
- Operation directories
Use the ls command to show the contents of the /var/lib/openshift/.httpd.d directory:
# ls /var/lib/openshift/.httpd.d/
aliases.db frontend-mod-rewrite-https-template.erb idler.db nodes.db routes.json sts.txt
aliases.txt geardb.json idler.txt nodes.txt sts.db
Each gear also has a corresponding UNIX user account in the /etc/passwd file.
2.3. Validation Scripts
2.3.1. Broker Host Scripts
2.3.1.1. Verifying Broker Host Configuration
Run the oo-accept-broker script without any options to report potential problems in the broker host configuration. The output from this script indicates how many problems are found.
2.3.1.2. Fixing Gear Discrepancies
Run the oo-admin-chk script without any options to compare gear records in the broker's Mongo datastore to the gears actually present on the node hosts. The script reports any discrepancies that are found.
Example 2.1. Diagnosing Problems Using oo-admin-chk
# oo-admin-chk
Check failed.
FAIL - user user@domain.com has a mismatch in consumed gears (-1) and actual gears (0)!
This indicates a mismatch between the number of consumed gears and the number of actual gears, which can occur under certain race conditions.
Fix the mismatch by setting the user's consumed gear count with the oo-admin-ctl-user command:

# oo-admin-ctl-user -l user@domain.com --setconsumedgears 0
Example 2.2. Diagnosing Problems Using oo-admin-chk
# oo-admin-chk
Gear 9bb07b76dca44c3b939c9042ecf1e2fe exists on node [node1.example.com, uid:2828] but does not exist in mongo database
This output indicates that although a gear was destroyed from the broker host's MongoDB, it was not completely removed from the node host. This can be due to storage issues or other unexpected failures. You can fix this issue by deleting the gear from the /var/lib/openshift
directory, and removing the user from the node host.
Other discrepancies are reported in a similar way by the oo-admin-chk script. The script output should be self-explanatory enough to resolve most problems.
2.3.2. Node Host Scripts
2.3.2.1. Verifying Node Host Configuration
Run the oo-accept-node script without any options to report potential problems in the node host configuration. The output from this script indicates how many problems are found.
2.3.3. Additional Diagnostics
The oo-diagnostics script can be run on any OpenShift Enterprise host to diagnose common problems and provide potential solutions. It can also be helpful for gathering information (particularly when run with the -v option for verbose output) to provide to Red Hat Support when opening a support case.
Chapter 3. Recognizing System Problems
3.1. Missing Repositories
Table 3.1. List of Repositories

| Name of Repository | Description |
| --- | --- |
| Red Hat OpenShift Enterprise Infrastructure | Broker / BIND / Mongo hosts |
| Red Hat OpenShift Enterprise Application Node | Node hosts |
| Red Hat OpenShift Enterprise Client Tools | Client hosts |
| Red Hat OpenShift Enterprise JBoss EAP add-on | Included with EAP support purchase. See note below. |
| Red Hat OpenShift Enterprise Application Platform | Included with EAP support purchase. See note below. |
| Red Hat OpenShift Enterprise Web Server | Included with bundle purchase. See note below. |
Note
Important
3.2. Missing Node Host
When you run the oo-mco ping command on the broker host, all node hosts should be listed in the output. Although applications on an unlisted node host can continue to operate without problems, the unlisted node hosts are not controlled by the broker host.
Node hosts can be missing from the output of the oo-mco ping command if the clock on the broker host is not synchronized with the clock on the node host. MCollective messages have a TTL of 60 seconds; if the clocks are not synchronized, the messages can be dropped, causing communication issues. Verify that the broker host and node host clocks are synchronized and that the ntpd service is enabled. All configured hosts must use the same NTP server.
These timing errors are logged in the /var/log/openshift/node/ruby193-mcollective.log file on the node host, and could look like the following sample screen output:
W, [2012-08-10T14:27:01.526544 #12179] WARN -- : runner.rb:62:in `run' Message 8beea9354f9784de939ec5693940d5ce from uid=48@broker.example.com created at 1344622854 is 367 seconds old, TTL is 60
Node hosts can also be missing from the output of the oo-mco ping command if ActiveMQ on the broker host cannot communicate with MCollective on the node host. Verify that the ruby193-mcollective service is running on the node host and that it can communicate with ActiveMQ on the broker host. If a configuration has been modified recently, use the following command to restart the ruby193-mcollective service:
# service ruby193-mcollective restart
3.3. Broker Application Response Failure
The broker application depends on the Passenger service. There can be cases when the broker service appears to be running, but in reality is not. If the Passenger service fails to start for some reason, the broker service will not start, even if the httpd service is running. So even though the service openshift-broker start command reports success, the service may not actually be running.
Errors with the Passenger service are logged in the /var/www/openshift/broker/httpd/logs/error_log file on the broker host, as shown in the following screen output:
[Wed Oct 17 23:48:04 2012] [error] *** Passenger could not be initialized because of this error: Unable to start the Phusion Passenger watchdog (/usr/lib64/gems/exts/passenger-3.0.17/agents/PassengerWatchdog): Permission denied (13)
This output shows that the Passenger service has failed to start. This can be caused by dependency issues with the RubyGems packages, which often occur when Bundler attempts to regenerate the /var/www/openshift/broker/Gemfile.lock file.
Check for dependency issues by running Bundler locally in the broker directory:

# cd /var/www/openshift/broker/
# bundle --local
Could not find rack-1.3.0 in any of the sources

This output shows that the specified dependency was not found. Updating all Ruby gems and restarting the openshift-broker service could resolve this issue.
3.3.1. Missing Gems with Validation Scripts
The validation scripts can also fail because of Bundler and RubyGems dependency issues. This is because the validation scripts, such as oo-admin-chk, use the broker Rails configuration and also depend on the /var/www/openshift/broker/Gemfile.lock file, as shown in the following sample output:
# oo-admin-chk
Could not find rack-1.3.0 in any of the sources
Run `bundle install` to install missing gems.
Restarting the openshift-broker service regenerates the Gemfile.lock file, and could solve this issue. Be sure to run the yum update command before restarting the openshift-broker service.
Warning
Do not run the bundle install command as the output asks you to do. Running this command will download and install unsupported and untested software packages, resulting in problems with your OpenShift Enterprise installation.
3.4. DNS Propagation Fails when Creating an Application
$ rhc app create -a myapp -t jbossas-7
Creating application: myapp in domain
Now your new domain name is being propagated (this might take a minute)...
........
retry # 3 - Waiting for DNS: myapp-domain.example.com
Eventually, the process times out while attempting to resolve the application's hostname. This can occur if the PUBLIC_HOSTNAME setting in the /etc/openshift/node.conf file on the node host is incorrectly configured.
Note
Running the oo-admin-chk script on the broker host can help detect this problem.
3.5. Developers Connecting to a Gear are Disconnected Immediately
When a git clone is performed, a developer can authenticate successfully, but then be disconnected by the remote host. This could be due to PAM being misconfigured. An example of this error is shown in the output below.
$ rhc app create -n apps -t php -a testing
Creating application 'testing'
==============================
Scaling: no
Gear Size: default
Cartridge: php
Namespace: apps
Password: ********
Your application's domain name is being propagated worldwide (this might take a minute)...
The authenticity of host 'testing-apps.example.com (x.x.x.x)' can't be established.
RSA key fingerprint is [...].
Are you sure you want to continue connecting (yes/no)? yes
Initialized empty Git repository in /home/test/testing/.git/
done
Error in git clone - Warning: Permanently added 'testing-apps.example.com' (RSA) to the list of known hosts.
Traceback (most recent call last):
File "/usr/bin/oo-trap-user", line 134, in <module>
read_env_vars()
File "/usr/bin/oo-trap-user", line 64, in read_env_vars
fp = open(os.path.expanduser('~/.env/') + env, 'r')
IOError: [Errno 13] Permission denied: '/var/lib/openshift/a7a330ee62ae467ca6d74cd0ce29742a/.env/OPENSHIFT_APP_NAME'
fatal: The remote end hung up unexpectedly
To fix this issue, references to pam_selinux should be changed to pam_openshift in /etc/pam.d/sshd, and a line loading pam_namespace.so should be at the end of each file modified. If your change management system overwrote these settings, ensure that your system will retain the correctly modified files in the future.
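A quick way to check whether a PAM service file still needs this change is to search it for the old module name. A minimal sketch, where the function name is illustrative:

```shell
# Report whether a PAM service file still references pam_selinux.
pam_needs_fix() {
    # $1 = PAM service file, e.g. /etc/pam.d/sshd
    if grep -q 'pam_selinux' "$1"; then
        echo "needs fix"
    else
        echo "ok"
    fi
}
```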
3.6. Gears Not Idling
The oddjob daemon must be running on node hosts for gear idling to work correctly. Error messages for gear idling issues are logged in the /var/log/httpd/error_log file on the node host. The following error message, from the error_log file, shows that the oddjob daemon is not running:
org.freedesktop.DBus.Error.ServiceUnknown: The name com.redhat.oddjob_openshift was not provided by any .service files

Use the following commands to start the oddjob daemon, and enable it to start at boot:
# service oddjobd start
# chkconfig oddjobd on
3.7. cgconfig Service Fails to Start
If the cgconfig service fails to start, look for AVC messages in the /var/log/audit/audit.log SELinux audit log file. The error messages could indicate incorrect SELinux labels on the following files and directories:
/etc/cgconfig.conf
/etc/cgrules.conf
/cgroup
Use the restorecon command to restore the correct SELinux labels for each of these files and directories:

# restorecon -v /etc/cgconfig.conf
# restorecon -v /etc/cgrules.conf
# restorecon -rv /cgroup
Then restart the cgconfig service using the following command:
# service cgconfig start
3.8. MongoDB Failures
Check the status of the mongod service:

# service mongod status
If the mongod service is not running, look in the /var/log/mongodb/mongodb.log file for information. Look for duplicate configuration lines, which cause problems with MongoDB and result in the multiple_occurences error message. Verify that there are no duplicate configuration lines in the /etc/mongodb.conf file to enable the mongod service to start.
Check the /etc/openshift/broker.conf file for MongoDB configuration details such as database host, port, name, user, and password.
Example 3.1. Example MongoDB Configuration
MONGO_HOST_PORT="localhost:27017"
MONGO_USER="mongouser"
MONGO_PASSWORD="mongopassword"
MONGO_DB="openshift_broker"
MONGO_SSL="false"
With the mongod service running, use the following command to connect to the database, replacing the configuration settings accordingly:
# mongo localhost:27017/openshift_broker -u mongouser -p mongopassword
The MongoDB command prompt is displayed.
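Because the broker.conf file uses shell-style KEY="value" assignments, the connection command can be assembled from it directly. A minimal sketch, where the function name is illustrative; it only prints the command rather than running it:

```shell
# Assemble the mongo connection command from a broker.conf-style file.
mongo_cmd() {
    # $1 = config file defining MONGO_HOST_PORT, MONGO_USER, MONGO_PASSWORD, MONGO_DB
    . "$1"
    echo "mongo ${MONGO_HOST_PORT}/${MONGO_DB} -u ${MONGO_USER} -p ${MONGO_PASSWORD}"
}
```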
3.9. Jenkins Build Failures
If the AUTH_SALT setting is changed in the /etc/openshift/broker.conf file, subsequent Jenkins builds will initially fail with the following output:
remote: Executing Jenkins build.
remote:
remote: You can track your build at https://jenkins-namespace.example.com/job/myapp-build
remote:
remote: Waiting for build to schedule...........................
remote: **BUILD FAILED/CANCELLED**
remote: Please see the Jenkins log for more details via rhc-tail-files
remote: !!!!!!!!
remote: Deployment Halted!
remote: If the build failed before the deploy step, your previous
remote: build is still running. Otherwise, your application may be
remote: partially deployed or inaccessible.
remote: Fix the build and try again.
remote: !!!!!!!!

Checking the Jenkins application's logs will reveal the following invalid credential messages:
# rhc tail jenkins
...
WARNING: Caught com.openshift.client.InvalidCredentialsOpenShiftException: Your credentials are not authorized to access "https://broker.example.com/broker/rest/user". Will retry 4 more times before canceling build.
com.openshift.client.InvalidCredentialsOpenShiftException: Your credentials are not authorized to access "https://broker.example.com/broker/rest/user"
...
To address these issues, first restart the broker service:
# service openshift-broker restart
Then run the oo-admin-broker-auth
tool to rekey the gears' authorization tokens. To rekey the tokens for all applicable gears, run the tool with the --rekey-all
option:
# oo-admin-broker-auth --rekey-all
See the command's --help
output and man page for additional options and more detailed use cases.
3.10. Outdated Cartridge List
If newly added cartridges do not appear in the cartridge list, clear the broker's cache:

# oo-admin-broker-cache --clear
To clear the console cache as well, run the command with the --console option (the --console option implies --clear):
# oo-admin-broker-cache --console
Chapter 4. Error Messages when Creating Applications
4.1. cpu.cfs_quota_us: No such file
The rhc app create command can fail to create an application if cgroups are not working properly. These error messages are logged in the /var/log/openshift/node/ruby193-mcollective.log file on the node host, and can look like the following:
/cgroup/all/openshift/*/cpu.cfs_quota_us: No such file
4.2. Password Prompt
If you are prompted for a password when connecting to an application's gear, verify that the PUBLIC_HOSTNAME setting is configured correctly in the /etc/openshift/node.conf file on the node host.
The authenticity of host 'myapp-domain.example.com (::1)' can't be established.
RSA key fingerprint is 88:49:43:d2:e9:b4:4d:84:a1:d6:8a:30:85:73:d7:7f.
Are you sure you want to continue connecting (yes/no)? yes
e9bdfc309bef4c13889a21ddbea45f@myapp-domain.example.com's password:
This can occur when PUBLIC_HOSTNAME resolves to the wrong IP address. In this case, PUBLIC_HOSTNAME is set to localhost.localdomain, as shown in the sample screen output below.
PUBLIC_HOSTNAME=localhost.localdomain

In this example, the application's gear CNAME is created using localhost.localdomain as the hostname for the node host. When Git attempts to authenticate using the gear user ID and SSH key, the SSH authentication fails because the application gear does not exist on localhost.localdomain, and you are prompted for a password.
Notice the (::1) in the sample output above, which points to localhost and is not a valid IP address for an application's gear. Verify that the IP address of an application's gear is a valid IP address of the node host.
If PUBLIC_HOSTNAME fails to resolve at all as an FQDN, DNS resolution times out and the Git clone process fails.
Note
oo-admin-chk
script on the broker host can help detect this problem.
4.3. Communication Issue after Node Host Reboot
After a node host is rebooted, the rhc app create command can fail to create an application, resulting in the following error:
An error occurred while communicating with the server. This problem may only be temporary. Check that you have correctly specified your OpenShift server 'https://broker.example.com/broker/rest/domain/domain-name/applications'.
However, oo-mco commands on the broker host may continue to find the rebooted node host without any issues:
# oo-mco ping
node.example.com time=170.19 ms
---- ping statistics ----
1 replies max: 170.19 min: 170.19 avg: 170.19
To fix this issue, restart the activemq service on the broker host:
# service activemq restart
Note
Chapter 5. Debugging Problems with Specific Applications
5.1. Common Resources
Check the node host's /etc/passwd file for information unique to that particular gear. You will see an account for the gear, represented by the gear's UUID. This file also provides the path to the login shell for the application's gear. The following sample screen output shows how gears are represented in the /etc/passwd file.
........
haproxy:x:188:188::/var/lib/haproxy:/sbin/nologin
postgres:x:26:26:PostgreSQL Server:/var/lib/pgsql:/bin/bash
mysql:x:27:27:MySQL Server:/var/lib/mysql:/bin/bash
jenkins:x:498:498:Jenkins Continuous Build server:/var/lib/jenkins:/bin/false
def4330dff68444b96846dd225a0a617:x:500:500:OpenShift guest:/var/lib/openshift/def4330dff68444b96846dd225a0a617:/usr/bin/oo-trap-user
c9279521cffd4a5ba1118f1b6ac2d6d6:x:500:500:OpenShift guest:/var/lib/openshift/def4330dff68444b96846dd225a0a617:/usr/bin/oo-trap-user
e16a4a4c2c1144c3815f19ba36ea9d32:x:500:500:OpenShift guest:/var/lib/openshift/def4330dff68444b96846dd225a0a617:/usr/bin/oo-trap-user
- The /var/lib/openshift/gear_UUID directory on the node host is the home directory for each application gear. Check the SELinux contexts.
- The /var/lib/openshift/.httpd.d/gear_UUID* directory on the node host is the operations directory for each application gear. It contains the httpd configuration for that particular application gear.
- The /var/log directory on the node host contains the ruby193-mcollective.log file.
- Searching the /var/log/openshift directory on the node host for the gear's user UUID using grep could help you find problems with application gears that generate error messages.
- The /var/log/openshift/user_action.log file on the broker host contains logs of user actions.
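The grep search described above can be wrapped into a small helper for reuse. A minimal sketch, where the function name is illustrative:

```shell
# Print log lines mentioning a gear UUID anywhere under a log directory.
gear_log_lines() {
    # $1 = log directory (e.g. /var/log/openshift), $2 = gear UUID
    grep -r "$2" "$1"
}
```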
5.2. Rails Applications
Do not run the broker console as root, as some OpenShift API calls are cached under /var/www/openshift/broker/tmp/cache and are owned by the user who runs the console. When the cache expires, the broker attempts to invalidate it. Because the broker runs as the apache user, it is unable to clear the root-owned cache files and returns 500 errors.
Instead, run the console as the apache user:

# su --shell=/bin/bash -l apache
$ cd /var/www/openshift/console
$ ./script/rails console production
Chapter 6. Technical Support
6.1. Reporting Bugs
6.2. Getting Help
Install the sos RPM, and use the following command to create an archive of relevant host information to include with your support request:

# sosreport
6.3. Participating in Development
Appendix A. Revision History
| Revision | Date | Author |
| --- | --- | --- |
| 2.1-1 | Wed Jun 11 2014 | Bilhar Aulakh |
| 2.1-0 | Fri May 16 2014 | Julie Wu |
| 2.0-1 | Tue Jan 14 2014 | Brice Fallon-Freeman |
| 2.0-0 | Tue Dec 10 2013 | Bilhar Aulakh |