How to generate a sosreport within nodes without SSH in OCP 4
Environment
- Red Hat OpenShift Container Platform (RHOCP) 4
- Red Hat Enterprise Linux CoreOS (RHCOS)
- sosreport
Issue
- What is the recommended way for generating a sosreport in Red Hat OpenShift Container Platform?
- It may not be possible to connect to OpenShift 4 nodes via SSH from outside the cluster by default, but sosreport (or other machine binaries) may need to be run for troubleshooting purposes.
Resolution
By design, OpenShift 4 nodes are immutable and rely on ClusterOperators to apply the changes. In turn, this means that accessing the underlying nodes directly by SSH is not the recommended procedure. Additionally, the nodes will be tainted as accessed.
Note: This solution relies on the command oc debug node/<node_name>. Under specific circumstances this command may fail, for example if kubelet is not running properly on the target node. In that case, consider the other options described in the section Other ways to generate a sosreport in OpenShift 4.
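Before starting a debug session, it can help to confirm that the target node is healthy enough for oc debug to work. The following is a minimal sketch, not part of the original procedure: it assumes that a node whose kubelet is not running reports a status other than Ready in the oc get nodes output, and the function name not_ready_nodes is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `oc get nodes` style output on stdin and print
# the names of nodes whose STATUS column does not start with "Ready".
# Assumption: a node with a failing kubelet shows up as NotReady.
not_ready_nodes() {
  # NR > 1 skips the header row; $1 is NAME and $2 is STATUS.
  # "Ready,SchedulingDisabled" still starts with "Ready" and is not listed.
  awk 'NR > 1 && $2 !~ /^Ready/ { print $1 }'
}

# Example (requires a logged-in oc client):
#   oc get nodes | not_ready_nodes
```

Any node printed by this helper is a candidate for the alternatives in the section Other ways to generate a sosreport in OpenShift 4.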
Generating a sosreport with the oc debug node command
The following example shows how to debug node "node-1":
- First, display the list of nodes in the cluster:
$ oc get nodes
NAME     STATUS   ROLES    AGE    VERSION
node-1   Ready    master   119d   v1.14.6+8e46c0036
node-2   Ready    worker   119d   v1.14.6+8e46c0036
[...]
- Then, create a debug session with oc debug node/<node_name> (in this example, oc debug node/node-1). The debug session will spawn a pod using the tools image from the release (which doesn't contain sos):
$ oc debug node/node-1
Starting pod/node-1-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.0.11
If you don't see a command prompt, try pressing enter.
sh-4.4# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.7 (Maipo)
sh-4.4#
- Once in the debug session, use chroot to change the apparent root directory to that of the underlying host:
sh-4.4# chroot /host bash
[root@node /]# cat /etc/redhat-release
Red Hat Enterprise Linux CoreOS release 4.12
[root@node /]#
Note: in disconnected environments, the registry.redhat.io/rhel9/support-tools image must be mirrored. If the image is already available to the nodes, create a /root/.toolboxrc file on the node as follows before running toolbox (set the REGISTRY variable to the URL of the custom registry, and IMAGE to the image name in that registry):
[root@node ~]# vi /root/.toolboxrc
REGISTRY=[custom-private-registry.example.com:5000]
IMAGE=rhel9/support-tools
- Apply any proxy variables to the current session, if applicable:
$ export HTTP_PROXY=http://<username>:<pswd>@<ip>:<port>
$ export HTTPS_PROXY=http://<username>:<pswd>@<ip>:<port>
- Now, run the toolbox command to start a special container with all the necessary binaries:
[root@node /]# toolbox
Trying to pull registry.redhat.io/rhel9/support-tools...Getting image source signatures
Copying blob fd8daf2668d1 done
Copying blob 1457434f891b done
Copying blob cb3c77f9bdd8 done
Copying config 517597590f done
Writing manifest to image destination
Storing signatures
517597590ff4236b0e5e3efce75d88b2b238c19a58903f59a018fc4a40cd6cce
Spawning a container 'toolbox-' with image 'registry.redhat.io/rhel9/support-tools'
Detected RUN label in the container image. Using that as the default...
command: podman run -it --name toolbox- --privileged --ipc=host --net=host --pid=host -e HOST=/host -e NAME=toolbox- -e IMAGE=registry.redhat.io/rhel9/support-tools:latest -v /run:/run -v /var/log:/var/log -v /etc/machine-id:/etc/machine-id -v /etc/localtime:/etc/localtime -v /:/host registry.redhat.io/rhel9/support-tools:latest
[root@node /]#
Note: If running toolbox yields the message Container 'toolbox-' already exists. Trying to start..., it is strongly recommended to remove the existing toolbox container with podman rm 'toolbox-'. This ensures that a new instance of the toolbox container is spawned, which in turn avoids issues with sosreport plugins.
- Again, apply any proxy variables to the current session, if applicable, because this is a different shell session:
[root@node /]# export HTTP_PROXY=http://<username>:<pswd>@<ip>:<port>
[root@node /]# export HTTPS_PROXY=http://<username>:<pswd>@<ip>:<port>
- Finally, proceed with sos report (remove the --all-logs parameter if the generated sosreport is too big):
[root@node /]# sosreport -e openshift -k crio.all=on -k crio.logs=on -k podman.all=on -k podman.logs=on --all-logs
sosreport (version 4.5.1)
This command will collect diagnostic and configuration information from
this Red Hat CoreOS system.
An archive containing the collected information will be generated in
/host/var/tmp/sos.idipawos and may be provided to a Red Hat support
representative.
Any information provided to Red Hat will be treated in accordance with
the published support policies at:
Distribution Website : https://www.redhat.com/
Commercial Support : https://www.access.redhat.com/
The generated archive may contain data considered sensitive and its
content should be reviewed by the originating organization before being
passed to any third party.
No changes will be made to system configuration.
Press ENTER to continue, or CTRL-C to quit.
Optionally, please enter the case id that you are generating this report for []: 01234567
Setting up archive ...
Setting up plugins ...
[...]
Running plugins. Please wait ...
Finishing plugins [Running: networking]
Finished running plugins
Creating compressed archive...
Your sosreport has been generated and saved in:
/host/var/tmp/sosreport-node-1.tar.xz
Size 26.23MiB
Owner root
sha256 64dc2efa6f25c16f1bae9d596f291d899b875a16e0a945bc973387a3fb84382d
Please send this file to your support representative.
[root@node /]#
Note: if any of the plugins times out, or not all of the information is collected, it may be necessary to add the parameter --plugin-timeout=600 to increase the plugin timeout.
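The two adjustments mentioned above (dropping --all-logs when the archive is too big, and raising the plugin timeout) can be combined when assembling the command line. The following is a minimal sketch under stated assumptions: the function name build_sos_cmd and its two positional arguments are hypothetical, and only the options already shown in this article are used. It prints the command rather than running it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the sos command line used in this article.
#   $1  "yes" (default) keeps --all-logs; any other value drops it
#   $2  plugin timeout in seconds; empty (default) leaves the option out
build_sos_cmd() {
  local all_logs="${1:-yes}" timeout="${2:-}"
  local cmd="sosreport -e openshift -k crio.all=on -k crio.logs=on -k podman.all=on -k podman.logs=on"
  if [ "$all_logs" = "yes" ]; then
    cmd="$cmd --all-logs"
  fi
  if [ -n "$timeout" ]; then
    cmd="$cmd --plugin-timeout=$timeout"
  fi
  printf '%s\n' "$cmd"
}

# Example: smaller archive with a longer plugin timeout
#   build_sos_cmd no 600
```

The printed line is meant to be pasted into the toolbox shell from the last step of the procedure.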
What options are available to copy/share the generated sosreport?
Refer to How to provide an sosreport from a RHEL CoreOS OpenShift 4 node.
Other ways to generate a sosreport in OpenShift 4
- It is possible to log into the node directly via SSH and take a sosreport. Check How to generate a sos report in Red Hat Enterprise Linux CoreOS in OpenShift 4 with SSH access to nodes for the instructions.
- If accessing the nodes via SSH from outside the cluster is not possible, launch oc debug node/<node_name> against a different, working node, create a file with the same private key used for the installation, and then SSH into the failing node from there, for example:
$ oc debug node/node-2
Starting pod/node-2-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.0.12
If you don't see a command prompt, try pressing enter.
sh-4.4# vim key
sh-4.4# chmod 400 key
sh-4.4# ssh -i key -l core node-3
Red Hat Enterprise Linux CoreOS 43.81.202003191953.0
Part of OpenShift 4.3, RHCOS is a Kubernetes native operating system
managed by the Machine Config Operator (`clusteroperator/machine-config`).
WARNING: Direct SSH access to machines is not recommended; instead,
[...]
[core@node-3 ~]$
Finally, if all of the above suggestions fail, a simpler script version of sosreport is available. Refer to Sosreport fails. What data should I provide in its place?
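For the option of connecting by SSH from a debug pod, the key file has to end up readable by its owner only, which is what the chmod 400 key step achieves. The following is a minimal sketch of an alternative to editing the file with vim; the function name install_key is hypothetical and nothing here is specific to OpenShift.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a private key read from stdin to the path
# given in $1 and leave it with owner-read-only permissions (mode 400).
install_key() {
  local path="$1"
  (
    umask 077        # keep group/other access off while the file is written
    cat > "$path"
  ) && chmod 400 "$path"
}

# Example, inside the debug pod (paste the key, then press Ctrl-D):
#   install_key key
#   ssh -i key -l core node-3
```

Setting the umask before writing avoids the short window in which the key would otherwise exist with default permissions.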
Root Cause
By design, OpenShift 4 nodes are immutable and rely on ClusterOperators to apply the changes.
Diagnostic Steps
- To check whether your nodes were externally accessed by ssh:
$ oc get nodes -o jsonpath='{range .items[*]}{.metadata.name}{" - "}{.metadata.annotations.machineconfiguration\.openshift\.io/ssh}{"\n"}{end}'
node-1 - accessed
node-2 -
node-3 - accessed
- To have sosreport available, the toolbox container is needed, but in some early versions it failed to download the necessary registry.redhat.io/rhel8/support-tools image (even when the registry.redhat.io credentials were provided manually):
sh-4.4# toolbox
Trying to pull registry.redhat.io/rhel8/support-tools...Failed
error pulling image "registry.redhat.io/rhel8/support-tools": unable to pull registry.redhat.io/rhel8/support-tools: unable to pull image: Error determining manifest MIME type for docker://registry.redhat.io/rhel8/support-tools:latest: unable to retrieve auth token: invalid username/password
Would you like to authenticate to registry: 'registry.redhat.io' and try again? [y/N] y
Authenticating with existing credentials...
Existing credentials are invalid, please enter valid username and password
Username: rhn-<username>
Password: ******
Login Succeeded!
Trying to pull registry.redhat.io/rhel8/support-tools...Failed
error pulling image "registry.redhat.io/rhel8/support-tools": unable to pull registry.redhat.io/rhel8/support-tools: unable to pull image: Error determining manifest MIME type for docker://registry.redhat.io/rhel8/support-tools:latest: unable to retrieve auth token: invalid username/password
- To solve this, download the image within the node first, as follows (only needed once per node):
sh-4.4# podman login registry.redhat.io
Authenticating with existing credentials...
Existing credentials are invalid, please enter valid username and password
Username: rhn-<username>
Password: ******
Login Succeeded!
sh-4.4#
sh-4.4# podman pull registry.redhat.io/rhel8/support-tools
Trying to pull registry.redhat.io/rhel8/support-tools...Getting image source signatures
Copying blob e61d8721e62e: 0 B / 67.75 MiB [-----------------------------------]
Copying blob e61d8721e62e: 8.71 MiB / 67.75 MiB [===>--------------------------]
Copying blob e61d8721e62e: 67.65 MiB / 67.75 MiB [=============================]
Copying blob e61d8721e62e: 67.75 MiB / 67.75 MiB [=========================] 20s
Copying blob c585fd5093c6: 1.47 KiB / 1.47 KiB [===========================] 20s
Copying blob 77392c39ffcb: 8.67 MiB / 8.67 MiB [===========================] 20s
Copying config 23a6cff4874d: 4.36 KiB / 4.36 KiB [==========================] 0s
Writing manifest to image destination
Storing signatures
23a6cff4874d03f84c7a787557b693afd58a1fb1f1123d5c9d254f785771c8fa
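The SSH access check in the first diagnostic step prints one line per node, which becomes hard to scan on larger clusters. The following is a minimal sketch that keeps only the nodes marked as accessed; the function name accessed_nodes is hypothetical, and it assumes the exact "name - value" line format produced by the jsonpath expression shown in that step.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read lines such as "node-1 - accessed" or "node-2 - "
# on stdin and print only the names of nodes carrying the "accessed" value.
accessed_nodes() {
  # Fields are whitespace-separated: $1 = node name, $2 = "-", $3 = annotation.
  # Nodes without the annotation have no third field and are skipped.
  awk '$3 == "accessed" { print $1 }'
}

# Example (requires a logged-in oc client):
#   oc get nodes -o jsonpath='{range .items[*]}{.metadata.name}{" - "}{.metadata.annotations.machineconfiguration\.openshift\.io/ssh}{"\n"}{end}' | accessed_nodes
```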