How to generate a sosreport within nodes without SSH in OCP 4
Environment
- Red Hat OpenShift Container Platform (RHOCP) 4
Issue
- What is the recommended way for generating a sosreport in Red Hat OpenShift Container Platform?
- It may not be possible to connect to OpenShift 4 nodes via SSH from outside the cluster by default, but sosreport (or other machine binaries) may need to be run for troubleshooting purposes.
Resolution
NOTE: This solution relies on the command oc debug node/<node_name>. Under specific circumstances, this command may fail, for example if kubelet is not running properly on the target node; in that case, consider the other available options within the section Other ways to generate a sosreport in OpenShift 4.x. For disconnected environments, you can also check the section Generating a sosreport with "oc debug node" in disconnected environments.
By design, OpenShift 4.x clusters are immutable and rely on ClusterOperators to apply changes. In turn, this means that accessing the underlying nodes directly by SSH is not the recommended procedure. Additionally, nodes accessed over SSH are annotated as accessed (see the Diagnostic Steps section below).
Generating a sosreport with "oc debug node"
The following example shows how to debug node ip-10-0-132-143.eu-west-3.compute.internal.
- First, display the list of nodes in the cluster:
$ oc get nodes
NAME STATUS ROLES AGE VERSION
ip-10-0-131-87.eu-west-3.compute.internal Ready master 119d v1.14.6+8e46c0036
ip-10-0-132-143.eu-west-3.compute.internal Ready worker 119d v1.14.6+8e46c0036
ip-10-0-145-113.eu-west-3.compute.internal Ready master 119d v1.14.6+8e46c0036
ip-10-0-147-108.eu-west-3.compute.internal Ready worker 119d v1.14.6+8e46c0036
ip-10-0-161-51.eu-west-3.compute.internal Ready master 119d v1.14.6+8e46c0036
ip-10-0-163-177.eu-west-3.compute.internal Ready worker 119d v1.14.6+8e46c0036
- Then, create a debug session with oc debug node/<node name>, in this case oc debug node/ip-10-0-132-143.eu-west-3.compute.internal. The debug session will spawn a pod using the tools image from the release (which doesn't contain sos):
$ oc debug node/ip-10-0-132-143.eu-west-3.compute.internal
Starting pod/ip-10-0-132-143eu-west-3computeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.132.143
If you don't see a command prompt, try pressing enter.
sh-4.4# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.7 (Maipo)
sh-4.4#
- Once in the debug session, one can use chroot to change the apparent root directory to that of the underlying host:
sh-4.4# chroot /host bash
[root@ip-10-0-132-143 /]# cat /etc/redhat-release
Red Hat Enterprise Linux CoreOS release 4.12
[root@ip-10-0-132-143 /]#
- Apply any proxy variables to your current session, if applicable.
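For example, proxy variables can be exported as follows (the proxy endpoint and NO_PROXY list below are placeholders for your environment's values):
[root@ip-10-0-132-143 /]# export HTTP_PROXY=http://proxy.example.com:3128
[root@ip-10-0-132-143 /]# export HTTPS_PROXY=http://proxy.example.com:3128
[root@ip-10-0-132-143 /]# export NO_PROXY=localhost,127.0.0.1,.cluster.local,.svc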
- Now, run toolbox to start a special container with all the necessary binaries:
[root@ip-10-0-132-143 /]# toolbox
Trying to pull registry.redhat.io/rhel8/support-tools...Getting image source signatures
Copying blob fd8daf2668d1 done
Copying blob 1457434f891b done
Copying blob cb3c77f9bdd8 done
Copying config 517597590f done
Writing manifest to image destination
Storing signatures
517597590ff4236b0e5e3efce75d88b2b238c19a58903f59a018fc4a40cd6cce
Spawning a container 'toolbox-' with image 'registry.redhat.io/rhel8/support-tools'
Detected RUN label in the container image. Using that as the default...
command: podman run -it --name toolbox- --privileged --ipc=host --net=host --pid=host -e HOST=/host -e NAME=toolbox- -e IMAGE=registry.redhat.io/rhel8/support-tools:latest -v /run:/run -v /var/log:/var/log -v /etc/machine-id:/etc/machine-id -v /etc/localtime:/etc/localtime -v /:/host registry.redhat.io/rhel8/support-tools:latest
[root@ip-10-0-132-143 /]#
NOTE: If running toolbox yields the message Container 'toolbox-' already exists. Trying to start..., it is strongly recommended to remove the existing toolbox container with podman rm 'toolbox-'. This ensures that a new instance of the toolbox container is spawned, which in turn avoids issues with sosreport plugins.
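For example (the container name matches the one shown in the toolbox output above):
[root@ip-10-0-132-143 /]# podman rm 'toolbox-'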
- Again, apply any proxy variables to your current session, if applicable, because this is a different shell session.
- Now, proceed with sosreport:
[root@ip-10-0-132-143 /]# sos report -k crio.all=on -k crio.logs=on -k podman.all=on -k podman.logs=on
sosreport (version 4.5.1)
This command will collect diagnostic and configuration information from
this Red Hat CoreOS system.
An archive containing the collected information will be generated in
/host/var/tmp/sos.idipawos and may be provided to a Red Hat support
representative.
Any information provided to Red Hat will be treated in accordance with
the published support policies at:
Distribution Website : https://www.redhat.com/
Commercial Support : https://www.access.redhat.com/
The generated archive may contain data considered sensitive and its
content should be reviewed by the originating organization before being
passed to any third party.
No changes will be made to system configuration.
Press ENTER to continue, or CTRL-C to quit.
Optionally, please enter the case id that you are generating this report for []: 01234567
Setting up archive ...
Setting up plugins ...
[plugin:auditd] Could not open conf file /etc/audit/auditd.conf: [Errno 2] No such file or directory: '/etc/audit/auditd.conf'
[plugin:networking] skipped command 'ip -s macsec show': required kmods missing: macsec. Use '--allow-system-changes' to enable collection.
[plugin:networking] skipped command 'ss -peaonmi': required kmods missing: udp_diag, af_packet_diag, inet_diag, xsk_diag, netlink_diag, tcp_diag, unix_diag. Use '--allow-system-changes' to enable collection.
[plugin:networking] skipped command 'ip netns exec 3c2c2bda-f52b-40d3-80cb-d5002012e290 ss -peaonmi': required kmods missing: udp_diag, af_packet_diag, inet_diag, xsk_diag, netlink_diag, tcp_diag, unix_diag.
[... the same 'ss -peaonmi' message repeats for each remaining network namespace ...]
caught exception in plugin method "system.setup()"
writing traceback to sos_logs/system-plugin-errors.txt
[plugin:systemd] skipped command 'resolvectl status': required services missing: systemd-resolved.
[plugin:systemd] skipped command 'resolvectl statistics': required services missing: systemd-resolved.
Running plugins. Please wait ...
Finishing plugins [Running: networking]
Finished running plugins
Creating compressed archive...
Your sosreport has been generated and saved in:
/host/var/tmp/sosreport-worker-2-2023-05-11-jjpvgdf.tar.xz
Size 26.23MiB
Owner root
sha256 64dc2efa6f25c16f1bae9d596f291d899b875a16e0a945bc973387a3fb84382d
Please send this file to your support representative.
[root@ip-10-0-132-143 /]#
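Once the report has been generated, a typical sequence to leave the node (assuming the session was opened as shown above) is to exit the toolbox container, the chroot shell, and the debug pod in turn:
[root@ip-10-0-132-143 /]# exit    # leave the toolbox container
[root@ip-10-0-132-143 /]# exit    # leave the chroot shell
sh-4.4# exit                      # terminate and remove the debug pod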
Generating a sosreport with "oc debug node" in disconnected environments
- For disconnected environments, it is possible to perform the same operation, but you will first need the following image mirrored to your internal/private registry (for example, by pulling, tagging, and pushing it with podman, as sketched below):
registry.redhat.io/rhel8/support-tools
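A minimal sketch of that mirroring step, assuming a host with access to both registries and using private-registry.example.com:5000 as a placeholder for your registry:
$ podman login registry.redhat.io
$ podman pull registry.redhat.io/rhel8/support-tools
$ podman tag registry.redhat.io/rhel8/support-tools private-registry.example.com:5000/rhel8/support-tools
$ podman login private-registry.example.com:5000
$ podman push private-registry.example.com:5000/rhel8/support-tools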
- Launch the oc debug node command referencing the specific image, and chroot inside the RHCOS host:
$ oc debug node/ip-10-0-158-153.eu-west-3.compute.internal --image=private-registry.example.com:5000/rhel8/support-tools
Starting pod/ip-10-0-158-153eu-west-3computeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.158.153
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host bash
[root@ip-10-0-158-153 /]#
- Once inside the RHCOS host, create the /root/.toolboxrc file as follows:
$ vi /root/.toolboxrc
REGISTRY=private-registry.example.com:5000
IMAGE=rhel8/support-tools
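Alternatively, the same file can be created non-interactively, for example with a heredoc (registry hostname as above):
[root@ip-10-0-158-153 /]# cat <<'EOF' > /root/.toolboxrc
REGISTRY=private-registry.example.com:5000
IMAGE=rhel8/support-tools
EOF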
- Run the toolbox command:
[root@ip-10-0-158-153 /]# toolbox
.toolboxrc file detected, overriding defaults...
Spawning a container 'toolbox-' with image 'private-registry.example.com:5000/rhel8/support-tools'
Detected RUN label in the container image. Using that as the default...
command: podman run -it --name toolbox- --privileged --ipc=host --net=host --pid=host -e HOST=/host -e NAME=toolbox- -e IMAGE=private-registry.example.com:5000/rhel8/support-tools:latest -v /run:/run -v /var/log:/var/log -v /etc/machine-id:/etc/machine-id -v /etc/localtime:/etc/localtime -v /:/host private-registry.example.com:5000/rhel8/support-tools:latest
[root@ip-10-0-158-153 /]#
- Generate the sosreport:
[root@ip-10-0-158-153 /]# sos report -k crio.all=on -k crio.logs=on -k podman.all=on -k podman.logs=on
sosreport (version 4.5.1)
This command will collect diagnostic and configuration information from
this Red Hat CoreOS system.
An archive containing the collected information will be generated in
/host/var/tmp/sos.idipawos and may be provided to a Red Hat support
representative.
Any information provided to Red Hat will be treated in accordance with
the published support policies at:
Distribution Website : https://www.redhat.com/
Commercial Support : https://www.access.redhat.com/
The generated archive may contain data considered sensitive and its
content should be reviewed by the originating organization before being
passed to any third party.
No changes will be made to system configuration.
Press ENTER to continue, or CTRL-C to quit.
Optionally, please enter the case id that you are generating this report for []: 01234567
Setting up archive ...
Setting up plugins ...
[...]
What options are available to copy/share the generated sosreport?
Several alternatives exist for attaching the generated sosreport to a Red Hat support case.
- It is possible to attach the sosreport directly to a case from within the toolbox container by using redhat-support-tool:
[root@ip-10-0-132-143 /]# redhat-support-tool addattachment -c 111111 /host/var/tmp/sosreport-ip-10-0-132-143-111111-2020-01-15-ilulmnh.tar.xz
Please enter your RHN user ID: rhn-username
Save the user ID in /root/.redhat-support-tool/redhat-support-tool.conf (y/n): n
Please enter the password for rhn-username:
Save the password for rhn-username in /root/.redhat-support-tool/redhat-support-tool.conf (y/n): n
Uploading sosreport-ip-10-0-132-143-111111-2020-01-15-ilulmnh.tar.xz to the case ...
- One can use scp to copy the file out of the node. The sosreport archive can be found on the host in the /var/tmp directory after exiting from the toolbox container:
[root@ip-10-0-132-143 /]# ls /var/tmp/sosreport*
/var/tmp/sosreport-ip-10-0-132-143-111111-2020-01-15-ilulmnh.tar.xz
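For example, from a workstation with SSH access to the node (the core user and key path are assumptions based on the other examples in this article):
$ scp -i key core@ip-10-0-132-143.eu-west-3.compute.internal:/var/tmp/sosreport-ip-10-0-132-143-111111-2020-01-15-ilulmnh.tar.xz /tmp/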
- It is possible to copy the sosreport to the local host by using cat and output redirection. This is particularly useful for disconnected environments:
$ oc debug node/ip-10-0-132-143.eu-west-3.compute.internal -- cat /host/var/tmp/sosreport-ip-10-0-132-143-111111-2020-01-15-ilulmnh.tar.xz > /tmp/sosreport-ip-10-0-132-143-111111-2020-01-15-ilulmnh.tar.xz
Starting pod/ip-10-0-132-143.eu-west-3.compute.internal-debug ...
To use host binaries, run `chroot /host`
Removing debug pod ...
[pamoedo@localhost ~] $ du -h /tmp/sosreport-ip-10-0-132-143-111111-2020-01-15-ilulmnh.tar.xz
26M /tmp/sosreport-ip-10-0-132-143-111111-2020-01-15-ilulmnh.tar.xz
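Because the archive is streamed through plain output redirection, it is advisable to verify its integrity by comparing the local checksum with the sha256 value that sos report printed at generation time:
$ sha256sum /tmp/sosreport-ip-10-0-132-143-111111-2020-01-15-ilulmnh.tar.xz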
Other ways to generate a sosreport in OpenShift 4.x
- It is possible to log into the node directly via SSH and take a sosreport. Check https://access.redhat.com/solutions/3820762 for the instructions.
- Launch oc debug node/<node_name> against a different working node, create a file with the same private key used for the installation, and SSH into the failing node after that, for example:
$ oc debug node/ip-10-0-130-181.eu-west-3.compute.internal
Starting pod/ip-10-0-130-181eu-west-3computeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.130.181
If you don't see a command prompt, try pressing enter.
sh-4.4# vim key
sh-4.4# chmod 400 key
sh-4.4# ssh -i key -l core ip-10-0-164-119.eu-west-3.compute.internal
Red Hat Enterprise Linux CoreOS 43.81.202003191953.0
Part of OpenShift 4.3, RHCOS is a Kubernetes native operating system
managed by the Machine Config Operator (`clusteroperator/machine-config`).
WARNING: Direct SSH access to machines is not recommended; instead,
make configuration changes via `machineconfig` objects:
https://docs.openshift.com/container-platform/4.3/architecture/architecture-rhcos.html
---
[core@ip-10-0-164-119 ~]$
Finally, if all suggestions fail, you can use this simpler script version of sosreport: Sosreport fails. What data should I provide in its place?.
Diagnostic Steps
- In order to have sosreport available, we need to use the toolbox container, but in some early versions it failed to download the necessary registry.redhat.io/rhel8/support-tools image (even if you manually provided the registry.redhat.io credentials):
sh-4.4# toolbox
Trying to pull registry.redhat.io/rhel8/support-tools...Failed
error pulling image "registry.redhat.io/rhel8/support-tools": unable to pull registry.redhat.io/rhel8/support-tools: unable to pull image: Error determining manifest MIME type for docker://registry.redhat.io/rhel8/support-tools:latest: unable to retrieve auth token: invalid username/password
Would you like to authenticate to registry: 'registry.redhat.io' and try again? [y/N] y
Authenticating with existing credentials...
Existing credentials are invalid, please enter valid username and password
Username: rhn-<username>
Password: ******
Login Succeeded!
Trying to pull registry.redhat.io/rhel8/support-tools...Failed
error pulling image "registry.redhat.io/rhel8/support-tools": unable to pull registry.redhat.io/rhel8/support-tools: unable to pull image: Error determining manifest MIME type for docker://registry.redhat.io/rhel8/support-tools:latest: unable to retrieve auth token: invalid username/password
- In order to solve this, you can simply download the image first within the node as follows (only needed once per node):
sh-4.4# podman login registry.redhat.io
Authenticating with existing credentials...
Existing credentials are invalid, please enter valid username and password
Username: rhn-<username>
Password: ******
Login Succeeded!
sh-4.4# podman pull registry.redhat.io/rhel8/support-tools
Trying to pull registry.redhat.io/rhel8/support-tools...Getting image source signatures
Copying blob e61d8721e62e: 0 B / 67.75 MiB [-----------------------------------]
Copying blob e61d8721e62e: 8.71 MiB / 67.75 MiB [===>--------------------------]
Copying blob e61d8721e62e: 67.65 MiB / 67.75 MiB [=============================]
Copying blob e61d8721e62e: 67.75 MiB / 67.75 MiB [=========================] 20s
Copying blob c585fd5093c6: 1.47 KiB / 1.47 KiB [===========================] 20s
Copying blob 77392c39ffcb: 8.67 MiB / 8.67 MiB [===========================] 20s
Copying config 23a6cff4874d: 4.36 KiB / 4.36 KiB [==========================] 0s
Writing manifest to image destination
Storing signatures
23a6cff4874d03f84c7a787557b693afd58a1fb1f1123d5c9d254f785771c8fa
- How to check if your nodes were externally accessed by ssh:
$ oc get nodes -o jsonpath='{range .items[*]}{.metadata.name}{" - "}{.metadata.annotations.machineconfiguration\.openshift\.io/ssh}{"\n"}{end}'
ip-10-0-130-98.eu-west-3.compute.internal - accessed
ip-10-0-136-221.eu-west-3.compute.internal - accessed
ip-10-0-151-69.