How to generate a sosreport on OCP 4 nodes without SSH

Environment

  • Red Hat OpenShift Container Platform (RHOCP)
    • 4

Issue

  • What is the recommended way for generating a sosreport in Red Hat OpenShift Container Platform?

  • By default, it may not be possible to connect to OpenShift 4 nodes via SSH from outside the cluster, but sosreport (or other machine binaries) may still need to be run for troubleshooting purposes.

Resolution

NOTE: This solution relies on the command oc debug node/<node_name>. Under specific circumstances this command may fail, for example if kubelet is not running properly on the target node. In that case, consider the options available in the section Other ways to generate a sosreport in OpenShift 4.x. For disconnected environments, also check the section Generating a sosreport with "oc debug node" in disconnected environments.

By design, OpenShift 4.x clusters are immutable and rely on ClusterOperators to apply changes. This means that accessing the underlying nodes directly via SSH is not the recommended procedure. Additionally, nodes accessed via SSH are annotated as accessed (see the Diagnostic Steps section below for how to check this).


Generating a sosreport with "oc debug node"

The following example shows how to debug node ip-10-0-132-143.eu-west-3.compute.internal.

  • First, display the list of nodes in the cluster:
$ oc get nodes
NAME                                         STATUS   ROLES    AGE    VERSION
ip-10-0-131-87.eu-west-3.compute.internal    Ready    master   119d   v1.14.6+8e46c0036
ip-10-0-132-143.eu-west-3.compute.internal   Ready    worker   119d   v1.14.6+8e46c0036
ip-10-0-145-113.eu-west-3.compute.internal   Ready    master   119d   v1.14.6+8e46c0036
ip-10-0-147-108.eu-west-3.compute.internal   Ready    worker   119d   v1.14.6+8e46c0036
ip-10-0-161-51.eu-west-3.compute.internal    Ready    master   119d   v1.14.6+8e46c0036
ip-10-0-163-177.eu-west-3.compute.internal   Ready    worker   119d   v1.14.6+8e46c0036
  • Then, create a debug session with oc debug node/<node_name>, in this case oc debug node/ip-10-0-132-143.eu-west-3.compute.internal. The debug session spawns a pod using the tools image from the release (which doesn't contain sos):
$ oc debug node/ip-10-0-132-143.eu-west-3.compute.internal
Starting pod/ip-10-0-132-143eu-west-3computeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.132.143
If you don't see a command prompt, try pressing enter.
sh-4.4# cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 7.7 (Maipo)
sh-4.4#
  • Once in the debug session, use chroot to change the apparent root directory to that of the underlying host:
sh-4.4# chroot /host bash
[root@ip-10-0-132-143 /]#  cat /etc/redhat-release 
Red Hat Enterprise Linux CoreOS release 4.12
[root@ip-10-0-132-143 /]# 
  • Apply any proxy variables to your current session, if applicable.
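For example, a minimal sketch assuming a hypothetical proxy at proxy.example.com:3128 (take the actual values from oc get proxy/cluster -o yaml):
[root@ip-10-0-132-143 /]# export HTTP_PROXY=http://proxy.example.com:3128
[root@ip-10-0-132-143 /]# export HTTPS_PROXY=http://proxy.example.com:3128
[root@ip-10-0-132-143 /]# export NO_PROXY=localhost,127.0.0.1,.cluster.local,.svc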
  • Now, run toolbox to start a special container with all the necessary binaries:
[root@ip-10-0-132-143 /]# toolbox
Trying to pull registry.redhat.io/rhel8/support-tools...Getting image source signatures
Copying blob fd8daf2668d1 done
Copying blob 1457434f891b done
Copying blob cb3c77f9bdd8 done
Copying config 517597590f done
Writing manifest to image destination
Storing signatures
517597590ff4236b0e5e3efce75d88b2b238c19a58903f59a018fc4a40cd6cce
Spawning a container 'toolbox-' with image 'registry.redhat.io/rhel8/support-tools'
Detected RUN label in the container image. Using that as the default...
command: podman run -it --name toolbox- --privileged --ipc=host --net=host --pid=host -e HOST=/host -e NAME=toolbox- -e IMAGE=registry.redhat.io/rhel8/support-tools:latest -v /run:/run -v /var/log:/var/log -v /etc/machine-id:/etc/machine-id -v /etc/localtime:/etc/localtime -v /:/host registry.redhat.io/rhel8/support-tools:latest
[root@ip-10-0-132-143 /]#

NOTE: If running toolbox yields the message Container 'toolbox-' already exists. Trying to start..., it is strongly recommended to remove the existing toolbox container with podman rm 'toolbox-'. This ensures that a new instance of the toolbox container is spawned, which in turn avoids issues with sosreport plugins.
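
For example (a minimal sketch; toolbox- is the container name reported in the message above):
[root@ip-10-0-132-143 /]# podman rm 'toolbox-'
[root@ip-10-0-132-143 /]# toolbox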

  • Again, apply any proxy variables to your current session, if applicable, because this is a different shell session.
  • Now, proceed with sosreport:
[root@ip-10-0-132-143 /]# sos report -k crio.all=on -k crio.logs=on  -k podman.all=on -k podman.logs=on

sosreport (version 4.5.1)

This command will collect diagnostic and configuration information from
this Red Hat CoreOS system.

An archive containing the collected information will be generated in
/host/var/tmp/sos.idipawos and may be provided to a Red Hat support
representative.

Any information provided to Red Hat will be treated in accordance with
the published support policies at:

        Distribution Website : https://www.redhat.com/
        Commercial Support   : https://www.access.redhat.com/

The generated archive may contain data considered sensitive and its
content should be reviewed by the originating organization before being
passed to any third party.

No changes will be made to system configuration.

Press ENTER to continue, or CTRL-C to quit.

Optionally, please enter the case id that you are generating this report for []: 01234567

 Setting up archive ...
 Setting up plugins ...
[plugin:auditd] Could not open conf file /etc/audit/auditd.conf: [Errno 2] No such file or directory: '/etc/audit/auditd.conf'
[plugin:networking] skipped command 'ip -s macsec show': required kmods missing: macsec.   Use '--allow-system-changes' to enable collection.
[plugin:networking] skipped command 'ss -peaonmi': required kmods missing: udp_diag, af_packet_diag, inet_diag, xsk_diag, netlink_diag, tcp_diag, unix_diag.   Use '--allow-system-changes' to enable collection.
[plugin:networking] skipped command 'ip netns exec 3c2c2bda-f52b-40d3-80cb-d5002012e290 ss -peaonmi': required kmods missing: udp_diag, af_packet_diag, inet_diag, xsk_diag, netlink_diag, tcp_diag, unix_diag.  
[... the same 'ss -peaonmi' skipped-command message repeats for each remaining network namespace ...]
caught exception in plugin method "system.setup()"
writing traceback to sos_logs/system-plugin-errors.txt
[plugin:systemd] skipped command 'resolvectl status': required services missing: systemd-resolved.  
[plugin:systemd] skipped command 'resolvectl statistics': required services missing: systemd-resolved.  
 Running plugins. Please wait ...

  Finishing plugins              [Running: networking]
  Finished running plugins                                                               
Creating compressed archive...

Your sosreport has been generated and saved in:
    /host/var/tmp/sosreport-worker-2-2023-05-11-jjpvgdf.tar.xz

 Size   26.23MiB
 Owner  root
 sha256 64dc2efa6f25c16f1bae9d596f291d899b875a16e0a945bc973387a3fb84382d

Please send this file to your support representative.

[root@ip-10-0-132-143 /]#
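
Optionally, verify the archive before sharing it; the sha256sum output should match the checksum printed by sos report above (the /host prefix applies when run from within the toolbox container):
[root@ip-10-0-132-143 /]# sha256sum /host/var/tmp/sosreport-worker-2-2023-05-11-jjpvgdf.tar.xz
64dc2efa6f25c16f1bae9d596f291d899b875a16e0a945bc973387a3fb84382d  /host/var/tmp/sosreport-worker-2-2023-05-11-jjpvgdf.tar.xz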

Generating a sosreport with "oc debug node" in disconnected environments

  • For disconnected environments it is also possible to perform the same operation, but the following image must first be mirrored to your internal/private registry (a mirroring sketch is shown below):
registry.redhat.io/rhel8/support-tools
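A minimal sketch using skopeo, assuming the hypothetical registry private-registry.example.com:5000 used in the example below and prior authentication to both registries (e.g. with skopeo login):
$ skopeo copy --all docker://registry.redhat.io/rhel8/support-tools:latest docker://private-registry.example.com:5000/rhel8/support-tools:latest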
  • Launch the oc debug node command referencing the mirrored image, then chroot into the RHCOS host:
$ oc debug node/ip-10-0-158-153.eu-west-3.compute.internal --image=private-registry.example.com:5000/rhel8/support-tools
Starting pod/ip-10-0-158-153eu-west-3computeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.158.153
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host bash
[root@ip-10-0-158-153 /]#
  • Once inside the RHCOS host, create the /root/.toolboxrc file as follows:
[root@ip-10-0-158-153 /]# vi /root/.toolboxrc
REGISTRY=private-registry.example.com:5000
IMAGE=rhel8/support-tools
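Alternatively, the same file can be created non-interactively with a here-document:
[root@ip-10-0-158-153 /]# cat > /root/.toolboxrc <<'EOF'
REGISTRY=private-registry.example.com:5000
IMAGE=rhel8/support-tools
EOF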
  • Run the toolbox command:
[root@ip-10-0-158-153 /]# toolbox
.toolboxrc file detected, overriding defaults...
Spawning a container 'toolbox-' with image 'private-registry.example.com:5000/rhel8/support-tools'
Detected RUN label in the container image. Using that as the default...
command: podman run -it --name toolbox- --privileged --ipc=host --net=host --pid=host -e HOST=/host -e NAME=toolbox- -e IMAGE=private-registry.example.com:5000/rhel8/support-tools:latest -v /run:/run -v /var/log:/var/log -v /etc/machine-id:/etc/machine-id -v /etc/localtime:/etc/localtime -v /:/host private-registry.example.com:5000/rhel8/support-tools:latest
[root@ip-10-0-158-153 /]#
  • Generate the sosreport:
[root@ip-10-0-158-153 /]# sos report -k crio.all=on -k crio.logs=on  -k podman.all=on -k podman.logs=on

sosreport (version 4.5.1)

This command will collect diagnostic and configuration information from
this Red Hat CoreOS system.

An archive containing the collected information will be generated in
/host/var/tmp/sos.idipawos and may be provided to a Red Hat support
representative.

Any information provided to Red Hat will be treated in accordance with
the published support policies at:

        Distribution Website : https://www.redhat.com/
        Commercial Support   : https://www.access.redhat.com/

The generated archive may contain data considered sensitive and its
content should be reviewed by the originating organization before being
passed to any third party.

No changes will be made to system configuration.

Press ENTER to continue, or CTRL-C to quit.

Optionally, please enter the case id that you are generating this report for []: 01234567

 Setting up archive ...
 Setting up plugins ...
[...]

What options are available to copy/share the generated sosreport?

Several alternatives exist for attaching the generated sosreport to a Red Hat support case.

  • It is possible to attach the sosreport directly to a case from within the toolbox container by using redhat-support-tool:
[root@ip-10-0-132-143 /]# redhat-support-tool addattachment -c 111111 /host/var/tmp/sosreport-ip-10-0-132-143-111111-2020-01-15-ilulmnh.tar.xz
Please enter your RHN user ID: rhn-username
Save the user ID in /root/.redhat-support-tool/redhat-support-tool.conf (y/n): n
Please enter the password for rhn-username: 
Save the password for rhn-username in /root/.redhat-support-tool/redhat-support-tool.conf (y/n): n
Uploading sosreport-ip-10-0-132-143-111111-2020-01-15-ilulmnh.tar.xz to the case ...
  • One can use scp to copy the file off the node. After exiting the toolbox container, the sosreport archive can be found on the host in the /var/tmp directory (an example scp invocation follows the listing below):
[root@ip-10-0-132-143 /]# ls /var/tmp/sosreport*
/var/tmp/sosreport-ip-10-0-132-143-111111-2020-01-15-ilulmnh.tar.xz
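For example, from a workstation with SSH access to the node (remember that direct SSH access may be restricted by default, as noted above; the hostname and key setup are environment-specific):
$ scp core@ip-10-0-132-143.eu-west-3.compute.internal:/var/tmp/sosreport-ip-10-0-132-143-111111-2020-01-15-ilulmnh.tar.xz /tmp/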
  • It is possible to copy the sosreport to the local host by using cat and output redirection. This is particularly useful for disconnected environments:
$ oc debug node/ip-10-0-132-143.eu-west-3.compute.internal -- cat /host/var/tmp/sosreport-ip-10-0-132-143-111111-2020-01-15-ilulmnh.tar.xz > /tmp/sosreport-ip-10-0-132-143-111111-2020-01-15-ilulmnh.tar.xz
Starting pod/ip-10-0-132-143.eu-west-3.compute.internal-debug ...
To use host binaries, run `chroot /host`

Removing debug pod ...
[pamoedo@localhost ~] $ du -h /tmp/sosreport-ip-10-0-132-143-111111-2020-01-15-ilulmnh.tar.xz
26M /tmp/sosreport-ip-10-0-132-143-111111-2020-01-15-ilulmnh.tar.xz

Other ways to generate a sosreport in OpenShift 4.x

  1. It is possible to log into the node directly via SSH and take a sosreport. Check https://access.redhat.com/solutions/3820762 for the instructions.

  2. Launch oc debug node/<node_name> against a different, working node, create a file containing the same private key used for the installation, and then SSH into the failing node, for example:

$ oc debug node/ip-10-0-130-181.eu-west-3.compute.internal
Starting pod/ip-10-0-130-181eu-west-3computeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.130.181
If you don't see a command prompt, try pressing enter.
sh-4.4# vim key
sh-4.4# chmod 400 key
sh-4.4# ssh -i key -l core ip-10-0-164-119.eu-west-3.compute.internal
Red Hat Enterprise Linux CoreOS 43.81.202003191953.0
  Part of OpenShift 4.3, RHCOS is a Kubernetes native operating system
  managed by the Machine Config Operator (`clusteroperator/machine-config`).
WARNING: Direct SSH access to machines is not recommended; instead,
make configuration changes via `machineconfig` objects:
  https://docs.openshift.com/container-platform/4.3/architecture/architecture-rhcos.html
---
[core@ip-10-0-164-119 ~]$

Finally, if all of the above suggestions fail, you can use the simpler script version of sosreport described in: Sosreport fails. What data should I provide in its place?

Diagnostic Steps

  • In order to have sosreport available, the toolbox container must be used, but some early versions failed to download the necessary registry.redhat.io/rhel8/support-tools image (even if the registry.redhat.io credentials were provided manually):
sh-4.4# toolbox
Trying to pull registry.redhat.io/rhel8/support-tools...Failed
error pulling image "registry.redhat.io/rhel8/support-tools": unable to pull registry.redhat.io/rhel8/support-tools: unable to pull image: Error determining manifest MIME type for docker://registry.redhat.io/rhel8/support-tools:latest: unable to retrieve auth token: invalid username/password
Would you like to authenticate to registry: 'registry.redhat.io' and try again? [y/N] y
Authenticating with existing credentials...
Existing credentials are invalid, please enter valid username and password
Username: rhn-<username>
Password: ******
Login Succeeded!
Trying to pull registry.redhat.io/rhel8/support-tools...Failed
error pulling image "registry.redhat.io/rhel8/support-tools": unable to pull registry.redhat.io/rhel8/support-tools: unable to pull image: Error determining manifest MIME type for docker://registry.redhat.io/rhel8/support-tools:latest: unable to retrieve auth token: invalid username/password
  • To solve this, simply pull the image manually on the node first, as follows (only needed once per node):
sh-4.4# podman login registry.redhat.io
Authenticating with existing credentials...
Existing credentials are invalid, please enter valid username and password
Username: rhn-<username>
Password: ******
Login Succeeded!

sh-4.4# podman pull registry.redhat.io/rhel8/support-tools
Trying to pull registry.redhat.io/rhel8/support-tools...Getting image source signatures
Copying blob e61d8721e62e: 0 B / 67.75 MiB [-----------------------------------]
Copying blob e61d8721e62e: 8.71 MiB / 67.75 MiB [===>--------------------------]
Copying blob e61d8721e62e: 67.65 MiB / 67.75 MiB [=============================]
Copying blob e61d8721e62e: 67.75 MiB / 67.75 MiB [=========================] 20s
Copying blob c585fd5093c6: 1.47 KiB / 1.47 KiB [===========================] 20s
Copying blob 77392c39ffcb: 8.67 MiB / 8.67 MiB [===========================] 20s
Copying config 23a6cff4874d: 4.36 KiB / 4.36 KiB [==========================] 0s
Writing manifest to image destination
Storing signatures
23a6cff4874d03f84c7a787557b693afd58a1fb1f1123d5c9d254f785771c8fa

  • How to check whether your nodes were externally accessed via SSH:
$ oc get nodes -o jsonpath='{range .items[*]}{.metadata.name}{" - "}{.metadata.annotations.machineconfiguration\.openshift\.io/ssh}{"\n"}{end}'
ip-10-0-130-98.eu-west-3.compute.internal - accessed
ip-10-0-136-221.eu-west-3.compute.internal - accessed
ip-10-0-151-69.
