How to generate a sos report in Red Hat Enterprise Linux CoreOS 4.x with SSH access to nodes?
Environment
- Red Hat OpenShift Container Platform (RHOCP, OCP)
- 4.x
- Red Hat Enterprise Linux CoreOS (RHCOS)
Issue
- How to generate a sos report in Red Hat Enterprise Linux CoreOS 4.x?
- How to generate a sos report for Red Hat OpenShift 4.x nodes?
- Generating a sos report using the rhel7/support-tools image fails with a traceback:
[root@ip-1-1-1-1 ~]# podman run -it registry.access.redhat.com/rhel7/support-tools /usr/bin/bash
bash-4.2# sosreport
Traceback (most recent call last):
  File "/usr/sbin/sosreport", line 19, in <module>
    main(sys.argv[1:])
  File "/usr/lib/python2.7/site-packages/sos/sosreport.py", line 1498, in main
    sos = SoSReport(args)
  File "/usr/lib/python2.7/site-packages/sos/sosreport.py", line 360, in __init__
    self.policy = sos.policies.load(sysroot=self.opts.sysroot)
  File "/usr/lib/python2.7/site-packages/sos/policies/__init__.py", line 44, in load
    cache['policy'] = policy(sysroot=sysroot)
  File "/usr/lib/python2.7/site-packages/sos/policies/redhat.py", line 258, in __init__
    super(RHELPolicy, self).__init__(sysroot=sysroot)
  File "/usr/lib/python2.7/site-packages/sos/policies/redhat.py", line 58, in __init__
    sysroot = self._container_init()
  File "/usr/lib/python2.7/site-packages/sos/policies/redhat.py", line 153, in _container_init
    host_tmp_dir = os.path.abspath(self._host_sysroot + self._tmp_dir)
TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'
Resolution
Important note: By design, OpenShift 4.x clusters are immutable and rely on Operators to apply cluster changes. In turn, this means that accessing the underlying nodes directly by SSH is not the recommended procedure. Additionally, the nodes will be tainted as accessed.
Therefore, whenever possible, generate a sos report without using SSH by spawning a debug pod directly from the oc command line. See How to generate SOSREPORT within OpenShift4 nodes without SSH for further information.
Only if it is not possible to generate a sos report without SSH, connect via SSH to the OpenShift Container Platform 4.x node on which the sos report is to be generated. Then become root and run the toolbox command:
$ ssh core@NODE # ssh with core user using ssh key specified in install-config.yaml
[core@node ~]$ sudo -i
[root@node ~]# toolbox
Spawning a container 'toolbox-root' with image 'registry.redhat.io/rhel8/support-tools'
Detected RUN label in the container image. Using that as the default...
Command: /proc/self/exe run -it --name toolbox-root --privileged --ipc=host --net=host --pid=host -e HOST=/host -e NAME=toolbox-root -e IMAGE=registry.redhat.io/rhel8/support-tools:latest -v /run:/run -v /var/log:/var/log -v /etc/machine-id:/etc/machine-id -v /etc/localtime:/etc/localtime -v /:/host registry.redhat.io/rhel8/support-tools:latest
Note: in disconnected environments, the registry.redhat.io/rhel8/support-tools image must be mirrored. If the mirrored image is already available, create a /root/.toolboxrc file on the node as follows before running toolbox:
$ vi /root/.toolboxrc
REGISTRY=private-registry.example.com:5000
IMAGE=rhel8/support-tools
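As an alternative to editing the file interactively, the same two settings could be written from a script. The following is a minimal sketch only: write_toolboxrc is a hypothetical helper name, not part of toolbox, and the registry and image values are the example values shown above.

```shell
# Sketch: write a .toolboxrc that points toolbox at a mirrored image.
# write_toolboxrc is a hypothetical helper, not something toolbox provides.
# Arguments: registry, image, and (optionally) the target file.
write_toolboxrc() {
    tbx_registry="$1"                  # e.g. private-registry.example.com:5000
    tbx_image="$2"                     # e.g. rhel8/support-tools
    tbx_target="${3:-/root/.toolboxrc}"

    # Note: this overwrites any existing file at the target path.
    cat > "$tbx_target" <<EOF
REGISTRY=${tbx_registry}
IMAGE=${tbx_image}
EOF
}
```

Run as root on the node, a call such as `write_toolboxrc private-registry.example.com:5000 rhel8/support-tools` would produce the same file contents as shown above.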
The CLI is now attached to a new bash session within the container, from which sos report can be executed:
[root@node ~]# sos report -k crio.all=on -k crio.logs=on -k podman.all=on -k podman.logs=on
This generates the sos report in the /host/var/tmp directory in the container, which maps to /var/tmp/ on the host.
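Because the archive ends up under /var/tmp on the host, it remains available after the container exits. The sketch below shows one way to find the most recent archive; latest_sosreport is a hypothetical helper, and the sosreport-*.tar.xz file name pattern is an assumption about the default archive naming that may differ depending on sos version and compression settings.

```shell
# Sketch: print the newest sos report archive in a directory.
# latest_sosreport is a hypothetical helper; the file name pattern
# sosreport-*.tar.xz is an assumed default and may need adjusting.
latest_sosreport() {
    sos_dir="${1:-/var/tmp}"
    # ls -t lists newest first; assumes file names contain no newlines.
    # Prints nothing if no archive matches.
    ls -t "$sos_dir"/sosreport-*.tar.xz 2>/dev/null | head -n 1
}
```

On the host this would be called as `latest_sosreport /var/tmp`, or as `latest_sosreport /host/var/tmp` from inside the toolbox container. The resulting path could then be copied off the node, for example with scp from a workstation; the archive may be readable only by root, so its permissions might need adjusting before the core user can fetch it.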
Once the sos report has been created, run exit to leave the container:
[root@node ~]# exit
[root@node ~]#
Root Cause
The toolbox command runs podman container runlabel run registry.redhat.io/rhel8/support-tools, which is the replacement for atomic run registry.redhat.io/rhel7/support-tools from RHEL Atomic Host.
Diagnostic Steps
If toolbox does not start the debug container as expected, check for a user-created $HOME/.toolboxrc file that could be overriding the default values of the REGISTRY, IMAGE, or TOOLBOX_NAME options.
In disconnected environments, that file must be created so that it refers to the mirrored image.
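The check described above can be sketched as a small script. toolboxrc_overrides is a hypothetical helper name; it only lists plain assignments of the three options named above and makes no attempt to interpret anything else in the file.

```shell
# Sketch: list REGISTRY, IMAGE and TOOLBOX_NAME assignments in a .toolboxrc.
# toolboxrc_overrides is a hypothetical helper, not part of toolbox.
toolboxrc_overrides() {
    tbx_rc="${1:-$HOME/.toolboxrc}"
    if [ -f "$tbx_rc" ]; then
        # Print only lines that assign one of the three options.
        grep -E '^[[:space:]]*(REGISTRY|IMAGE|TOOLBOX_NAME)=' "$tbx_rc"
    else
        echo "no overrides: $tbx_rc not found"
    fi
}
```

Called without arguments as the user who runs toolbox, it inspects $HOME/.toolboxrc. Any line it prints is a value that replaces the corresponding default.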
10 Comments
Under Resolution, I would add a clarification to the second step. Otherwise it's not clear where to run the "toolbox" command, as OCP 4 nodes are usually not accessed through SSH.
"Connect to the OpenShift Container Platform 4.x node on which you want to generate the sosreport via SSH. Use toolbox to properly start the container."
How to generate SOSREPORT within OpenShift4 nodes without SSH into them https://access.redhat.com/solutions/4387261
EDIT: Correct link to the external KB
You pasted the same link as this KCS. Can you share the correct link? :)
Edited the KCS
Unfortunately this doesn't work for environments behind a proxy or disconnected environments. Trying the other method mentioned in the comments...
Hello,
For a disconnected environment, please follow these steps:
1. On the bastion machine, or any machine with internet access, pull the image: $ podman pull registry.redhat.io/rhel8/support-tools
2. Save the image to a tarball file: $ podman save -o tools.tar 50b63c2aff8c
3. Send the tools.tar file to the CoreOS node: $ scp tools.tar core@xxxx
4. Load the image: $ podman load -i tools.tar
5. Change the image tag: $ podman tag 50b63c2aff8c registry.redhat.io/rhel8/support-tools
I ran this. The sosreport is in the container filesystem, and once you leave the container it's gone. How do you access the report? I tried to copy it to /tmp, but that too is in the container.
The CLI is now attached to a new bash session within the container and one can execute sosreport:
I noticed that the 'toolbox' command leaves behind a container each time it's run:
This can be cleaned up with 'podman rm toolbox-'.
(To be fair, this is mentioned in the output of 'toolbox --help').
There is no description of how to transfer the sosreport file to a desktop. Did anyone from Red Hat notice that people sometimes use documentation in a hurry and under pressure?
Could you add a note to this KCS about removing the previous "toolbox-" container with 'podman rm'? I think some customers will only read this KCS to get a sos report.