How to generate a sos report in Red Hat Enterprise Linux CoreOS in OpenShift 4 with SSH access to nodes?
Environment
- Red Hat OpenShift Container Platform (RHOCP) 4
- Red Hat Enterprise Linux CoreOS (RHCOS)
- sosreport
Issue
- How to generate a sos report in Red Hat Enterprise Linux CoreOS in OCP 4 if oc debug node does not work?
- How to generate a sos report for Red Hat OpenShift 4 nodes?
- Generating a sos report using the rhel7/support-tools image fails with a traceback:
[root@ip-1-1-1-1 ~]# podman run -it registry.access.redhat.com/rhel7/support-tools /usr/bin/bash
bash-4.2# sosreport
Traceback (most recent call last):
  File "/usr/sbin/sosreport", line 19, in <module>
    main(sys.argv[1:])
  File "/usr/lib/python2.7/site-packages/sos/sosreport.py", line 1498, in main
    sos = SoSReport(args)
  File "/usr/lib/python2.7/site-packages/sos/sosreport.py", line 360, in __init__
    self.policy = sos.policies.load(sysroot=self.opts.sysroot)
  File "/usr/lib/python2.7/site-packages/sos/policies/__init__.py", line 44, in load
    cache['policy'] = policy(sysroot=sysroot)
  File "/usr/lib/python2.7/site-packages/sos/policies/redhat.py", line 258, in __init__
    super(RHELPolicy, self).__init__(sysroot=sysroot)
  File "/usr/lib/python2.7/site-packages/sos/policies/redhat.py", line 58, in __init__
    sysroot = self._container_init()
  File "/usr/lib/python2.7/site-packages/sos/policies/redhat.py", line 153, in _container_init
    host_tmp_dir = os.path.abspath(self._host_sysroot + self._tmp_dir)
TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'
Resolution
By design, OpenShift 4 nodes are immutable and rely on Operators to apply cluster changes. In turn, this means that accessing the underlying nodes directly by SSH is not the recommended procedure. Additionally, the nodes will be tainted as accessed.
Note: Due to the above, whenever possible, generate a sos report without using SSH, by spawning a debug pod directly from the oc command line. See How to generate sosreport within nodes without SSH in OCP 4 for further information.
Generating a sosreport with SSH access
Only if it is not possible to generate a sos report without SSH, connect via SSH to the OpenShift 4 node where the sos report is to be generated, and become root:
$ ssh core@[NODE] # ssh with core user to the NODE using ssh key specified in install-config.yaml
[core@node ~]$ sudo -i
Note: in disconnected environments, the registry.redhat.io/rhel9/support-tools image must be mirrored. If the image is already available to the nodes, create a /root/.toolboxrc file on the node as follows before running toolbox (set the REGISTRY variable to the URL of the registry, and IMAGE to the image name in the custom registry):
[root@node ~]# vi /root/.toolboxrc
REGISTRY=[custom-private-registry.example.com:5000]
IMAGE=rhel9/support-tools
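The same file can also be written without opening an editor. The following is a minimal sketch, not part of toolbox itself: write_toolboxrc is a hypothetical helper name, and the registry URL in the usage comment is the placeholder from the example above. It assumes only that toolbox reads the REGISTRY and IMAGE variables as described.

```shell
# write_toolboxrc is a hypothetical helper, not a toolbox command.
# Usage: write_toolboxrc <target-file> <registry-url> <image-name>
write_toolboxrc() {
    local target="$1"
    local registry="$2"
    local image="$3"
    # REGISTRY and IMAGE are the two variables described in the note above.
    printf 'REGISTRY=%s\nIMAGE=%s\n' "$registry" "$image" > "$target"
}

# Example, as root on the node (registry URL is a placeholder):
# write_toolboxrc /root/.toolboxrc custom-private-registry.example.com:5000 rhel9/support-tools
```

The target path is an argument so that the helper can be tried against a scratch file before overwriting /root/.toolboxrc.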
Run the toolbox command:
[root@node ~]# toolbox
Spawning a container 'toolbox-root' with image 'registry.redhat.io/rhel9/support-tools'
Detected RUN label in the container image. Using that as the default...
Command: /proc/self/exe run -it --name toolbox-root --privileged --ipc=host --net=host --pid=host -e HOST=/host -e NAME=toolbox-root -e IMAGE=registry.redhat.io/rhel9/support-tools:latest -v /run:/run -v /var/log:/var/log -v /etc/machine-id:/etc/machine-id -v /etc/localtime:/etc/localtime -v /:/host registry.redhat.io/rhel9/support-tools:latest
Execute the sos report command (remove the --all-logs parameter if the generated sosreport is too big):
[root@node ~]# sos report -e openshift -k crio.all=on -k crio.logs=on -k podman.all=on -k podman.logs=on --all-logs
Note: if any of the plugins times out, or not all the information is collected, it may be necessary to add the parameter --plugin-timeout=600 to increase the plugin timeout.
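The two adjustments described above (dropping --all-logs and adding --plugin-timeout) can be combined as needed. The sketch below only prints the resulting command line so it can be reviewed before running it inside the toolbox container; build_sos_cmd is a hypothetical helper name, and every option it emits is taken from this article.

```shell
# build_sos_cmd is a hypothetical helper; it prints a command, it does not run sos.
# Usage: build_sos_cmd [yes|no] [plugin-timeout-seconds]
#   first argument:  include --all-logs (default: yes)
#   second argument: value for --plugin-timeout (default: option omitted)
build_sos_cmd() {
    local all_logs="${1:-yes}"
    local timeout="${2:-}"
    local cmd="sos report -e openshift -k crio.all=on -k crio.logs=on -k podman.all=on -k podman.logs=on"
    if [ "$all_logs" = "yes" ]; then
        cmd="$cmd --all-logs"
    fi
    if [ -n "$timeout" ]; then
        cmd="$cmd --plugin-timeout=$timeout"
    fi
    printf '%s\n' "$cmd"
}

build_sos_cmd yes       # the command as shown above
build_sos_cmd no 600    # smaller archive, longer plugin timeout
```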
This will generate the sosreport in the /host/var/tmp directory in the container (which maps to /var/tmp/ on the host). Refer to How to provide an sosreport from a RHEL CoreOS OpenShift 4 node for attaching the generated sosreport to a Support Case.
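To find the most recent archive on the host afterwards, something like the following sketch may help. latest_sosreport is a hypothetical helper name, and the file name pattern sosreport-*.tar.xz is an assumption about how sos names its archives; adjust it if the archive reported at the end of the sos run looks different.

```shell
# latest_sosreport is a hypothetical helper.
# Assumption: archives are named sosreport-*.tar.xz (check the path sos prints).
# Usage: latest_sosreport [directory]   (default: /var/tmp, the host-side path)
latest_sosreport() {
    local dir="${1:-/var/tmp}"
    local latest=""
    local f
    for f in "$dir"/sosreport-*.tar.xz; do
        # Skip the unexpanded pattern when nothing matches.
        [ -e "$f" ] || continue
        # Keep the file with the newest modification time.
        if [ -z "$latest" ] || [ "$f" -nt "$latest" ]; then
            latest="$f"
        fi
    done
    if [ -n "$latest" ]; then
        printf '%s\n' "$latest"
    else
        return 1
    fi
}

# Example, on the node after leaving the container:
# latest_sosreport /var/tmp
```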
Once the sosreport has been created, run exit to leave the container's bash session and return to the node, and once again to exit from the node:
[root@node ~]# exit
[root@node ~]# exit
$
Root Cause
The toolbox command runs podman container runlabel run registry.redhat.io/rhel9/support-tools, which is the replacement for atomic run registry.redhat.io/rhel7/support-tools from RHEL Atomic Host.
Diagnostic Steps
If toolbox does not start the debug container as expected, check for a user-created $HOME/.toolboxrc file that could be overriding the default values of the REGISTRY, IMAGE, or TOOLBOX_NAME options. In disconnected environments, that file must be created so that it refers to the mirrored image.
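A quick way to perform that check is sketched below. show_toolbox_overrides is a hypothetical helper name; it assumes only that the file uses plain NAME=value lines for the three options named above, and it prints whichever of them are set.

```shell
# show_toolbox_overrides is a hypothetical helper, not a toolbox command.
# Usage: show_toolbox_overrides [file]   (default: $HOME/.toolboxrc)
show_toolbox_overrides() {
    local rc="${1:-$HOME/.toolboxrc}"
    if [ ! -f "$rc" ]; then
        echo "no $rc found; toolbox uses its defaults"
        return 0
    fi
    # Print only assignments to the three options that change what toolbox starts.
    grep -E '^[[:space:]]*(REGISTRY|IMAGE|TOOLBOX_NAME)=' "$rc" \
        || echo "$rc sets none of REGISTRY, IMAGE, TOOLBOX_NAME"
}

# Example, as root on the node:
# show_toolbox_overrides /root/.toolboxrc
```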