How to generate a sos report in Red Hat Enterprise Linux CoreOS in OpenShift 4 with SSH access to nodes?

Solution Verified - Updated -

Environment

  • Red Hat OpenShift Container Platform (RHOCP)
    • 4
  • Red Hat Enterprise Linux CoreOS (RHCOS)
  • sosreport

Issue

  • How to generate a sos report in Red Hat Enterprise Linux CoreOS in OCP 4 if oc debug node does not work?
  • How to generate a sos report for Red Hat OpenShift 4 nodes?
  • Generating sos report using rhel7/support-tools image fails with a traceback.

    [root@ip-1-1-1-1 ~]# podman run -it registry.access.redhat.com/rhel7/support-tools /usr/bin/bash
    
    bash-4.2# sosreport 
    Traceback (most recent call last):
      File "/usr/sbin/sosreport", line 19, in <module>
        main(sys.argv[1:])
      File "/usr/lib/python2.7/site-packages/sos/sosreport.py", line 1498, in main
        sos = SoSReport(args)
      File "/usr/lib/python2.7/site-packages/sos/sosreport.py", line 360, in __init__
        self.policy = sos.policies.load(sysroot=self.opts.sysroot)
      File "/usr/lib/python2.7/site-packages/sos/policies/__init__.py", line 44, in load
        cache['policy'] = policy(sysroot=sysroot)
      File "/usr/lib/python2.7/site-packages/sos/policies/redhat.py", line 258, in __init__
        super(RHELPolicy, self).__init__(sysroot=sysroot)
      File "/usr/lib/python2.7/site-packages/sos/policies/redhat.py", line 58, in __init__
        sysroot = self._container_init()
      File "/usr/lib/python2.7/site-packages/sos/policies/redhat.py", line 153, in _container_init
        host_tmp_dir = os.path.abspath(self._host_sysroot + self._tmp_dir)
    TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'
    

Resolution

By design, OpenShift 4 nodes are immutable and rely on Operators to apply cluster changes. In turn, this means that accessing the underlying nodes directly by SSH is not the recommended procedure. Additionally, the nodes will be tainted as accessed.

Note: Due to the above, whenever possible, generate a sos report without using SSH, by spawning a debug pod directly from the oc command line. See How to generate sosreport within nodes without SSH in OCP 4 for further information.
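
For orientation, the debug-pod route mentioned in the note can be sketched as below. This is a minimal illustration and not a substitute for the referenced article: the wrapper name debug_node is hypothetical, and it assumes an oc client that is already logged in with permission to start debug pods.

```shell
#!/bin/sh
# Hypothetical wrapper around the debug-pod approach mentioned in the note.
# Usage: debug_node NODE_NAME
debug_node() {
    node="${1:-}"
    if [ -z "$node" ]; then
        echo "usage: debug_node NODE_NAME" >&2
        return 1
    fi
    # Starts an interactive debug pod on the node; assumes a logged-in oc client.
    oc debug "node/$node"
}

# Inside the debug pod the host filesystem is mounted at /host, so the
# usual next steps are typed interactively:
#   chroot /host
#   toolbox
```

From the toolbox container onwards, the sos report steps are the same as in the SSH procedure.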

Generating a sosreport with SSH access

Only if a sos report cannot be generated without SSH, connect via SSH to the OpenShift 4 node on which the sos report is to be generated, and become root:

$ ssh core@[NODE] # ssh with core user to the NODE using ssh key specified in install-config.yaml
[core@node ~]$ sudo -i

Note: in disconnected environments, the registry.redhat.io/rhel9/support-tools image must be mirrored. Once the image is available to the nodes, create a /root/.toolboxrc file on the node as follows before running toolbox (set REGISTRY to the URL of the mirror registry, and IMAGE to the image name in that registry):

[root@node ~]# vi /root/.toolboxrc
REGISTRY=[custom-private-registry.example.com:5000]
IMAGE=rhel9/support-tools
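
For scripted or repeatable setups, the same file can be written without an editor. The following is a minimal sketch rather than part of the documented procedure: the function name write_toolboxrc is hypothetical, the registry host is the same placeholder used above, and the optional path argument exists only so the helper can be tried against a scratch file first.

```shell
#!/bin/sh
# Hypothetical helper: write a toolbox configuration file non-interactively.
# Usage: write_toolboxrc REGISTRY IMAGE [PATH]
# PATH defaults to /root/.toolboxrc, the file described above.
write_toolboxrc() {
    registry="${1:-}"
    image="${2:-}"
    rcfile="${3:-/root/.toolboxrc}"

    # Refuse to write a half-empty file.
    if [ -z "$registry" ] || [ -z "$image" ]; then
        echo "usage: write_toolboxrc REGISTRY IMAGE [PATH]" >&2
        return 1
    fi

    # The file holds plain KEY=value lines, as in the example above.
    printf 'REGISTRY=%s\nIMAGE=%s\n' "$registry" "$image" > "$rcfile"
}

# Example (placeholder registry; replace with the mirror's real address):
# write_toolboxrc custom-private-registry.example.com:5000 rhel9/support-tools
```

Note that the helper overwrites any existing file at the target path, so existing overrides such as TOOLBOX_NAME would need to be added again.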

Run the toolbox command:

[root@node ~]# toolbox 
Spawning a container 'toolbox-root' with image 'registry.redhat.io/rhel9/support-tools'
Detected RUN label in the container image. Using that as the default...
Command: /proc/self/exe run -it --name toolbox-root --privileged --ipc=host --net=host --pid=host -e HOST=/host -e NAME=toolbox-root -e IMAGE=registry.redhat.io/rhel9/support-tools:latest -v /run:/run -v /var/log:/var/log -v /etc/machine-id:/etc/machine-id -v /etc/localtime:/etc/localtime -v /:/host registry.redhat.io/rhel9/support-tools:latest

Run the sos report command (remove the --all-logs parameter if the generated sosreport is too big):

[root@node ~]# sos report -e openshift -k crio.all=on -k crio.logs=on -k podman.all=on -k podman.logs=on --all-logs

Note: if any plugin times out, or not all the information is collected, it may be necessary to add the parameter --plugin-timeout=600 to increase the plugin timeout.
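
The two adjustments described above (dropping --all-logs and raising the plugin timeout) can be combined in a small helper that only prints the resulting command line. This is a sketch under stated assumptions: the function name sos_report_cmd and its argument order are hypothetical, and the options themselves are the ones shown in this solution.

```shell
#!/bin/sh
# Hypothetical helper: print the sos report command line for the given choices.
# Usage: sos_report_cmd [all_logs: yes|no] [plugin_timeout_seconds]
sos_report_cmd() {
    all_logs="${1:-yes}"
    plugin_timeout="${2:-}"

    # Base command and plugin options taken from the step above.
    cmd="sos report -e openshift -k crio.all=on -k crio.logs=on -k podman.all=on -k podman.logs=on"

    # --all-logs can make the archive very large; drop it when size matters.
    if [ "$all_logs" = "yes" ]; then
        cmd="$cmd --all-logs"
    fi

    # Raise the per-plugin timeout only when a value is supplied.
    if [ -n "$plugin_timeout" ]; then
        cmd="$cmd --plugin-timeout=$plugin_timeout"
    fi

    printf '%s\n' "$cmd"
}

# Examples:
# sos_report_cmd            # full command with --all-logs
# sos_report_cmd no 600     # smaller report, 600-second plugin timeout
```

The printed line can then be copied into the toolbox session and run there.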

This generates the sosreport in the /host/var/tmp directory inside the container (which maps to /var/tmp/ on the host). Refer to How to provide an sosreport from a RHEL CoreOS OpenShift 4 node for attaching the generated sosreport to a Support Case.
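
To find the archive afterwards, a helper along the following lines may be useful. It is a minimal sketch: the function name latest_sosreport is hypothetical, and it assumes the archive follows the sosreport-*.tar.xz naming pattern, which should be checked against the file name that sos prints when it finishes.

```shell
#!/bin/sh
# Hypothetical helper: print the most recently modified sos archive in a directory.
# Usage: latest_sosreport [DIR]   (DIR defaults to /var/tmp, the host-side path)
latest_sosreport() {
    dir="${1:-/var/tmp}"

    # Assumption: archives are named sosreport-*.tar.xz. ls -t sorts newest first.
    # Checksum files (for example *.tar.xz.sha256) do not match this pattern.
    newest="$(ls -t "$dir"/sosreport-*.tar.xz 2>/dev/null | head -n 1)"

    if [ -z "$newest" ]; then
        echo "no sosreport archive found in $dir" >&2
        return 1
    fi
    printf '%s\n' "$newest"
}

# Examples:
# latest_sosreport /host/var/tmp   # from inside the toolbox container
# latest_sosreport                 # on the node itself, after leaving the container
```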

Once the sosreport has been created, run exit to leave the container's bash session and return to the node, and run it once again to exit from the node:

[root@node ~]# exit
[root@node ~]# exit
$ 

Root Cause

The toolbox command runs podman container runlabel run registry.redhat.io/rhel9/support-tools, which is the replacement for atomic run registry.redhat.io/rhel7/support-tools from RHEL Atomic Host.

Diagnostic Steps

If toolbox does not start the debug container as expected, check for a user-created $HOME/.toolboxrc file that could be overriding the default values of the REGISTRY, IMAGE, or TOOLBOX_NAME options. In disconnected environments, that file must be created so that it points to the mirrored image.
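
The check described above can be sketched as a small helper that prints only the relevant overrides. The function name toolboxrc_overrides is hypothetical, and the helper assumes the file contains plain KEY=value lines like the example in the Resolution section.

```shell
#!/bin/sh
# Hypothetical helper: list the toolbox settings overridden by a .toolboxrc file.
# Usage: toolboxrc_overrides [PATH]   (PATH defaults to $HOME/.toolboxrc)
toolboxrc_overrides() {
    rcfile="${1:-$HOME/.toolboxrc}"

    if [ ! -f "$rcfile" ]; then
        echo "no override file at $rcfile; toolbox defaults apply"
        return 0
    fi

    # Print only the three options named above, ignoring comments and other lines.
    grep -E '^[[:space:]]*(REGISTRY|IMAGE|TOOLBOX_NAME)=' "$rcfile" \
        || echo "$rcfile exists but sets none of REGISTRY, IMAGE, TOOLBOX_NAME"
}

# Example, run as root on the node:
# toolboxrc_overrides /root/.toolboxrc
```

If the printed REGISTRY or IMAGE value does not match an image the node can actually pull, that mismatch is a likely reason for toolbox failing to start.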

