How to generate a sosreport in Red Hat Enterprise Linux CoreOS 4.x with SSH access to master nodes?

Solution Verified

Environment

  • Red Hat OpenShift Container Platform
    • 4.x
  • Red Hat Enterprise Linux CoreOS (RHCOS)

Issue

  • How to generate sosreport in Red Hat Enterprise Linux CoreOS 4.x?
  • How to generate sosreport for Red Hat OpenShift 4.x master nodes?
  • Generating a sosreport using the rhel7/support-tools image fails with a traceback:
[root@ip-1-1-1-1 ~]# podman run -it registry.access.redhat.com/rhel7/support-tools /usr/bin/bash

bash-4.2# sosreport 
Traceback (most recent call last):
  File "/usr/sbin/sosreport", line 19, in <module>
    main(sys.argv[1:])
  File "/usr/lib/python2.7/site-packages/sos/sosreport.py", line 1498, in main
    sos = SoSReport(args)
  File "/usr/lib/python2.7/site-packages/sos/sosreport.py", line 360, in __init__
    self.policy = sos.policies.load(sysroot=self.opts.sysroot)
  File "/usr/lib/python2.7/site-packages/sos/policies/__init__.py", line 44, in load
    cache['policy'] = policy(sysroot=sysroot)
  File "/usr/lib/python2.7/site-packages/sos/policies/redhat.py", line 258, in __init__
    super(RHELPolicy, self).__init__(sysroot=sysroot)
  File "/usr/lib/python2.7/site-packages/sos/policies/redhat.py", line 58, in __init__
    sysroot = self._container_init()
  File "/usr/lib/python2.7/site-packages/sos/policies/redhat.py", line 153, in _container_init
    host_tmp_dir = os.path.abspath(self._host_sysroot + self._tmp_dir)
TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'

Resolution

By design, OpenShift 4.x clusters are immutable and rely on Operators to apply cluster changes. Accessing the underlying nodes directly over SSH is therefore not the recommended procedure, and any node accessed this way is tainted as accessed. Whenever possible, generate the sosreport without SSH by spawning a debug pod from the oc command line; see https://access.redhat.com/solutions/4387261 for details. If SSH access is required, use the procedure below.

Connect via SSH to the OpenShift Container Platform 4.x node on which the sosreport is to be generated.

Become root. Then, as root, run the toolbox command:

$ ssh core@NODE # ssh with core user using ssh key specified in install-config.yaml
[core@node ~]$ sudo -i
[root@node ~]# toolbox 
Spawning a container 'toolbox-root' with image 'registry.redhat.io/rhel8/support-tools'
Detected RUN label in the container image. Using that as the default...
Command: /proc/self/exe run -it --name toolbox-root --privileged --ipc=host --net=host --pid=host -e HOST=/host -e NAME=toolbox-root -e IMAGE=registry.redhat.io/rhel8/support-tools:latest -v /run:/run -v /var/log:/var/log -v /etc/machine-id:/etc/machine-id -v /etc/localtime:/etc/localtime -v /:/host registry.redhat.io/rhel8/support-tools:latest

The shell is now attached to a new bash session inside the container, where sosreport can be run:

[root@node ~]# sosreport -k crio.all=on -k crio.logs=on

This generates the sosreport in the /host/var/tmp directory inside the container, which maps to /var/tmp on the host.
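To find the archive afterwards, it can help to list the newest sosreport file in that directory. The following is a minimal sketch and not part of the original procedure: the function name latest_sosreport is hypothetical, and the sosreport-*.tar.xz pattern assumes the default archive naming and xz compression, which may differ between sos versions.

```shell
# Print the most recently modified sosreport archive in a directory.
# Assumption: archives match sosreport-*.tar.xz (adjust if yours differ).
latest_sosreport() {
    dir="${1:-/var/tmp}"
    # ls -t sorts newest first; errors are silenced when nothing matches.
    ls -t "$dir"/sosreport-*.tar.xz 2>/dev/null | head -n 1
}
```

Run on the host, latest_sosreport prints the path under /var/tmp; inside the toolbox container, latest_sosreport /host/var/tmp should point at the same file.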

Once the sosreport has been created, run exit to leave the container:

[root@node ~]# exit
[root@node ~]# 

Root Cause

toolbox runs podman container runlabel run registry.redhat.io/rhel8/support-tools, which replaces the atomic run registry.redhat.io/rhel7/support-tools command from RHEL Atomic Host.
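The traceback in the Issue section ends with a None value being added to a string while sos builds the host temporary directory path, which suggests that the host root was never set when the image was started with a plain podman run. The toolbox command shown above passes -e HOST=/host and -v /:/host. A minimal pre-flight check, assuming sos takes the host root from the HOST variable (an inference from those options, not something this article states), might look like the following; the function name host_sysroot_ok is hypothetical.

```shell
# Return success only if HOST is set and names an existing directory.
# Assumption: sos reads the host root from HOST, as the
# "-e HOST=/host -v /:/host" options of the toolbox command suggest.
host_sysroot_ok() {
    if [ -z "${HOST:-}" ]; then
        echo "HOST is not set" >&2
        return 1
    fi
    if [ ! -d "$HOST" ]; then
        echo "HOST=$HOST is not a directory" >&2
        return 1
    fi
    echo "host root mounted at $HOST"
}
```

Calling host_sysroot_ok inside the container before sosreport gives a quick indication of whether the container was started the way toolbox starts it.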

Diagnostic Steps

  • If toolbox does not start the debug container as expected, check for a user-created $HOME/.toolboxrc that could be overriding the default values of the REGISTRY, IMAGE, or TOOLBOX_NAME options.
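The check described above can be sketched as a small shell function. This is an illustration only: the function name toolboxrc_overrides is hypothetical, and it assumes the file contains plain VAR=value shell assignments.

```shell
# List settings in a toolboxrc file that override toolbox defaults.
# Assumption: the file uses plain VAR=value shell assignments.
toolboxrc_overrides() {
    rc="${1:-$HOME/.toolboxrc}"
    if [ ! -f "$rc" ]; then
        echo "no toolboxrc at $rc; defaults apply"
        return 0
    fi
    # Match only the three options named in the diagnostic step.
    grep -E '^[[:space:]]*(REGISTRY|IMAGE|TOOLBOX_NAME)=' "$rc"
}
```

Run as root on the node, toolboxrc_overrides with no argument inspects $HOME/.toolboxrc; any line it prints is a candidate for the unexpected behavior.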


4 Comments

Under Resolution, I would add a clarification to the second step. Otherwise it's not clear where to run the "toolbox" command, as OCP 4 nodes are usually not accessed through SSH.

"Connect to the OpenShift Container Platform 4.x node on which you want to generate the sosreport via SSH. Use toolbox to properly start the container."

How to generate SOSREPORT within OpenShift4 nodes without SSH into them https://access.redhat.com/solutions/4387261

EDIT: Correct link to the external KB

You pasted the same link as this KCS. Can you share the correct link? :)

Edited the KCS