How to generate a sos report in Red Hat Enterprise Linux CoreOS in OpenShift 4 with SSH access to nodes?
Environment
- Red Hat OpenShift Container Platform (RHOCP) 4
- Red Hat Enterprise Linux CoreOS (RHCOS)
- sos
Issue
- How to generate sos report in Red Hat Enterprise Linux CoreOS in OCP 4 if oc debug node does not work?
- How to generate sos report for Red Hat OpenShift 4 nodes?
- Generating sos report using rhel9/support-tools image fails with a traceback.
[root@ip-1-1-1-1 ~]# podman run -it registry.redhat.io/rhel9/support-tools /usr/bin/bash
bash-4.2# sosreport
Traceback (most recent call last):
  File "/usr/sbin/sosreport", line 19, in <module>
    main(sys.argv[1:])
  File "/usr/lib/python2.7/site-packages/sos/sosreport.py", line 1498, in main
    sos = SoSReport(args)
  File "/usr/lib/python2.7/site-packages/sos/sosreport.py", line 360, in __init__
    self.policy = sos.policies.load(sysroot=self.opts.sysroot)
  File "/usr/lib/python2.7/site-packages/sos/policies/__init__.py", line 44, in load
    cache['policy'] = policy(sysroot=sysroot)
  File "/usr/lib/python2.7/site-packages/sos/policies/redhat.py", line 258, in __init__
    super(RHELPolicy, self).__init__(sysroot=sysroot)
  File "/usr/lib/python2.7/site-packages/sos/policies/redhat.py", line 58, in __init__
    sysroot = self._container_init()
  File "/usr/lib/python2.7/site-packages/sos/policies/redhat.py", line 153, in _container_init
    host_tmp_dir = os.path.abspath(self._host_sysroot + self._tmp_dir)
TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'
Resolution
By design, OpenShift 4 nodes are immutable and rely on Operators to apply cluster changes. In turn, this means that accessing the underlying nodes directly by SSH is not the recommended procedure. Additionally, the nodes will be tainted as accessed.
Note: Due to the above, whenever possible, generate a sos report without using SSH by spawning a debug pod directly from the oc command line. See How to generate sos report within nodes without SSH in OCP 4 for further information.
Generating a sos report with SSH access
If it is not possible to generate a sos report without SSH, connect via SSH to the OpenShift 4 node where the sos report is to be generated and become root:
$ ssh core@[NODE] # ssh with core user to the NODE using ssh key specified in install-config.yaml
[core@node ~]$ sudo -i
Note: in disconnected environments, the registry.redhat.io/rhel9/support-tools image needs to be mirrored. If the image is already available to the nodes, create a /root/.toolboxrc file on the node as follows before running toolbox (set the REGISTRY variable to the URL of the registry, and IMAGE to the image name in the custom registry):
[root@node ~]# vi /root/.toolboxrc
REGISTRY=[custom-private-registry.example.com:5000]
IMAGE=rhel9/support-tools
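The same file can also be written without an interactive editor. The following is a minimal sketch, not part of the original procedure: the function name write_toolboxrc is hypothetical, and the registry URL and image name are the placeholder values from the note above (assuming the square brackets there only mark a placeholder and are not part of the value).

```shell
# Hypothetical helper: write a toolbox configuration file.
# $1 = target file, $2 = registry URL, $3 = image name in that registry
write_toolboxrc() {
    # REGISTRY and IMAGE are the two variables described in the note above.
    printf 'REGISTRY=%s\nIMAGE=%s\n' "$2" "$3" > "$1"
}

# Example call (as root on the node, after adjusting the values):
# write_toolboxrc /root/.toolboxrc custom-private-registry.example.com:5000 rhel9/support-tools
```

Writing the file through a function keeps the two values in one place and overwrites any earlier content of the target file, so check for an existing /root/.toolboxrc before using it.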
If a cluster environment utilizes a proxy, ensure that the proxy settings are imported as outlined in the article Cannot use toolbox in proxy environment.
Run the toolbox command:
[root@node ~]# toolbox
Spawning a container 'toolbox-root' with image 'registry.redhat.io/rhel9/support-tools'
Detected RUN label in the container image. Using that as the default...
Command: /proc/self/exe run -it --name toolbox-root --privileged --ipc=host --net=host --pid=host -e HOST=/host -e NAME=toolbox-root -e IMAGE=registry.redhat.io/rhel9/support-tools:latest -v /run:/run -v /var/log:/var/log -v /etc/machine-id:/etc/machine-id -v /etc/localtime:/etc/localtime -v /:/host registry.redhat.io/rhel9/support-tools:latest
Execute the sos report command (remove the --all-logs parameter if the generated sos report is too big):
[root@node ~]# sos report -e openshift -e openshift_ovn -e openvswitch -e podman -e crio -k crio.all=on -k crio.logs=on -k podman.all=on -k podman.logs=on -k networking.ethtool-namespaces=off --all-logs --plugin-timeout=600
This will generate the sos report in the /host/var/tmp directory in the container (which maps to /var/tmp/ on the host). Refer to How to provide an sos report from a RHEL CoreOS OpenShift 4 node for attaching the generated sos report to a Support Case.
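To confirm which archive was just written, the newest file in that directory can be listed. This is a small sketch under stated assumptions: the function name latest_sosreport is hypothetical, and the sosreport-*.tar.xz pattern assumes the default archive naming, so adjust it if the file that sos report printed at the end of its run is named differently.

```shell
# Hypothetical helper: print the most recently modified sos report archive.
# $1 = directory to search (defaults to /var/tmp, the host-side location)
latest_sosreport() {
    dir="${1:-/var/tmp}"
    # ls -t sorts by modification time, newest first.
    # Errors are hidden so that nothing is printed when no archive matches.
    ls -t "$dir"/sosreport-*.tar.xz 2>/dev/null | head -n 1
}

# Example calls:
# latest_sosreport /host/var/tmp   # inside the toolbox container
# latest_sosreport                 # on the node itself
```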
Once the sos report has been created, run exit to leave the container's bash session and return to the node, and run exit again to leave the node:
[root@node ~]# exit
[root@node ~]# exit
$
Root Cause
The toolbox command runs podman container runlabel run registry.redhat.io/rhel9/support-tools, which is the replacement for atomic run registry.redhat.io/rhel7/support-tools from RHEL Atomic Host.
Diagnostic Steps
If toolbox does not start the debug container as expected, check for a user-created $HOME/.toolboxrc file that could be overriding the default values of the REGISTRY, IMAGE, or TOOLBOX_NAME options. In disconnected environments, that file must be created so that it points to the mirrored image.
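The check described above can be sketched as a short shell function. The name show_toolboxrc_overrides is hypothetical, and the sketch assumes the file uses plain NAME=value assignments at the start of a line.

```shell
# Hypothetical helper: list the toolbox options overridden in a .toolboxrc file.
# $1 = file to inspect (defaults to $HOME/.toolboxrc)
show_toolboxrc_overrides() {
    rc="${1:-$HOME/.toolboxrc}"
    if [ ! -f "$rc" ]; then
        echo "no $rc found; toolbox defaults apply"
        return 0
    fi
    # Print only the assignments of the three options named above.
    grep -E '^(REGISTRY|IMAGE|TOOLBOX_NAME)=' "$rc" || echo "no overrides in $rc"
}

# Example call (as the user that runs toolbox):
# show_toolboxrc_overrides
```

Any line it prints is a value that replaces the toolbox default, which is the first thing to compare against the registry and image the node can actually reach.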