Sosreport fails during archive compression on RHEL

Solution Verified

Environment

  • Red Hat Enterprise Linux (RHEL) 6
  • Red Hat Satellite Proxy 5.7.0
  • sos-3.2-54.el6

Issue

  • sosreport fails while creating the compressed archive after the satellite plugin times out:

    Running 58/81: rpm...
    Running 59/81: sar...
    Running 60/81: satellite...
    [plugin:satellite] command 'spacewalk-debug --dir /var/tmp/sos.1234/sosreport-test-xxx/sos_commands/satellite/spacewalk-debug' timed out after 300s
    Running 61/81: scsi...
    ...
    Running 81/81: yum...
    
    Creating compressed archive...
    Traceback (most recent call last):
      File "/usr/sbin/sosreport", line 25, in <module>
        main(sys.argv[1:])
      File "/usr/lib/python2.6/site-packages/sos/sosreport.py", line 1520, in main
        sos.execute()
      File "/usr/lib/python2.6/site-packages/sos/sosreport.py", line 1499, in execute
        return self.final_work()
      File "/usr/lib/python2.6/site-packages/sos/sosreport.py", line 1411, in final_work
        checksum = self._create_checksum(archive, hash_name)
      File "/usr/lib/python2.6/site-packages/sos/sosreport.py", line 1349, in _create_checksum
        archive_fp = open(archive, 'rb')
    IOError: [Errno 2] No such file or directory: '/var/tmp/sos.1234/sosreport-test-xxx/sosreport-test-xxx.tar.xz'
    

Resolution

  • The timeouts observed in the satellite sos plugin can be resolved by updating to Satellite 5.8 (RHEA-2017:1557).
  • Update sos to sos-3.2-63.el6, released with advisory RHBA-2018:1920, or newer. This release includes the following enhancements:
    • sosreport now detects the spacewalk-backend version and skips spacewalk-debug collection if it is older than the fixed version (spacewalk-backend-2.5.3-143), tracked by bug 1496288GA
    • sosreport uses an increased timeout for spacewalk-debug, tracked by bug 1439943
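The version thresholds above (sos-3.2-63.el6 and spacewalk-backend-2.5.3-143) can be checked with a small helper. The following is a minimal Python 3 sketch, not the logic sos itself uses. The function names `parse_version_release` and `is_at_least` are hypothetical, and the comparison is a simplification that looks only at the numeric fields; real RPM version comparison handles more cases.

```python
import re


def _numeric_fields(text):
    """Return the numeric fields of a version or release string as a tuple.

    Assumption: anything from the dist tag ('.el6') onward is ignored.
    """
    return tuple(int(n) for n in re.findall(r"\d+", text.split(".el")[0]))


def parse_version_release(vr):
    """Split 'version-release' (e.g. '3.2-63.el6') into two integer tuples."""
    version, _, release = vr.partition("-")
    return _numeric_fields(version), _numeric_fields(release)


def is_at_least(installed, required):
    """True if `installed` is the same as or newer than `required`.

    Simplified comparison; it does not implement full RPM ordering rules.
    """
    return parse_version_release(installed) >= parse_version_release(required)


if __name__ == "__main__":
    # The sos build from the Environment section against the fixed build.
    print(is_at_least("3.2-54.el6", "3.2-63.el6"))
```

The installed version-release string would have to be obtained separately, for example from the package manager's query output.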

Root Cause

The satellite plugin's spacewalk-debug command attempts to archive a very large volume of data (> 10G) and times out, causing sosreport to fail.

Bug 1439949 - [RFE] include postgres logs in spacewalk-debug and no longer include non-pertinent files
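To illustrate the general failure mode (a long-running collection command cut off by a fixed time limit), here is a minimal Python 3 sketch. It is not how sos implements its plugin timeout. The helper name `run_with_timeout` is hypothetical, and the child command in the usage example is a stand-in for a slow collector.

```python
import subprocess
import sys


def run_with_timeout(argv, timeout_seconds):
    """Run `argv` and return (completed, output).

    `completed` is False when the command exceeded `timeout_seconds`; in that
    case the child is killed and only the output captured so far (possibly
    empty) is returned.
    """
    try:
        result = subprocess.run(
            argv,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            timeout=timeout_seconds,
        )
        return True, result.stdout
    except subprocess.TimeoutExpired as exc:
        return False, exc.output or b""


if __name__ == "__main__":
    # Stand-in for a slow collector: sleeps longer than the allowed time.
    slow = [sys.executable, "-c", "import time; time.sleep(5)"]
    completed, _ = run_with_timeout(slow, timeout_seconds=0.5)
    print("completed" if completed else "timed out")
```

A command that is interrupted this way can leave partial or missing output behind, which is why later steps should not assume that its results exist.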

Diagnostic Steps

sosreport detects that the archive is missing when it attempts to calculate the archive's checksum (md5sum), but the underlying failure occurred earlier, while the archive was being compressed.

/usr/lib/python2.6/site-packages/sos/sosreport.py
  50 # file system errors that should terminate a run
  51 fatal_fs_errors = (errno.ENOSPC, errno.EROFS)
...
1372         # package up and compress the results
1373         if not self.opts.build:
1374             old_umask = os.umask(0o077)
1375             if not self.opts.quiet:
1376                 print(_("Creating compressed archive..."))
1377             # compression could fail for a number of reasons
1378             try:
1379                 archive = self.archive.finalize(
1380                     self.opts.compression_type)
1381             except (OSError, IOError) as e:
1382                 if e.errno in fatal_fs_errors:  # <--- only fatal filesystem errors are handled
1383                     print("")
1384                     print(_(" %s while finalizing archive" % e.strerror))
1385                     print("")
1386                     self._exit(1)
1387             except:
1388                 if self.opts.debug:  # <--- unless being run with --debug
1389                     raise
1390                 else:
1391                     return False
1392             finally:
1393                 os.umask(old_umask)
1394         else:
...
1408         if not self.opts.build:
1409             # compute and store the archive checksum
1410             hash_name = self.policy.get_preferred_hash_name()
1411             checksum = self._create_checksum(archive, hash_name)  # <--- exception
1412             self._write_checksum(archive, hash_name, checksum)
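The listing shows the checksum step opening the archive without first confirming that compression produced a file. The following is a minimal Python 3 sketch of a guarded checksum, offered as an illustration and not as the fix shipped in sos. The function name `create_checksum_if_present` is hypothetical.

```python
import hashlib
import os


def create_checksum_if_present(archive, hash_name):
    """Return the hex digest of `archive`, or None if no archive file exists.

    `hash_name` is any algorithm name accepted by hashlib.new(), for example
    "md5" or "sha256".
    """
    if not archive or not os.path.isfile(archive):
        # Compression did not produce a file; report that to the caller
        # instead of raising IOError from open().
        return None
    digest = hashlib.new(hash_name)
    with open(archive, "rb") as archive_fp:
        # Read in 1 MiB chunks so that large archives are not loaded at once.
        for chunk in iter(lambda: archive_fp.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Path taken from the traceback in the Issue section.
    missing = "/var/tmp/sos.1234/sosreport-test-xxx/sosreport-test-xxx.tar.xz"
    print(create_checksum_if_present(missing, "sha256"))
```

Returning None lets the caller print a message about the failed compression step, which points at the real problem more directly than the traceback does.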

When the failure was caused by the satellite plugin, spacewalk-debug was found to be archiving 11G of log files when it timed out. Examining the output that sosreport had collected up to that point may show why a particular plugin times out or fails.
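One way to review the work completed so far is to total the size of each plugin's output directory under the temporary sos_commands directory (for example /var/tmp/sos.1234/sosreport-test-xxx/sos_commands, as seen in the Issue output). The following is a minimal Python 3 sketch; the function name `plugin_output_sizes` is hypothetical and is not part of sos.

```python
import os


def plugin_output_sizes(sos_commands_dir):
    """Return {plugin directory name: total size in bytes of its files}."""
    sizes = {}
    for entry in sorted(os.listdir(sos_commands_dir)):
        plugin_dir = os.path.join(sos_commands_dir, entry)
        if not os.path.isdir(plugin_dir):
            continue
        total = 0
        for root, _dirs, files in os.walk(plugin_dir):
            for name in files:
                path = os.path.join(root, name)
                # Skip symbolic links so that link targets are not counted.
                if not os.path.islink(path):
                    total += os.path.getsize(path)
        sizes[entry] = total
    return sizes


def largest_first(sizes):
    """Return (name, bytes) pairs sorted from largest to smallest."""
    return sorted(sizes.items(), key=lambda item: item[1], reverse=True)
```

Calling `largest_first(plugin_output_sizes(path))` lists the plugins that produced the most data first, which helps identify a plugin such as satellite that is collecting an unusually large amount.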

  • Component: sos

