Sosreport fails during archive compression on RHEL
Environment
- Red Hat Enterprise Linux (RHEL) 6
- Red Hat Satellite Proxy 5.7.0
- sos-3.2-54.el6
Issue
- Sosreport fails after the satellite plugin times out
Running 58/81: rpm...
Running 59/81: sar...
Running 60/81: satellite...
[plugin:satellite] command 'spacewalk-debug --dir /var/tmp/sos.1234/sosreport-test-xxx/sos_commands/satellite/spacewalk-debug' timed out after 300s
Running 61/81: scsi...
...
Running 81/81: yum...
Creating compressed archive...
Traceback (most recent call last):
  File "/usr/sbin/sosreport", line 25, in <module>
    main(sys.argv[1:])
  File "/usr/lib/python2.6/site-packages/sos/sosreport.py", line 1520, in main
    sos.execute()
  File "/usr/lib/python2.6/site-packages/sos/sosreport.py", line 1499, in execute
    return self.final_work()
  File "/usr/lib/python2.6/site-packages/sos/sosreport.py", line 1411, in final_work
    checksum = self._create_checksum(archive, hash_name)
  File "/usr/lib/python2.6/site-packages/sos/sosreport.py", line 1349, in _create_checksum
    archive_fp = open(archive, 'rb')
IOError: [Errno 2] No such file or directory: '/var/tmp/sos.1234/sosreport-test-xxx/sosreport-test-xxx.tar.xz'
Resolution
- The timeouts observed in the satellite sos plug-in can be resolved by updating to Satellite 5.8 (RHEA-2017:1557).
- Update sosreport to sos-3.2-63.el6, released with Advisory RHBA-2018:1920, or newer. This release includes the following enhancements:
  - sosreport now detects the spacewalk-backend version and skips spacewalk-debug collection if it is older than the fixed one (spacewalk-backend-2.5.3-143), tracked by bug 1496288
  - sosreport uses an increased timeout for spacewalk-debug, tracked by bug 1439943
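The version check described above can be illustrated with a short sketch. This is not the code shipped in sos: the function names are hypothetical, the `.el6sat` release suffix in the usage note is an assumed example, and the comparison is a simplification that looks only at the leading numeric fields rather than implementing the full RPM version comparison algorithm.

```python
import re

# First spacewalk-backend build containing the fix, as (version, release) tuples.
FIXED_BUILD = ((2, 5, 3), (143,))


def _numeric_fields(text):
    """Return the leading dot-separated integer fields of text as a tuple."""
    fields = []
    for part in text.split("."):
        match = re.match(r"\d+", part)
        if not match:
            break  # stop at the first non-numeric field, e.g. a dist tag
        fields.append(int(match.group()))
    return tuple(fields)


def parse_version_release(version_release):
    """Split 'version-release' into two tuples of integers."""
    version, _, release = version_release.partition("-")
    return _numeric_fields(version), _numeric_fields(release)


def should_collect_spacewalk_debug(installed):
    """True if the installed build is at least the fixed build (simplified check)."""
    return parse_version_release(installed) >= FIXED_BUILD
```

With this sketch, `should_collect_spacewalk_debug("2.5.3-142.el6sat")` returns `False`, so collection would be skipped, while `"2.5.3-143.el6sat"` returns `True`.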
Root Cause
The spacewalk-debug command run by the satellite plugin attempts to archive a very large volume of data (> 10G) and times out, which causes sosreport to fail.
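The general mechanism, a collection command that is killed once it exceeds a time limit, can be sketched as follows. This is an illustration only and is not how sos itself runs plugin commands; the helper name `run_with_timeout` is hypothetical, and the sketch assumes Python 3, where `subprocess.run` accepts a `timeout` argument.

```python
import subprocess


def run_with_timeout(command, timeout_seconds):
    """Run command and return (timed_out, returncode).

    If the command exceeds timeout_seconds it is killed and
    (True, None) is returned; any output it produced is discarded.
    """
    try:
        completed = subprocess.run(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        return True, None
    return False, completed.returncode
```

A command that has to archive more than 10G of data is unlikely to finish within a 300 second limit, so a caller using such a helper would see the timed-out result and be left with whatever partial files the command had already written.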
Diagnostic Steps
sosreport detected that the archive was missing when it attempted to calculate its checksum (md5sum), but the failure occurred earlier, while the archive was being compressed.
/usr/lib/python2.6/site-packages/sos/sosreport.py
  50  # file system errors that should terminate a run
  51  fatal_fs_errors = (errno.ENOSPC, errno.EROFS)
...
1372          # package up and compress the results
1373          if not self.opts.build:
1374              old_umask = os.umask(0o077)
1375              if not self.opts.quiet:
1376                  print(_("Creating compressed archive..."))
1377              # compression could fail for a number of reasons
1378              try:
1379                  archive = self.archive.finalize(
1380                      self.opts.compression_type)
1381              except (OSError, IOError) as e:
1382                  if e.errno in fatal_fs_errors:  # <--- only fatal filesystem errors are handled
1383                      print("")
1384                      print(_(" %s while finalizing archive" % e.strerror))
1385                      print("")
1386                      self._exit(1)
1387              except:
1388                  if self.opts.debug:  # <--- unless being run with --debug
1389                      raise
1390                  else:
1391                      return False
1392              finally:
1393                  os.umask(old_umask)
1394          else:
...
1408          if not self.opts.build:
1409              # compute and store the archive checksum
1410              hash_name = self.policy.get_preferred_hash_name()
1411              checksum = self._create_checksum(archive, hash_name)  # <--- exception
1412              self._write_checksum(archive, hash_name, checksum)
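In the excerpt above, the checksum step opens the archive without first confirming that compression actually produced it, which is why the failure surfaces as an IOError at line 1411. A minimal sketch of a more defensive checksum routine is shown below. It is not the upstream fix; the function name `checksum_archive` and its behaviour of returning `None` for a missing archive are assumptions made for illustration.

```python
import hashlib
import os


def checksum_archive(archive, hash_name="md5"):
    """Return the hex digest of archive, or None if the file is missing."""
    if not archive or not os.path.isfile(archive):
        # Compression did not produce the file; report that instead of
        # failing later with "No such file or directory".
        return None
    digest = hashlib.new(hash_name)
    with open(archive, "rb") as archive_fp:
        # Read in 1 MiB chunks so large archives are not loaded into memory.
        for chunk in iter(lambda: archive_fp.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

A caller could treat a `None` result as "archive creation failed" and print a clear message pointing at the compression step rather than at the checksum.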
In the case where this was caused by the satellite plugin, spacewalk-debug was found to be attempting to archive 11G of log files when it timed out. Reviewing the data that sosreport had collected before the failure may show why a particular plugin timed out or failed.
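One way to review what had been collected is to total the size of each plugin's directory under the temporary sos tree. The sketch below assumes a layout like the one in the error message above, with one subdirectory per plugin under `sos_commands`; the helper names are hypothetical and are not part of sos.

```python
import os


def directory_size(path):
    """Return the total size in bytes of regular files below path."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full):
                continue  # do not count symlink targets
            try:
                total += os.path.getsize(full)
            except OSError:
                pass  # file vanished or is unreadable; skip it
    return total


def plugin_sizes(sos_commands_dir):
    """Return (size_in_bytes, plugin_name) pairs, largest first."""
    sizes = []
    for entry in os.listdir(sos_commands_dir):
        full = os.path.join(sos_commands_dir, entry)
        if os.path.isdir(full):
            sizes.append((directory_size(full), entry))
    return sorted(sizes, reverse=True)
```

For example, calling `plugin_sizes("/var/tmp/sos.1234/sosreport-test-xxx/sos_commands")` on the leftover temporary directory would list the plugins by collected size, making an unusually large satellite directory easy to spot.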