How to distinguish between a crash and a graceful reboot in RHEL 7 or RHEL 8

Updated -

How can you distinguish between a system crash and a graceful reboot or shutdown in RHEL 7 or RHEL 8? This article outlines 4 approaches:

  1. Inspect wtmp with last -x
  2. Inspect auditd logs with ausearch
  3. Requires configuration: Create a custom service unit
  4. Requires configuration: Inspect previous boots in persistent systemd journal with journalctl

(1) Inspect wtmp with last -x

With a simple last -Fxn2 shutdown reboot command, the system wtmp file reports the two most recent shutdowns or reboots. reboot denotes the system booting up; whereas, shutdown denotes the system going down.

A graceful shutdown would show up as a reboot line followed by shutdown line, as in the following example:

~]# last -Fxn2 shutdown reboot
reboot   system boot  4.18.0-80.el8.x8 Mon Aug 31 06:33:11 2020   still running
shutdown system down  4.18.0-80.el8.x8 Mon Aug 31 06:33:01 2020 - Mon Aug 31 06:33:11 2020  (00:00)

Note: events from last are printed in descending chronological order, with most recent at the top.

An ungraceful shutdown can be inferred by the omission of shutdown; instead there will either be a single reboot line (if the wtmp file had been truncated/rotated prior to the crash) or 2 reboot lines in a row, as in this example:

~]# last -Fxn2 shutdown reboot
reboot   system boot  4.18.0-147.5.1.e Tue Sep  1 07:16:25 2020   still running
reboot   system boot  4.18.0-147.5.1.e Mon Aug  3 07:10:56 2020   still running

(2) Inspect auditd logs with ausearch

auditd is great and all the different events that it logs can be seen by checking ausearch -m. Apropos to the problem at hand, it logs system shutdown and system boot as above. The command ausearch -i -m system_boot,system_shutdown | tail -4 will report the 2 most recent shutdowns or boots. If this reports a SYSTEM_SHUTDOWN followed by a SYSTEM_BOOT, all is well; however, if it reports 2 SYSTEM_BOOT lines in a row or only a single SYSTEM_BOOT line, then the system did not shutdown gracefully.

Graceful shutdown:

~]# ausearch -i -m system_boot,system_shutdown | tail -4
----
type=SYSTEM_SHUTDOWN msg=audit(08/31/2020 06:33:01.571:595) : pid=27156 uid=root auid=unset ses=unset subj=system_u:system_r:init_t:s0 msg=' comm=systemd-update-utmp exe=/usr/lib/systemd/systemd-update-utmp hostname=? addr=? terminal=? res=success' 
----
type=SYSTEM_BOOT msg=audit(08/31/2020 06:33:12.838:9) : pid=828 uid=root auid=unset ses=unset subj=system_u:system_r:init_t:s0 msg=' comm=systemd-update-utmp exe=/usr/lib/systemd/systemd-update-utmp hostname=? addr=? terminal=? res=success' 

Note: as the timestamps should make clear, events from ausearch are printed in ascending chronological order, with oldest at the top.

Ungraceful shutdown:

~]# ausearch -i -m system_boot,system_shutdown | tail -4
----
type=SYSTEM_BOOT msg=audit(09/20/2016 01:10:32.392:7) : pid=657 uid=root auid=unset ses=unset subj=system_u:system_r:init_t:s0 msg=' comm=systemd-update-utmp exe=/usr/lib/systemd/systemd-update-utmp hostname=? addr=? terminal=? res=success' 
----
type=SYSTEM_BOOT msg=audit(09/20/2016 01:11:41.134:7) : pid=656 uid=root auid=unset ses=unset subj=system_u:system_r:init_t:s0 msg=' comm=systemd-update-utmp exe=/usr/lib/systemd/systemd-update-utmp hostname=? addr=? terminal=? res=success' 

Another ungraceful shutdown:
Presence of only one SYSTEM_BOOT record could be explained by the system being up for so long prior to the crash that audit logs of the previous reboot had been rotated out ... so that the only result is from when the system was just booted.

~]# ausearch -i -m system_boot,system_shutdown | tail -4
----
type=SYSTEM_BOOT msg=audit(09/01/2020 07:16:27.069:10) : pid=1057 uid=root auid=unset ses=unset subj=system_u:system_r:init_t:s0 msg=' comm=systemd-update-utmp exe=/usr/lib/systemd/systemd-update-utmp hostname=? addr=? terminal=? res=success'

(3) Create a custom service unit

Note: If you're trying to diagnose a potential crash right now, this will not help. You need to set it up first.

This approach is great because it allows for complete control. Here's an example of how to do it.

  1. Create a service that runs only at shutdown
    (Optionally customize the service name and the graceful_shutdown file)

    ~]# cat /etc/systemd/system/set_gracefulshutdown.service
    [Unit]
    Description=Set flag for graceful shutdown
    DefaultDependencies=no
    RefuseManualStart=true
    Before=shutdown.target
    
    [Service]
    Type=oneshot
    ExecStart=/bin/touch /root/graceful_shutdown
    
    [Install]
    WantedBy=shutdown.target
    
    ~]# systemctl daemon-reload
    ~]# systemctl enable set_gracefulshutdown
    
  2. Create a service that runs only at startup and only IF the graceful_shutdown file created by the above service exists
    (Optionally customize the service name and ensure the graceful_shutdown file matches the above service)

    ~]# cat /etc/systemd/system/check_graceful.service
    [Unit]
    Description=Check if previous system shutdown was graceful
    ConditionPathExists=/root/graceful_shutdown
    RefuseManualStart=true
    RefuseManualStop=true
    
    [Service]
    Type=oneshot
    RemainAfterExit=true
    ExecStart=/bin/rm /root/graceful_shutdown
    
    [Install]
    WantedBy=multi-user.target
    
    ~]# systemctl daemon-reload
    ~]# systemctl enable check_graceful
    
  3. Any time after a graceful reboot, systemctl is-active check_graceful would be able to confirm the previous reboot was graceful.
    Example output:

    ~]# systemctl is-active check_graceful && echo GOOD || echo BAD
    active
    GOOD
    
    ~]# systemctl status check_graceful
     check_graceful.service - Check if system booted after a graceful shutdown
       Loaded: loaded (/etc/systemd/system/check_graceful.service; enabled; vendor preset: disabled)
       Active: active (exited) since Tue 2016-09-20 01:10:32 EDT; 20s ago
      Process: 669 ExecStart=/bin/rm /root/graceful_shutdown (code=exited, status=0/SUCCESS)
     Main PID: 669 (code=exited, status=0/SUCCESS)
       CGroup: /system.slice/check_graceful.service
    
    Sep 20 01:10:32 a72.example.com systemd[1]: Starting Check if system booted after a graceful shutdown...
    Sep 20 01:10:32 a72.example.com systemd[1]: Started Check if system booted after a graceful shutdown.
    
  4. After a crash or otherwise ungraceful shutdown, the following would be seen:

    ~]# systemctl is-active check_graceful && echo GOOD || echo BAD
    inactive
    BAD
    
    ~]# systemctl status check_graceful
    ● check_graceful.service - Check if system booted after a graceful shutdown
       Loaded: loaded (/etc/systemd/system/check_graceful.service; enabled; vendor preset: disabled)
       Active: inactive (dead)
    Condition: start condition failed at Tue 2016-09-20 01:11:41 EDT; 16s ago
               ConditionPathExists=/root/graceful_shutdown was not met
    
    Sep 20 01:11:41 a72.example.com systemd[1]: Started Check if system booted after a graceful shutdown.
    

(4) Inspect previous boots in persistent systemd journal with journalctl

Note: If you're trying to diagnose a potential crash right now, this will not help unless you have previously configured systemd to persist the journal to disk.

  1. Configure systemd-journald to keep a persistent journal on-disk
    Either update /etc/systemd/journald.conf or create the dir yourself as follows

    # Create standard log dir and fix ownership/perms
    ~]# mkdir /var/log/journal; systemd-tmpfiles --create --prefix /var/log/journal 2>/dev/null
    
    # Next: tell systemd to flush the current journal to disk
    ~]# systemctl -s SIGUSR1 kill systemd-journald
    
    # OPTIONAL: reboot not required other than to give the following commands more than one boot to inspect
    ~]# reboot
    
  2. Optionally use journalctl --list-boots to get a list of boots in ascending chronological order
    0 refers to current runtime logs since the system was booted; -1 covers logs from the previous boot; -2 the boot before that, etc
    Example:

    ~]# journalctl --list-boots
    -2 e1dbd8f133f643d1a816605d96f3ca07 Fri 2020-03-27 22:31:25 UTC—Thu 2020-05-14 01:02:51 UTC
    -1 1969253689e842deaea06ca32f4650c7 Thu 2020-05-14 01:10:00 UTC—Thu 2020-06-04 08:29:42 UTC
     0 26a4a2ff48594778850d917a7e2ad195 Tue 2020-09-01 07:16:20 UTC—Tue 2020-09-01 19:11:20 UTC
    
  3. Use journalctl -b -1 -n to look at the last 10 lines of the previous boot
    The following example output shows that the previous system reboot was graceful

    ~]# journalctl -b -1 -n
    -- Logs begin at Tue 2016-09-20 01:01:15 EDT, end at Tue 2016-09-20 01:21:33 EDT. --
    Sep 20 01:21:19 a72.example.com systemd[1]: Stopped Create Static Device Nodes in /dev.
    Sep 20 01:21:19 a72.example.com systemd[1]: Stopping Create Static Device Nodes in /dev...
    Sep 20 01:21:19 a72.example.com systemd[1]: Reached target Shutdown.
    Sep 20 01:21:19 a72.example.com systemd[1]: Starting Shutdown.
    Sep 20 01:21:19 a72.example.com systemd[1]: Reached target Final Step.
    Sep 20 01:21:19 a72.example.com systemd[1]: Starting Final Step.
    Sep 20 01:21:19 a72.example.com systemd[1]: Starting Reboot...
    Sep 20 01:21:19 a72.example.com systemd[1]: Shutting down.
    Sep 20 01:21:19 a72.example.com systemd-shutdown[1]: Sending SIGTERM to remaining processes...
    Sep 20 01:21:19 a72.example.com systemd-journal[483]: Journal stopped
    

Note from the author: In my experiences troubleshooting RHEL 7 problems for customers in Red Hat support (in the years leading up to 2016 when I wrote this article), this was somewhat less reliable than the other methods. When bad things happen, it was definitely possible for the indexing in journald to get so bad that the journalctl -b -1 command only gives an error. I'm unsure if this has been improved in later versions of RHEL 7 and RHEL 8.

Comments