How to check if a host needs to reboot?

Hello Community,

I'd like to know how to figure out if a host needs to be rebooted after the installation of updates. Is there any way to determine whether a reboot is needed or not?

For now I could use the following script to create a file if a reboot is necessary:

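#!/bin/bash
# Most recently installed kernel package, with the "kernel-" prefix stripped
LAST_KERNEL=$(rpm -q --last kernel | perl -pe 's/^kernel-(\S+).*/$1/' | head -1)
# Kernel that is currently running
CURRENT_KERNEL=$(uname -r)

# Flag the host if the running kernel is not the most recently installed one
if [[ "$LAST_KERNEL" != "$CURRENT_KERNEL" ]]; then
  touch /var/run/reboot-required
fi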

But I'm not sure if a new kernel version is the only reason that requires a reboot.

On SLES there is the command zypper ps -s to check for running processes which use deleted files. Is there a similar mechanism in RHEL? Or are there no running processes which use deleted files because these are reloaded every time an update changes files used by some process?

Kind regards,
Joerg

Responses

Hello Jorg,

Replacing executable files that are in use by processes is prohibited (text file busy). When an application is upgraded, it is stopped by the pre-install script of the rpm and started by the post-install script of the rpm. This should ensure that the application will not keep using deleted files.

You could use lsof (http://unix.stackexchange.com/questions/182077/best-way-to-free-disk-space-from-deleted-files-that-are-held-open) to check for processes using deleted files.

Regards, Siem

Hello Siem,

Thank you for your answer. If I understood correctly and replacing executables that are in use is prohibited, then there is no reason other than a kernel update for a host to need a reboot, right?

Regards, Joerg

Library files loaded in memory are one of the major reasons you need to reboot. In some cases you can't stop all processes using a shared library (glibc is a prime example of this), so a reboot is required.

There are technologies available to patch kernels in memory (live), and there are also ways to patch some user space libraries live now without reboot (eg. glibc) but I don't believe these are offered/supported by Red Hat.

lsof can give you information about loaded files that no longer match their on-disk counterparts. This is a good indicator that you need to restart services/processes or potentially reboot the server.

A good start to determining this is:

lsof -nP | grep '(deleted)'
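
If lsof is not installed, roughly the same information can be read from /proc. The following is only a sketch: the function names are made up for illustration, it assumes the Linux /proc/<pid>/maps layout (path in the sixth column, "(deleted)" appended when the file was replaced or removed), and it ignores paths that contain spaces. It only looks at executable mappings, so deleted temp files and shared memory segments are not reported.

```shell
#!/bin/bash
# Read the contents of one /proc/<pid>/maps file on stdin and print the
# deleted files that are still mapped as executable code (libraries, binaries).
deleted_mappings() {
    awk '$NF == "(deleted)" && $2 ~ /x/ && $6 ~ /^\// { print $6 }' | sort -u
}

# Walk all processes and print "PID: file file ..." for each one that still
# maps a deleted library or binary. Run as root to see every process.
report_stale_processes() {
    local maps pid stale
    for maps in /proc/[0-9]*/maps; do
        pid=${maps#/proc/}
        pid=${pid%/maps}
        # The process may have exited in the meantime; skip unreadable files.
        stale=$(deleted_mappings 2>/dev/null < "$maps") || continue
        if [ -n "$stale" ]; then
            printf '%s: %s\n' "$pid" "$(echo $stale)"
        fi
    done
    return 0
}
```

Calling report_stale_processes after an update gives a list of candidates to restart; an empty result does not prove that nothing needs restarting.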

The link provided below by Patty has a more complete list of packages you need to reboot for: https://access.redhat.com/solutions/27943

Keep in mind even if an update of a package restarts a service, other services/processes may still be using the old shared libraries for that package and will need restarting manually.

Hello Joerg,

Not entirely. There are exotic cases, such as relabeling a file system (https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/5/html/Deployment_Guide/sec-sel-fsrelabel.html).

Regards, Siem

Ah, thanks again. :-)

According to this link, a reboot is required for more than just replacing the kernel: https://access.redhat.com/solutions/27943

Siem said:

"Replacing executable files that are in use by processes is prohibited"

The above statement isn't accurate or applicable to the problem at hand. Here are some points for consideration:

When you execute a command or when a process loads in a library, it pages the full contents into memory. The state of the original binary/script/whatever matters not. When you update an rpm, the files on disk change immediately. That does nothing to the running processes.

In modern versions of RHEL, many packages that provide services restart said services in the rpm scripts. Two examples: both openssh-server & httpd do service xxx condrestart (or systemctl try-restart xxx). Some of these (like httpd) check a file to decide whether to do the restart (allowing for you to disable the behavior if desired). That said, if an rpm doesn't auto-restart its daemon, you need to do it manually to take advantage of the updated binary. Processes spawned from other simple executables (like bash) will also need to be shut down.

If you update a package that provides libraries (say openssl or glibc) to address a bug or security vulnerability, you must restart all applications & services that are using that library -- if you actually want to fix the bug or security vulnerability in question. The simpler approach is almost ALWAYS to reboot the system, but depending on what is the issue being fixed, you could alternatively figure out which applications might be affected by the library bug/vulnerability and then spend energy tracking down and restarting only the relevant ones.
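
For the "track down only the relevant ones" route, one possible starting point is to list the processes that have a given library mapped at all. This is a rough sketch, not a supported tool: the function names are hypothetical, the library is matched by a simple name fragment such as libssl, and it relies on the Linux /proc/<pid>/maps format.

```shell
#!/bin/bash
# Read one /proc/<pid>/maps file on stdin; succeed if any mapped path
# contains "/<fragment>", e.g. fragment "libssl" matches /usr/lib64/libssl.so.10.
maps_mentions_library() {
    awk -v lib="$1" 'index($6, "/" lib) { found = 1 } END { exit !found }'
}

# Print the PID of every process that currently maps the given library.
pids_using_library() {
    local maps pid
    for maps in /proc/[0-9]*/maps; do
        if maps_mentions_library "$1" 2>/dev/null < "$maps"; then
            pid=${maps#/proc/}
            echo "${pid%/maps}"
        fi
    done
}
```

Something like pids_using_library libssl, run as root, then gives the PIDs to look up with ps before deciding what to restart.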

At this point I should probably point out that you can also use the needs-restarting command from the RHEL6/RHEL7 yum-utils package to check for things that might need to be restarted and just go for the shotgun approach, making sure you restart everything you find in the output. Unfortunately this tool is pretty basic and can be quite hard to use on busy production systems. Here's an example of me running it on a clean and simple RHEL6 system after updating glibc.

[root@r67 ~]# needs-restarting 
1 : /sbin/init
559 : /sbin/udevd-d
1187 : /sbin/dhclient-Hr64.example.com-1-q-cf/etc/dhcp/dhclient-eth0.conf-lf/var/lib/dhclient/dhclient-eth0.leases-pf/var/run/dhclient-eth0.pideth0
1231 : auditd
1247 : /sbin/rsyslogd-i/var/run/syslogd.pid-c5
1276 : irqbalance
1289 : dbus-daemon--system
1318 : /usr/sbin/acpid
1327 : hald
1328 : hald-runner
1356 : hald-addon-input: Listening on /dev/input/event2 /dev/input/event0
1364 : hald-addon-acpi: listening on acpid socket /var/run/acpid.socket
1391 : ntpd-untp:ntp-p/var/run/ntpd.pid-g
1467 : /usr/libexec/postfix/master
1481 : qmgr-l-tfifo-u
1491 : /usr/sbin/abrtd
1499 : crond
1510 : /usr/sbin/atd
1523 : /usr/bin/rhsmcertd
1537 : /sbin/agetty/dev/ttyS0115200vt100-nav
1539 : /sbin/mingetty/dev/tty1
1541 : /sbin/mingetty/dev/tty2
1543 : /sbin/mingetty/dev/tty3
1545 : /sbin/udevd-d
1546 : /sbin/mingetty/dev/tty4
1547 : /sbin/udevd-d
1549 : /sbin/mingetty/dev/tty5
1551 : /sbin/mingetty/dev/tty6
1552 : sshd: root@pts/0
1554 : -bash

So at least that gives you an idea. It would take some work to clear that up without a reboot, but it's doable.

[root@r67 ~]# for s in auditd rsyslog irqbalance messagebus acpid haldaemon postfix ntpd abrtd crond atd rhsmcertd network; do service $s restart; done
Stopping auditd:                                           [  OK  ]
Starting auditd:                                           [  OK  ]
Shutting down system logger:                               [  OK  ]
Starting system logger:                                    [  OK  ]
Stopping irqbalance:                                       [FAILED]
Starting irqbalance:                                       [  OK  ]
Stopping system message bus:                               [  OK  ]
Starting system message bus:                               [  OK  ]
Stopping acpi daemon:                                      [  OK  ]
Starting acpi daemon:                                      [  OK  ]
Stopping HAL daemon:                                       [  OK  ]
Starting HAL daemon:                                       [  OK  ]
Shutting down postfix:                                     [  OK  ]
Starting postfix:                                          [  OK  ]
Shutting down ntpd:                                        [  OK  ]
Starting ntpd:                                             [  OK  ]
Stopping abrt daemon:                                      [  OK  ]
Starting abrt daemon:                                      [  OK  ]
Stopping crond:                                            [  OK  ]
Starting crond:                                            [  OK  ]
Stopping atd:                                              [  OK  ]
Starting atd:                                              [  OK  ]
Stopping rhsmcertd...                                      [  OK  ]
Starting rhsmcertd...                                      [  OK  ]
Shutting down interface eth0:                              [  OK  ]
Shutting down loopback interface:                          [  OK  ]
Bringing up loopback interface:                            [  OK  ]
Bringing up interface eth0:  
Determining IP information for eth0... done.
                                                           [  OK  ]
[root@r67 ~]# killall mingetty
[root@r67 ~]# killall agetty
[root@r67 ~]# exit
...
[root@r67 ~]# needs-restarting 
1 : /sbin/init
465 : /sbin/udevd-d
1526 : /sbin/udevd-d
1525 : /sbin/udevd-d
[root@r67 ~]# killall udevd
[root@r67 ~]# start_udev 
Starting udev:                                             [  OK  ]
[root@r67 ~]# needs-restarting 
1 : /sbin/init
[root@r67 ~]# init u
[root@r67 ~]# needs-restarting 
1 : /sbin/init

So with some work, I was able to get everything taken care of except upstart (init) ... which may or may not still require a reboot. Hard to say. (The telinit man page suggests you can re-exec with "u" but that didn't change what needs-restarting saw and I didn't look into what it actually does or why needs-restarting was tagging it.) Again though: keep in mind that this demo was almost a best-case scenario with a minimal install and no applications or users running.

Moving on to RHEL 7 ... systemd-based platforms offer more capability to tie processes to services and users cleanly, but unfortunately Red Hat doesn't provide any supported tools that I'm aware of. (In Fedora, I use the amazing Tracer which can even be plugged-in to dnf.)
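
To illustrate the "tie processes to services" part: on a systemd host the unit a process belongs to can be read from /proc/<pid>/cgroup, so the PIDs printed by needs-restarting can be mapped to units by hand. A minimal, unsupported sketch follows; the function names are my own, and it assumes the "PID : command" output format shown in this thread plus the usual cgroup line layouts (name=systemd on cgroup v1, 0:: on cgroup v2).

```shell
#!/bin/bash
# Read one /proc/<pid>/cgroup file on stdin and print the last path component
# of the systemd hierarchy, e.g. "httpd.service" or "session-1.scope".
unit_of_cgroup() {
    awk -F: '$2 == "name=systemd" || ($1 == "0" && $2 == "") {
        n = split($3, parts, "/")
        print parts[n]
        exit
    }'
}

# Map every PID reported by needs-restarting to its unit and de-duplicate.
units_needing_restart() {
    local pid
    needs-restarting | awk -F' : ' '{ print $1 }' | while read -r pid; do
        # The process may be gone by now; ignore unreadable entries.
        unit_of_cgroup 2>/dev/null < "/proc/$pid/cgroup"
    done | sort -u
}
```

Units ending in .service are candidates for systemctl restart; .scope entries are typically login sessions, where the user has to log out and back in.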

In RHEL7 you could use the completely-unsupported rockin' needs-restart.pl script from RMJ, one of our own awesome engineers. On a minimal and clean RHEL7 system where openssl was just updated, you'd see something like the following. Standard needs-restarting output first.

[root@r71 ~]# needs-restarting 
7324 : /usr/sbin/httpd -DFOREGROUND 
587 : /usr/bin/python -Es /usr/sbin/firewalld --nofork --nopid 
7326 : /usr/sbin/httpd -DFOREGROUND 
7327 : /usr/sbin/httpd -DFOREGROUND 
601 : /usr/bin/python -Es /usr/sbin/tuned -l -P 
1430 : pickup -l -t unix -u 
1431 : qmgr -l -t unix -u 
1429 : /usr/libexec/postfix/master -w 
1261 : /usr/sbin/sshd -D 
7328 : /usr/sbin/httpd -DFOREGROUND 
7329 : /usr/sbin/httpd -DFOREGROUND 
7325 : /usr/sbin/httpd -DFOREGROUND 
1461 : sshd: root@pts/0     
1351 : /sbin/dhclient -d -q -sf /usr/libexec/nm-dhcp-helper -pf /var/run/dhclient-eth0.pid -lf /var/lib/NetworkManager/dhclient-4ff5ce21-4415-4007-847a-6e4490470dd0-eth0.lease -cf /var/lib/NetworkManager/dhclient-eth0.conf eth0 

[root@r71 ~]# needs-restart.pl 
In order to complete the installation of openssl-libs-1.0.1e-51.el7_2.5.x86_64,
you should restart the following services:

    - firewalld.service - firewalld - dynamic firewall daemon
    - postfix.service - Postfix Mail Transport Agent
    - NetworkManager.service - Network Manager
    - httpd.service - The Apache HTTP Server
    - tuned.service - Dynamic System Tuning Daemon
    - sshd.service - OpenSSH server daemon

In order to complete the installation of openssl-libs-1.0.1e-51.el7_2.5.x86_64,
you should tell the following users to log out and log in:

    - session-1.scope - Session 1 of user root

In the above RHEL7 example, after I did a systemctl restart firewalld postfix NetworkManager httpd tuned sshd command and logged out of my ssh session (and logged back in), all was well.

As for packages that require a full reboot in order to use ... well there is of course the kernel and associated packages. As discussed above, glibc can also be a pretty big deal, though it's totally possible to get all processes [that are using it] to restart without rebooting. As always, init might be problematic; however, systemctl has a daemon-reexec command that the systemd rpm scripts kick off when upgrading. I did a quick test, upgrading a stock RHEL 7.1 system to the latest systemd package (with dependencies of course) and after restarting all the services I could and logging out, I ended up with this:

[root@r71 ~]# needs-restart.pl 
[root@r71 ~]# needs-restarting 
1 : /usr/lib/systemd/systemd --system --deserialize 20 
1153 : /usr/lib/systemd/systemd-udevd 
[root@r71 ~]# systemctl restart systemd-udevd.service
[root@r71 ~]# needs-restarting 
1 : /usr/lib/systemd/systemd --system --deserialize 20 
[root@r71 ~]# systemctl daemon-reexec 
[root@r71 ~]# needs-restarting 
1 : /usr/lib/systemd/systemd --system --deserialize 25 

So again, like RHEL6, init still shows up with the needs-restarting tool. However, the systemctl(1) man page inspires more confidence than that of RHEL6's telinit. Excerpt:

daemon-reexec
 Reexecute the systemd manager. This will serialize the manager state,
 reexecute the process and deserialize the state again. This command
 is of little use except for debugging and package upgrades.
 Sometimes, it might be helpful as a heavy-weight daemon-reload. While
 the daemon is being reexecuted, all sockets systemd listening on
 behalf of user configuration will stay accessible.

So I'm inclined to believe that you don't actually need to do a reboot after updating systemd despite what yum-utils' needs-restarting says.

Regarding the KCS mentioned by Patty: I know that prior to the release of RHEL7 there was an internal review process to overhaul the technical contents, if not the poor formatting and presentation. It's the only authoritative answer Red Hat has at the moment. ... And in my opinion, it's clearly coming more from a perspective of "what do we want to suggest and support" as it contradicts what I've demonstrated above.

In the end, should you jump through these hoops on busy resource-strapped production systems in order to avoid reboots?? ... It's up to you. If we're talking about a single unique mission-critical system, then my answer would be "no"; instead, schedule the downtime. If we're talking about cluster nodes or application servers behind a load balancer or something else of that sort, then "SURE" if you want to.
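
If all you want is a yes/no signal for scheduling that downtime, newer yum-utils builds (and the dnf equivalent) give needs-restarting a -r / --reboothint option which, if I remember correctly, exits with 1 when core packages such as the kernel or glibc were updated since boot and 0 otherwise. Treat that as an assumption and check the man page of your installed version. The wrapper below is only a sketch, and the function name is made up.

```shell
#!/bin/bash
# Print a reboot hint based on the exit status of "needs-restarting -r".
# $1 optionally names a different command to run (handy for testing).
reboot_hint() {
    local checker=${1:-needs-restarting}
    "$checker" -r >/dev/null 2>&1
    case $? in
        0) echo "no reboot suggested" ;;
        1) echo "reboot suggested" ;;
        *) echo "unknown (is $checker installed and does it support -r?)" ;;
    esac
}
```

On versions without -r the wrapper reports "unknown" or may misread a usage error, so it is not a substitute for the manual checks above.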

In any case, I'm just one person. Don't take my thoughts on any of this as authoritative. Decide for yourself.

Hello Ryan,

You said: "When you execute a command or when a process loads in a library, it pages the full contents into memory. The state of the original binary/script/whatever matters not. When you update an rpm, the files on disk change immediately. That does nothing to the running processes."

How about the following? On a RHEL 7.2 system I execute the /bin/sleep command to make sure it is busy and then attempt to overwrite the file in another terminal:

# cp /bin/sh /bin/sleep
cp: overwrite '/bin/sleep'? y
cp: cannot create regular file '/bin/sleep': Text file busy

How could an rpm change this file as long as the process is running?

Regards, Siem

rm
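
To spell that out: cp tries to open the existing file for writing, which the kernel refuses for a running executable, while removing the path or renaming a new file over it only changes the directory entry. As far as I know, rpm writes the new file under a temporary name and renames it into place, so the running process keeps the old, now nameless file. A minimal sketch of the same effect, with an ordinary file and an open file descriptor standing in for the running process (function name and file names are made up for illustration):

```shell
#!/bin/bash
# Show that replacing a file by rename leaves existing openers on the old data.
demo_replace_open_file() {
    local dir
    dir=$(mktemp -d) || return 1

    echo "old contents" > "$dir/target"
    exec 3< "$dir/target"              # keep the original file open on fd 3

    # Write the replacement under a temporary name, then rename it into place.
    echo "new contents" > "$dir/target.tmp"
    mv "$dir/target.tmp" "$dir/target"

    echo "open descriptor sees: $(cat <&3)"
    echo "path now contains: $(cat "$dir/target")"

    exec 3<&-                          # closing the last reference frees the old file
    rm -rf "$dir"
}
```

The open descriptor still reads the old contents while the path already serves the new ones, which is exactly why lsof shows such files as "(deleted)" until the process restarts.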

Yes. Should have thought about that.

[root@r67 ~]# start_udev 

Never ever start udev with start_udev ... better do

# udevd -d

I'm not sure what you're talking about here Harald. Can you say more? I haven't done any research on this; I'm just going based off what the startup scripts do. The udevd -d command doesn't seem to do anything at all, in contrast to what I demonstrated.

start_udev does a lot more than just starting the udev daemon. It creates device nodes and retriggers the whole system, which will lead to unforeseen side effects.

Do you have an alternative?
