Recommendations on Patch Management for RHEL Servers that haven't been patched in years
So I'm in the final phases of ingesting iso updates for our RH Satellite Server and adding RHEL Servers to it.
The next level is the ugly reality of catching up our servers from years of missing patches so they are secure and perform better.
I'm aware of change management, patching during off hours, communicating with the stakeholders about patches, and making snapshots from VMware/backups. However, I've not been in a situation like this before, where the servers have been unpatched for years and years.
It would be really dumb to patch everything at once, and it would cause issues. My question is: does anyone here have any recommendations on how to move forward with a daunting task like this?
thanks
Responses
When I hear "haven't been patched in years", my first question for someone seeking to change that is, "wouldn't it make more sense to upgrade them to a more recent OS release?" (on the assumption that if it's literally been years since they were patched, it's likely been even longer since they were built, and that they were probably built with a "not latest" version to begin with).
The other thing is, if the systems haven't been owned yet, you've probably got something else at play protecting them (even if it's just a protected topology). If such is the case, it's not like failure to patch or rebuild right freaking now is likely to make a ton of difference in their relative security.
Chris,
I suspect from your post that you have a satellite server, probably version 5.x. Satellite version 5.5 is no longer in support. I'd highly recommend (if you have not already) upgrading your satellite server to version 5.7 soon, perhaps right after you initially patch your currently attached clients.
If you wanted to be conservative, I'd patch non-production servers first and monitor them for at minimum a few days or a week (check differences in logs, performance, function, etc). Perhaps do some non-production servers, see how it goes after a couple of days, and then do the remainder. After non-production is completed, do some production servers at one point, then the other production servers at another point (some places have production servers running in pairs, so if one node goes down, the other one is there to sustain services). Examine the roles/functions of given servers. Have a set of reasons why you selected particular servers to patch first, and for each subsequent patching event. Documenting this and the progress involved, particularly during a major upgrade, can help introduce some reason into discussions with stakeholders, customers, etc. Communicate relevant downtime to stakeholders, customers, etc as needed.
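The staged order described above can be written down as data before any patching starts. Here is a minimal sketch in Python, assuming a hypothetical inventory where each host has a name, an environment ("nonprod" or "prod"), and an optional "pair" label shared by two redundant nodes. None of these field names come from Satellite; they are only for illustration.

```python
def plan_waves(hosts):
    """Split hosts into ordered patch waves.

    hosts: list of dicts with keys 'name', 'env' ('nonprod' or 'prod'),
    and an optional 'pair' (hypothetical label shared by two redundant nodes).
    Order: a non-production pilot, the remaining non-production hosts, then
    production in two waves so the two members of a pair are patched apart.
    """
    nonprod = sorted(h["name"] for h in hosts if h["env"] == "nonprod")
    prod = sorted((h for h in hosts if h["env"] == "prod"),
                  key=lambda h: h["name"])

    # Roughly half of non-production goes first as the pilot group.
    pilot_size = max(1, len(nonprod) // 2) if nonprod else 0
    waves = [nonprod[:pilot_size], nonprod[pilot_size:]]

    prod_a, prod_b = [], []
    seen_pairs = set()
    for h in prod:
        pair = h.get("pair")
        if pair is not None and pair in seen_pairs:
            prod_b.append(h["name"])  # partner is already in the first wave
        else:
            prod_a.append(h["name"])
            if pair is not None:
                seen_pairs.add(pair)
    waves += [prod_a, prod_b]

    # Drop empty waves (for example, when there is only one non-prod host).
    return [w for w in waves if w]
```

The list that comes back can go straight into the change-management document, with a monitoring period of a few days between each wave.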
Have your database personnel available when you do this. Be prepared for ASMLib with Oracle databases on RHEL as well.
- I concur with Tom's bit above: upgrade, with an immediate patch plan that lets you maintain production services and does not cause consternation (give it time, think it through, and have application developers on hand, if needed, to resolve their application issues after the operating system upgrade).
- You could, for now, as a temporary measure, use whatever current satellite server you have to ingest your channels manually (I wrote a guide on manual ISO channel ingesting here in the discussion area https://access.redhat.com/discussions/1402593 )
- Make sure to delete any snapshots of your VMware satellite server before ingesting base or incremental ISO channels!!! Otherwise you will fill up your storage pool even if the satellite server believes no partitions are at 100%.
- There are times where I have inherited a Red Hat server that was numerous sub-releases behind (the sub-release is the number to the right of the period, such as the "2" in "6.2" or "5.2"), and I brought one such server all the way to 5.11 (or 5.current, or 6.current) after inheriting it from a different contract.
- If the server in question is VMware, see if it is plausible to perform a snapshot prior to the upgrade (first check to see if another snapshot exists)
- If the server is VMware, make sure that numerous other old snapshots do not exist, because snapshots can fill up the storage pool on the VMware cluster even if the guest itself does not appear to be full when you run 'df' on the guest virtual Red Hat system.
- If you are using VMware systems, make sure the "underlying hardware" firmware and "vmware tools" are readily available. Often when we upgrade certain servers, we have to concurrently upgrade the VMware tools, or else the network connection will drop. Have "gcc" loaded along with kernel-headers and kernel-devel, and when you do the vmware tools installation, use "-default" (only one dash) at the end to save yourself from hitting "enter" some 17 times (do it via command-line, because sometimes a server will silently fail, and it can happen enough times to be highly annoying).
- If you do not have a "crash cart" set up for your physical systems, set one up now, and test it. You will need a keyboard, a mouse (for some modern BIOS programs, like current Dell or Oracle (Sun)), and a monitor with both digital and VGA connectors (we have servers that also have digital monitor connectors).
- Make sure you can use remote administration and remote consoles prior to upgrading a given server. If you have an "Integrated Lights Out Manager" (ILOM), or some remote method such as Dell's "iDRAC", make sure it is properly configured, networked, and usable from your remote workstation prior to upgrading, so you can rescue the server if needed.
- Have a boot ISO of Red Hat (both physical and on your VMWare system) accessible so if/when you need to do a boot rescue, you can.
- Know that if you have to perform a rescue of a server running SELinux, then after you mount it and reboot the server, it will undergo a mandatory relabeling.
- Check the uptime on the servers in question to see if it is "excessive". While administrators pride themselves on long amounts of uptime (and rightfully rub it in the face of the Other Operating System), it can bite you in the tail if you have a large RAID array or SAN that will require a mandatory fsck because it has not been rebooted in 8 months or two years.
- Verify disk mounts prior to a reboot in case a mount is non-functioning. Use labels, UUIDs or LVM with drive mounts to avert device scrambling and non-mounted drives where possible.
-- Reboot said servers and carefully monitor them for fsck activity as needed PRIOR to the upgrade, so you are not paying the price of a mandatory fsck along with a concurrent upgrade, and then being faced with explaining or diagnosing what "broke the server".
- You might want to consider upgrading the firmware on your servers where applicable.
-- Document your work. This is atypical of many administrators, but in such a project it can help with some clarity when explaining things to customers or competing contracts etc.
- IMPORTANT -- Make sure your /boot partition on any given server is not close to being full from previous kernel updates or whatever, see below...
-- I had to help someone whose /boot partition was nearly full. When they attempted a kernel update, /boot went to 100% usage halfway through the process, their box suffered consternation, and I spent quite a while restoring that system with a non-trivial amount of work. It's good to avoid this: check in advance that /boot is not in danger of filling up, and clear out old kernels if it is.
- Again, I'd highly recommend upgrading your satellite server to at minimum Satellite 5.7, perhaps after you upgrade all the clients. I'd consider creating a clean new satellite and migrating your systems to it. You can contact Red Hat support to arrange a temporary satellite certificate in order to have two satellite servers running concurrently. Note the partitioning differences, and read through the installation procedure several times. Do not join the satellite to LDAP or any sort of authentication until AFTER the satellite is up and running, to avert specific accounts from being made (trust me).
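The /boot, uptime, and device-name checks from the list above could be scripted roughly like this. It is only a sketch in Python: the 80% and 180-day thresholds are arbitrary assumptions, the function names are made up for illustration, and the prefix list for "raw" device names is a guess at what you would want to flag, not an exhaustive rule.

```python
import os

# Device-name prefixes that can be reordered between boots (assumed list).
RAW_PREFIXES = ("/dev/sd", "/dev/hd", "/dev/vd", "/dev/xvd")

def percent_used(path="/boot"):
    """Approximate usage of the filesystem holding `path`.

    Counts root-reserved blocks as used, so it may read slightly
    higher than the Use% column of df.
    """
    st = os.statvfs(path)
    if st.f_blocks == 0:
        return 0.0
    return 100.0 * (st.f_blocks - st.f_bavail) / st.f_blocks

def parse_uptime_days(proc_uptime_text):
    """Convert the contents of /proc/uptime to days of uptime."""
    seconds = float(proc_uptime_text.split()[0])
    return seconds / 86400.0

def risky_fstab_devices(fstab_text):
    """Return fstab devices mounted by raw name, not LABEL/UUID/LVM."""
    risky = []
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        device = line.split()[0]
        if device.startswith(RAW_PREFIXES):
            risky.append(device)
    return risky

def main():
    if os.path.exists("/boot"):
        used = percent_used("/boot")
        flag = "  <-- clean up old kernels first" if used > 80 else ""
        print("/boot usage: %.0f%%%s" % (used, flag))
    if os.path.exists("/proc/uptime"):
        with open("/proc/uptime") as fh:
            days = parse_uptime_days(fh.read())
        flag = "  <-- expect a long fsck on reboot" if days > 180 else ""
        print("uptime: %.0f days%s" % (days, flag))
    if os.path.exists("/etc/fstab"):
        with open("/etc/fstab") as fh:
            for dev in risky_fstab_devices(fh.read()):
                print("fstab uses raw device name: %s" % dev)

if __name__ == "__main__":
    main()
```

Run it on each host before its maintenance window. Anything it flags is worth fixing in a separate change before the patch run, so that a failure can be traced to one cause.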
Security is more than mere patches (understatement). If you are affiliated with the DoD, consider an ACAS server and a STIG posture, using AIDE among other factors.
If you don't need the automation component, the openscap utilities can be run locally. That will at least give you a baseline of your current systems' hardening states. Utility's fairly trivial to run ...you just need to know which SCAP profile is most appropriate for your environment. The RPMs come with several generic ones or, if you're looking for the DoD-specific benchmarks, you can get them from DISA's Benchmarks Portal's contents for EL5 and for EL6.
Yeah, not as nice to run it manually on each host, but it gets you data while you decide on your upgrade path for your Satellite deployment.
Chris,
Hope it helps... I had suspected you might be on Satellite version 5.5 which is no longer supported. If you are on Satellite 5.6, then it's still supported, and if you're happy, cool.
That being said, I found that there are rpms on disconnected satellite servers that are never updated, namely the spacewalk-java set amongst others. These rpms were in the original installation ISO that you use to load the initial load of the satellite server.
And on disconnected satellites (we have eight of them, all disconnected), it started breaking tomcat and caused other consternation. I chronicled the issue somewhat at this link https://access.redhat.com/discussions/1218823 and later, as a result of much digging in that discussion I started, I found a partial workaround (it is posted in that discussion), but no real fix for orphaned rpms that are never updated in ANY ISO channel we download.
Red Hat eventually sent us a custom channel by mail for spacewalk-java, and I have a "feature request" in to mitigate this issue, which we only discovered because our ACAS server noticed that these rpms needed to be resolved (see discussion above).
In my spare time I plan on checking the rpms in my satellite 5.7 servers (we're going to all satellite 5.7, nearly completed) to see if they are orphaned as well. We went to satellite 5.7 mostly over the consternation we experienced in attaining spacewalk-java updates through mailed channels and corrupted iso images whose md5 sums were wrong, among other issues. We will probably go to satellite 6.x on our primary customer sometime soon now that ISO channels are being made available for satellite 6.x.
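Since corrupted ISO images came up, here is a minimal Python sketch for checking a downloaded ISO against its published md5 sum before ingesting it. The function names are made up for illustration, and you supply the expected sum from wherever you obtained the image.

```python
import hashlib

def md5_of_file(path, chunk_size=1024 * 1024):
    """Compute the md5 hex digest of a file, reading it in chunks
    so a multi-gigabyte ISO does not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def iso_matches(path, expected_md5):
    """True when the file's md5 equals the published sum
    (comparison ignores case and surrounding whitespace)."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

Checking every image before a channel import is cheap compared to finding out halfway through an ingest that one of them is bad.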
Anyway, the ACAS server you can attain through your security channels is actually a better product than many, contact your IA office about it, they have to request it through their channels.
Check the discussion area for STIG discussions, such as the one for RHEL 6 (and its comments), as a starting point. There is also some RHEL 7 STIG discussion (see the comments too). We do not have RHEL 7 approved yet for our environments, but we are working on our own STIG load for RHEL 7 anyway, with what is available.
OpenSCAP is available in 5.7, and oddly, I do not find OpenSCAP listed in the Satellite 6.x documentation. I'm glad you brought that up; I plan on asking Red Hat about OpenSCAP in Satellite 6.x.
I just found this:
This Red Hat solution id# 724333 says that OpenSCAP is currently on the roadmap, targeted for sometime after Satellite 6.0 GA.
So it might be good in your case to wait until you see it in the documentation or release notes of the next minor revision.
