Replacing CentOS/Scientific Linux with Red Hat: the politics, not the technical details
This one may be a touchy subject on the Red Hat discussion pages, but I am interested to know people's thoughts and approaches to the following.
You are an administrator / systems engineer in a medium/large environment that is running CentOS/Scientific Linux as its chosen Linux distribution. What 'sales pitch' do you give them to convince them that the expense of Red Hat licenses is worth the outlay?
I have personally been in this situation and managed to convince one customer with the argument of central management with Satellite and corporate support for vendor-supplied applications. Now, though, Spacewalk is available for CentOS users, and with Red Hat putting their weight behind the CentOS project it almost legitimises the argument, to a CIO trying to save money, to stay with CentOS. I am also finding vendors that are happy to support deployments of their application on CentOS, so the argument gets that little bit more difficult.
Without "call in the Red Hat sales team", how would you approach the subject with the decision makers in an organisation you are responsible for?
Have you been through the process as well?
What were your experiences?
Responses
This is a very good/interesting dichotomy that I, too, would be keenly interested in hearing community experiences and advice on. I'm coming into the "realm of Red Hat" a little more "outside-in" than most, I think: mainly working with clients either on purely OpenStack configurations or on desktop applications and working backwards to network/server needs and requirements, while most licensed RHEL installs are legacy server OS deployments that now have a potentially enterprise-viable and natural extension to the desktop. I am hoping to really position and push RHEL to clients for essentially across-the-board installations [OpenStack/Server/Desktop (Workstation)], so the CentOS discussion is one I will certainly face increasingly... I realize I'm not really adding anything useful to the original post here, but as a small "bump" of the question I am really interested in opinions as well - this is undoubtedly going to be a major, recurring conversation all of us will have with clients: now that CentOS is Red Hat-backed, why license RHEL or move away from CentOS (either from pre-existing installs or targeted/planned ones)?
Right now I have had the advantage that most of my clients are engaging me primarily for OpenStack-related subscriptions with Red Hat, so it at least gives me some positioning with clients: extending Red Hat globally is a logical extension/natural progression in constructing a rather homogeneous infrastructure (counter-balanced, of course, against concerns over "vendor lock-in"). At least right now (and I admit this is an early, rather uneducated, needs-more-direct-experience opinion of mine), if the primary argument for RHEL subscription/licensing remains solely Support & Maintenance/Upgrade Paths, then RH has a ways to go in justifying that expense in C-level conversations.
Pixel,
some initial thoughts, certainly not exhaustive/complete.
I suspect you're fine with supporting Linux yourself; it seems you have more than sufficient experience (probably an understatement) to go without support. That being said, one theme in discussions I've seen is: if there is a time your customer does need support, weigh the price of the support contract against how much the downtime would cost the operation. And if the company in question loses its highly experienced person who can handle issues/outages, there may be a lag in getting another person on board, especially if a corporate or other background check, relocation, etc. is required. It is one thing for Red Hat to embrace CentOS; however, if support is desired, that is obviously a separate factor. If some entity goes with a clone of RHEL, and perhaps layers on open source (non-CentOS) versions of some software, then that complicates acquiring support if it is really needed.
Another consideration - security. CentOS is a community-supported effort, and sometimes (I've read) their security patches lag. (LWN.net seems to post when CentOS releases security patches.) If that perceived or actual delay is a concern, then that could be another argument. It seems the response on security is faster with Red Hat.
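A small, practical illustration of that difference (hedged - this assumes the yum security plugin and RHEL-style repos): RHEL's repositories carry errata metadata that yum can query, whereas the stock CentOS repositories, as far as I know, do not ship that metadata, so the same commands come back empty there.

    # with yum-plugin-security installed, list outstanding security errata
    yum updateinfo list security
    # or apply only the security-flagged updates
    yum --security update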
Some of the developers in my environment are always eager for the latest version of some RPM that is supported with an enterprise version of Linux, especially with a new release (granted, this is not a consistent, day-by-day issue), but it seems there is a delay with CentOS (I think I read of a 200-day delay for CentOS 6 - correct me if I am wrong) between when Red Hat publishes a major release and when the CentOS folks follow suit. That being said, the recent cooperative agreement Red Hat has made with CentOS may to some degree mitigate that. Apparently, bug fixes also do not necessarily correspond between RHEL and CentOS; the same caveat as the previous sentence applies.
From what I've read, with CentOS, when 6.5 hit the streets, all support (someone correct me if I'm wrong here) in terms of patches stopped for 6.4. With RHEL, however, you can still get security updates for the older minor release (provisionally, with a specific subscription). So if you really needed to stay at 6.4 (for example) for some compelling third-party reason, that could be a consideration. Scientific Linux might do patches that reach back to other versions (I do not know).
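For what it's worth, on the RHEL side the mechanics of staying on an older minor release are roughly this (a sketch, assuming an EUS-capable subscription is attached; 6.4 is just an example):

    # pin the box to the 6.4 content set so yum only sees 6.4 and its EUS errata
    subscription-manager release --set=6.4
    yum clean all    # drop cached metadata so the pin takes effect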
Lastly, some of us had not seen the availability of Red Hat Software Collections until very recently; if Software Collections are of any interest, that could be another consideration. That being said, I suspect much of that is of course available as open source. The difference, per Red Hat's page on Software Collections, is that you can apparently have more than one version of the same software installed on a RHEL system at once.
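As a small illustration of the "more than one version" idea (assuming the Software Collections repo/channel is enabled and using the python27 collection as an example):

    # install the collection alongside the distro's own python; /usr/bin/python is untouched
    yum install python27
    # open a shell with the collection's version placed at the front of the PATH
    scl enable python27 bash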
Okay, one more... Red Hat Satellite is about to release version 6. It will be interesting to see how Spacewalk compares in light of this. I have not followed Spacewalk, but a coworker built one as a test (which we did not continue to pursue) and it seemed not that difficult. You might like the features in version 6, such as Puppet; however, you've already been using Puppet manifests with your 5.x Satellite servers and would likely not be constrained if you had to take a divergent route. Either way, Satellite's current version, its next version, and Spacewalk may be something to consider.
The above certainly is not exhaustive - I have probably only scratched the surface, and there are certainly other topics to list, I am sure. There are obviously ways to mitigate the concerns above, and alternatively pros/cons to weigh with each of them, plus reasons others may yet post.
A lot of the weight behind your arguments will depend on how much you rely on vendor support for things. In the environment I work in, vendor support is pretty much a non-issue: we're generally not able to use it even if it is available, and the in-house expertise usually leaves the vendor support efforts looking wanting (talking about all of the "RedHat-only" hosted applications, not RedHat in particular).
Because of this, we've been given directive from the top down to move to CentOS wherever possible. That means that, as our Solaris and RHEL 5 systems age off, the universal directive is "move to CentOS 6". The only exceptions to this, thus far, have been:
- systems that leverage the more advanced features of Oracle (e.g., ASM)
- applications that embed other vendors' components. For example, our NetBackup master servers run RHEL while our media servers run CentOS. The NetBackup master server uses an embedded SQLAnywhere database, and Symantec won't assume the risk of running it on CentOS since their upstream hasn't blessed that yet. However, they will support NetBackup media servers on CentOS since there are no upstream components to worry about.
Ours is a hosted computing environment. Tenants that want full infrastructure support and don't want to pay additional, individually-licensed software costs use our CentOS offering. Tenants that require RHEL pay for their own RHEL licensing and handle their own sustainment.
With respect to Satellite 6 vs. Spacewalk: that wasn't really a relevant argument, either. We didn't use Satellite or Spacewalk previously. We used a third-party provisioning, sustainment and scripting framework. As we began the move to CentOS, Satellite remained a non-issue: whether we used Satellite to sustain CentOS or RHEL, the (non-trivial) cost to us would have been essentially the same.
Our particular use-case aside, it's important to know what goes into the RedHat solutions. Spacewalk is the upstream project for Satellite 5. Satellite 6 uses a different set of upstream components (look at the Satellite 6 documents for the sub-components). From the Katello FAQ:
A.) The Spacewalk project will continue to be the upstream community project for the Satellite version 5 offering. The Katello project is now an upstream component for the next generation release of Satellite, yet to be released into market. Stay tuned for updates and announcements on how Katello will contribute to the next generation Satellite offering.
Basically, if you feel like home-brewing, you can tie Katello/Foreman/Puppet together to get functionality similar to Satellite 6 (it just may not be as polished). The question becomes: "is it more cost-effective to use the packaged offering of Satellite 6, or to undertake the engineering effort to integrate the upstream components?" You had to do home-brewing with Satellite anyway if you wanted errata support. We're still evaluating this as a longer-term option.
To be honest, when you look at how hard RedHat's pushing their virtualization stack, plus the endorsement of CentOS (which only further solidified management's mind-set), it gives the appearance that RedHat's focus is on making turnkey OpenStack rather than relying on RHEL for income. Factor in that CentOS's lag has been significantly less with the 6.x point updates (versus 5.x), and for-pay RHEL was simply not an acceptable option for widescale deployment in our management's eyes.
Thanks for mentioning Katello; as I mentioned, I have not followed Spacewalk. It looks like my environment is going to stay with RHEL/RH products given a variety of factors...
Some factors that go beyond the mere support I mentioned have to do with the apparent difference in reaction speed on security patches, the apparent disparity in bug handling, and the speed of availability of new releases. That might not be a concern, but it is something to be aware of for those crossing the bridge. However, as I mentioned, Red Hat and CentOS coming together might mitigate that - time will tell.
The 'arguments' I brought up were more akin to considerations to be prepared for one way or another or to think over, ergo my last paragraph, last post.
Kind regards...
Following the alternatives wasn't an option. It's the technical requirement for meeting the already-made business decisions.
The RedHat versus CentOS decision is usually an acquisition costs versus engineering/operations costs argument. Acquisition costs are easy to quantify. Engineering and operation costs (over time) are much more difficult to quantify. If someone makes the argument "we can cut our acquisition costs by $XMn over five years by switching to CentOS" and you can't provably argue "but you'll increase your engineering and operations costs by $2XMn over that same five years", then the folks with the concrete numbers will tend to win.
Yes, of course, I agree with you. It is usually just as you say. I'm just saying there are other environments that do not worry about the costs as much as some other factors (see next paragraph).
And some of the factors are not about support (or support costs) but about differences in:
- the apparent reaction speed on security patches,
- the apparent disparity in bug handling,
- the speed of availability of new releases,
- and the fact that, when a new sub-release (like 6.5) comes out, no further patches for CentOS 6.4 are available.
- We have some third-party software serving important functions that cannot go above a release lower than 6.5, or we lose support (not from Red Hat, but from that important third-party vendor). If we want security patches for 6.4 or 6.3, etc. (and we do), we cannot get them through CentOS. That's a factor CentOS cannot currently mitigate, though perhaps it will over time. Some environments do care about those factors, which are not strictly about cost/support with Red Hat.
James made some good points about the unforeseen as well (on a somewhat different note).
Hmm... Haven't found an issue with patches for prior releases disappearing - maybe I'm misreading your meaning? Could you clarify (as it might be worth bringing up when the next round of decision-making happens).
The solution we ended up implementing for CentOS leverages a set of full, in-house mirrors of the CentOS repos. Since we don't age-off the RPMs, all the back-rev RPMs are still available to us (in "vault" repos). The third-party sustainment solution (chosen because it supported Windows, Linux and Solaris) allows the creation of targeted update-bundles (kinda like channels but also more of a pain in the rear to implement - the joys of reinventing the wheel) using the in-house repo-mirrors.
Even if we needed older revs, some of the CentOS sites still make them available (you just have to dig through the mirrors to find the few sites that retain them). The actual space-cost of keeping the back-rev RPMs is fairly small (though larger than sites offering free public mirror service might want to take on).
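For illustration, a minimal version of such an in-house mirror using the yum-utils tooling might look roughly like this (paths and repo ID are made up; reposync leaves everything it has already downloaded in place, so superseded RPMs simply accumulate):

    # sync the "updates" repo into a local tree; previously downloaded RPMs are never removed
    reposync --repoid=updates --download_path=/srv/mirror/centos/6
    # regenerate the repo metadata so clients can point a .repo file at the mirror
    createrepo --update /srv/mirror/centos/6/updates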
Tom, I'm personally glad to hear that about patch speed. I've read some sources (yesterday) that said the patches were not always timely, but they did not provide solid timelines or details. As I dig further, it appears from this source (link works at the moment, see point #14) that the delay is probably trivial, 24-72 hours. The patches for minor releases, or older 'point releases', do not disappear, but they stop with the next subsequent point release (source two paragraphs down).
The 'days delayed' figures for availability of a given major or minor version of CentOS are here, with RHEL 6 taking the longest.
Yes, their archives of older point-release versions such as 6.4, 6.3, 6.2, 5.9, 5.8, etc. are certainly available, and that's great - but when (for instance) 6.5 hits the streets, all new patches for 6.4 end. The same goes for RHEL, except that RHEL offers a subscription that allows continued security-related patching of the older version. Here is the CentOS source for that on their website, quoted below:
begin quoted material
<!> The CentOS project does not offer any of the various approaches to extended life for an earlier point release which its upstream occasionally does for its subscribing clientèle. Once a new point release is issued (say: 6.3, following 6.2), no further source packages (from which updates can be built) are released for the earlier version and therefore CentOS is no longer able to produce security or other updates. After a transition interval of a few weeks, the old point version binaries are moved to the vault. There is a longer discussion at item 15 in the FAQ for more details.
end quoted material
Red Hat's source (this is it - scroll down to "Details", then go to "Extended Update Support") explains it.
begin quoted excerpt,
Errata advisories can be released individually on an as-needed basis or aggregated as a minor release on security errata being available for minor releases
end quoted excerpt from source above
(they say "minor releases" instead of "point releases").
Cool. I'll have to look these up and kick them over to the service-owner.
Actually, ended up just writing a "differences you need to be aware of between releases used in production" document for the Operations staff. Hopefully, they'll notice it and it will avoid some of the "why did 'X' not do what I'm used to/Google said it would" emails. =)
I'm still curious what your customers think about the delays in the downstream? I recently read this post-mortem that was traced back to delays in the CentOS process: http://crunchtools.com/centos-post-mortem-analysis/ - have you run into any of these issues with customers?
What about security related issues and the downstream delay?
Probably depends heavily on the type of shop you work in. In large, CM-oriented shops, there's often a significant delay between the release of a patch (or patch-set) and its certification for deployment to production. In larger, more conservative environments, that certification period frequently runs a minimum of 45 days and typically in the multi-month range (allowing the larger user community to trip over bugs and e-fixes to be released before blind-application to production). Many of those shops also do testing of their own prior to application. If bugs are uncovered, either a local fix is created (and possibly reported to the upstream source) or the patch is skipped altogether.
In reading the post-mortem article, it seems like the author didn't really do pre-deployment testing of his patches and got bit. I found the section:
The difference in bugs could be interpreted in any number of ways. It may be indicative of more RHEL users testing and finding SELinux bugs[2], it might indicate that RHEL has more SELinux bugs, or it might indicate that CentOS patches SELinux bugs faster than RHEL, but that is doubtful.
Somewhat interesting are the other likelihoods it didn't enumerate:
- CentOS users may be either far less likely to run SEL than RHEL users or, if they are running SEL, may be less reliant on vendor-supplied policies (potentially being more likely to run custom policies)
- CentOS users, knowing it's a "community maintained" distribution, may be less likely to file a bug-report.
I'd hazard that the second one is the most likely explanation (if you don't have a support contract, you're probably more likely to hit up forums for support than try to create a userid to file a bug).
Tom,
Agree on the point about bug submission; I also believe a lot of folks would consider a bug submission for CentOS a bit of a wash, knowing how it's built. Forum posts may be a better gauge.
On the timeline, however, I think you're talking about the end-user CM process, not the vendor release process. Unless the shops I've worked in are vastly different, once we entered our CM phase for a patch set, it took a major issue to add newly released patches to the set.
While a CM end user may catch the bug earlier, they still wait just as long for the fix from CentOS as everyone else. While there are some minor issues with the methodology, the article hits on a point mentioned in this thread too, downstream repackaging timelines can extend exposure to bugs and security holes.
So - put most of this together and then discovered the link at the bottom. Yes, I am a "fanboy"... but I think it's justified.
Confidence:
There was a period when one of the Red Hat-esque derivatives (CentOS) went dark - the owner of the domain disappeared, blah, blah. Obviously that is no longer an issue for CentOS now that Red Hat has agreed to support CentOS also. I believe one of the creators of Scientific Linux works at Red Hat. I would simply struggle building my environment on an unknown.
Issue Avoidance and Resolution:
One challenge I see in these discussions is that the customer is not purchasing a "thing". It's like insurance: you don't pay for an accident, you pay to cover things if/when you do have an accident. Some people do not seem to realize you are paying for a bunch of people to test things BEFORE they become an issue, not to deal with them AFTER they are an issue. Of course, certain things get missed or are not discovered - and this is definitely where Red Hat shines. I had a Red Hat support tech on the phone within 40 seconds last night at 1am - he was analyzing my sosreports within minutes after. The community support model is obviously not quite that accommodating ;-)
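(For anyone who hasn't generated one: the sosreports mentioned above come from the sos package, and producing the archive support asks for is basically a one-liner:)

    # gathers system configuration and logs into a tarball under /tmp for the support case
    sosreport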
Interoperability:
I realize Satellite has been mentioned, but I think it is a very important point. Satellite is very well suited to managing a Red Hat environment and provides additional benefit over a simple Spacewalk deployment. It's another product which I was very happy to have supported by Red Hat, who provided quick resolution to a problem that I had created in my environment. I can't begin to imagine supporting JBoss/Tomcat without having my Satellite take care of the deployment and maintenance.
Standards:
For the most part we try to stick to "the Red Hat way" of deploying and maintaining our systems, in hopes that issues will be minimal and any required support will be quick and straight-forward.
Indemnification and Assurance (FOSS):
Red Hat has you covered and ensures that with their software and tools you will be compliant for Open Source.
Check this out:
http://www.redhat.com/why_red_hat/
Good points, James, especially "Issue Avoidance and Resolution". There are, on occasion, unknowns where even the most remarkably brilliant, seasoned person I knew ended up calling on support, which helped ease a very unique and significant downtime issue. The folks on the team I am with deal with various vendors, some good, some not so great, but RH support is overall superior to the others we deal with.
What is everyone's take on this paragraph from this source?
begin quoted material below
While CentOS is derived from the Red Hat Enterprise Linux codebase, CentOS and Red Hat Enterprise Linux are distinguished by divergent build environments, QA processes, and, in some editions, different kernels and other open source components. For this reason, the CentOS binaries are not the same as the Red Hat Enterprise Linux binaries.
Above marks the end of the quoted material
I'd be happier if the posted FAQ-excerpt was originally hosted on .centos.org. Not saying something misleading would be posted on a RedHat site, just that if it comes from the CentOS devs, it carries more authority. Even better if it was a cooperatively-authored FAQ.
That said, my overall stance would be that it would depend on the derivation-method. Using the assumption that they're creating their RPMs from the same SRPMs that RedHat is (i.e., no alteration beyond appropriately-setting their rpmbuild's distribution, vendor and packager macros) I'd be reasonably comfortable on compatibility-level. I'd also be happier if, when there were known differences, the CentOS release-notes detailed what the (non-branding) deltas were and why.
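To make that derivation concrete, a rough sketch of the mechanics (the macro values and package name here are made up, not CentOS's actual build setup):

    # rebuild the same SRPM the upstream publishes, overriding only the branding macros
    rpmbuild --rebuild \
        --define 'distribution Example Linux 6' \
        --define 'vendor The Example Project' \
        --define 'packager buildsys@example.org' \
        some-package-1.0-1.el6.src.rpm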
Any argument you make to a business to alter their decision-paths will come down to making an appropriate business-case to them. If they've got "real" numbers they've made their prior decisions on, you need to come up with more/better "real" numbers that convincingly throw the prior decisions into doubt.
As to "more current" distributions...
There's not a ton of (particularly higher-end) COTS vendors that certify outside of RHEL and/or SuSE. So, users of that kind of COTS aren't generally seeking "more current" distributions.
The most frequent drivers of wanting "more current" I've observed is when a customer wants to leverage protocol enhancements that aren't in the "enterprise" Linux distributions (e.g., newer Winbind/Samba for interacting with latest-and-greatest Windows hosts; newer versions of OpenSSH to more-fully support real PKI; etc.). In these cases, Linux (as a whole) is generally flexible enough to allow an informed customer to update RPMs/RPM-sets without having to throw out the whole distribution. ...Though, in some cases, the mods necessary void your vendor support, which makes arguments for RHEL over CentOS/SciLin less compelling.
While some customers entertain the notion of other distribution families, a big pause comes if those distributions aren't sufficiently similar to what their staff is trained on (hell, the sysvinit to upstart to systemd transition across RHEL causes enough heartache!). Depending on staffing, that can be a huge disincentive.
Even beyond training-costs running different distributions changes your sustainment requirements ($$).
Overall, ability to consider other options comes down to what the environment is like. This will create the business-cases for following or avoiding any given path.
Here's an anecdotal story that might be useful. I just worked with a customer to add a new host to the RHEV environment. Of course this should have been routine. After all, by now we've all done this a time or two.
Four, count 'em, four hours later - I couldn't get that host to find the customer storage to save my life. Turns out, the problem was a permission issue with Compellent iSCSI storage. We all forgot about Compellent's quirks. And the truth is, I'll probably forget about Compellent's quirks again in a few months as this incident fades in my memory.
A gentleman named Jon from Australia helped me dig out the answer. There's a nifty little script called /usr/bin/rescan_scsi_bus.sh and that pointed the way. With Jon's help, the lightbulb finally lit up in my head and we took care of it by fixing the Compellent permission issue.
The point of my little story is, as James said, that vendor support may be invaluable one day. As smart as we all are on our own, sometimes when the company is up against a deadline and lots of money rides on getting a system up and running right now, that support may be the difference between success and the post-mortem trying to figure out what went wrong.
You don't quantify it with acquisition cost, you quantify it with risk assessments. With CentOS, you're on your own. If you're running your company web server on CentOS and something happens, look in the mirror for your support team. And by now, there should be tons of TCO studies to draw upon.
- Greg
Just for clarity: running a bus rescan caused the permissions issue to show up in your system logs or was it that you weren't rescanning the bus at all?
Before the Compellent issue fades, it should be noted that, depending on your site's default build profiles, the rescan_scsi_bus.sh script may not be loaded. That script is part of the sg3_utils RPM. In our particular case, our systems' default builds do not include this RPM. Thus, we're forced to use the manual alternative (echo "- - -" > /sys/class/scsi_host/hostN/scan). It's actually part of our standard storage procedures for any dynamic addition of storage (generally not required if your SOPs include rebooting when storage is added).
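In case it's useful to anyone else without sg3_utils, a tiny loop form of that manual method (it just applies the same echo to every host adapter rather than a single hostN):

    # ask every SCSI/iSCSI host adapter to rescan for new LUNs
    for h in /sys/class/scsi_host/host*; do
        echo "- - -" > "$h/scan"
    done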
"...running a bus rescan caused the permissions issue to show up in your system logs or was it that you weren't rescanning the bus at all?"
Too funny - no, nothing that sophisticated. rescan_scsi_bus.sh is also part of RHEV-H 3.3, which is good because you can't yum install anything on RHEV-H. This customer has a premium support subscription, and the whole point of my post was that having that 24x7 support saved my rear end. Again. If I had been completely on my own, I would probably still be trying to figure it out more than 8 hours later. Or asleep and exhausted in front of the keyboard with a splitting headache.
We also talked about doing echo "- - -" > /sys/class/scsi_host/hostN/scan but I never did understand what that was all about. And didn't need it.
OK, so how did a bus rescan lead to the Compellent permission issue? The RHEV data center for this customer has 6 iSCSI LUNs from two different vendors. When Jon directed me to run that script on my problem host, it found the hardware LUNs and also the iSCSI LUNs from a vendor named SNAP. That got my attention - I figured it would only reset real, hardware SCSI busses. But looking at its output, I noticed it also looked at iSCSI LUNs. That's one pretty slick script.
So per suggestion from Jon in Australia, I ran the script on a known good RHEV-H host and saw that it found all 6 iSCSI LUNs, from both SNAP and CMPELLNT. (I think that was the abbreviation.)
Why did my known good host find the Compellent LUNs and my problem host didn't find them? Good question. That triggered a vague memory from the last time we set up hosts at this site - the Compellent array is fast as lightning but its interface is quirky. You need to tell the Compellent about every new host that will use it and we never did that. That was the whole problem.
I have a beef with RHEV on this - when a RHEV host has a problem connecting to storage, it should tell me which storage. But that's a product problem and top quality support worked around it. That's what world class support is supposed to do.
The rescan script does equivalent actions to the echo ... trick. The script's especially nice in that it does all of the busses by default. With the echo ... trick, you have to do your own iteration.
In either case, what it's doing under the covers is initiating a SCSI bus-walk (each possible node-ID gets interrogated). Assuming a given device is responsive, it should report in once interrogated.
To digress a bit, one of the other fun things with iSCSI is it can be more delayed than "real" SCSI devices. This can sometimes cause inconsistent check-in behavior (though, generally not horribly more inconsistent than the results of an FC LIP request - especially if your storage VLAN is well-constructed). Because of these delays, it's generally best to refer to such storage devnodes via a persistent identifier rather than the more-generic, dynamically-mapped identifiers. Haven't played much with RHEV (ours is a VMware shop), so dunno what its default reference-method is. More a "this can bite you in some contexts" note rather than RHEV-specific.
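To make that last point concrete (the device name below is made up), the idea is to refer to such LUNs by a stable identifier rather than a /dev/sdX name that can shift between rescans or reboots:

    # list the persistent names udev created for the attached LUNs
    ls -l /dev/disk/by-id/
    # example /etc/fstab entry using one of them (_netdev delays the mount until the network is up)
    /dev/disk/by-id/scsi-360000970000192601234533030394133  /data  ext4  _netdev  0 0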
