Pets versus cattle - help me grasp this

I've looked at the presentations but I don't get it. Pets are virtual machines that have a long life, typically measured in years. They're like good old fashioned physical machines except they're virtual. The file server is a pet. The database server is a pet. Each pet has individual characteristics and is unique.

Cattle are virtual machines with a lifetime of maybe a few minutes, maybe a few hours. Cattle are not unique and we create and destroy them with APIs. So we have an application running somewhere that creates virtual machines, orchestrates some kind of workload, then destroys them when they're no longer needed. If I need to, say, compile 1000 different program modules, I can create 1000 virtual machines each compiling a module. If I have enough raw capacity somewhere, I can do that job 1000 times faster than trying it with one bare metal box.
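In concrete terms, I picture the lifecycle looking something like this minimal sketch (assuming boto3 with AWS credentials configured; the AMI ID and instance type are placeholders, and the job dispatch is elided):

```python
# Minimal sketch of the cattle lifecycle: create identical workers via
# an API, run a job on each, then destroy them. Assumes boto3 and
# configured AWS credentials; the AMI ID is a placeholder.
import boto3

ec2 = boto3.resource("ec2")

# Spin up N disposable, identical workers from one template image.
workers = ec2.create_instances(
    ImageId="ami-PLACEHOLDER",   # hypothetical build-worker image
    InstanceType="t3.micro",
    MinCount=10,
    MaxCount=10,
)

# ... dispatch one compile job to each worker and collect results ...

# Destroy the herd when the work is done; nothing is kept or patched.
for instance in workers:
    instance.terminate()
```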

So that's the idea. I get this much.

So then I see presentations that say the traditional workload based on the pet approach will go away over time, to be replaced with workloads based on the cattle approach. It's the future. It's what cloud is really all about. Cloud isn't about where the VMs are hosted, it's about using cattle to run the workload instead of pets. With the cattle approach, I can quickly scale up and down (what's the word for "stretchy"?) and I can deploy my cattle in any number of hosting scenarios - local or with different hosting providers.

OK. So drilling down - how would you run, say, an ERP workload using the cattle approach? You have one database with all your part numbers and customer info and procedures, and a bunch of threads pound away on that database. How do lots of cattle that dynamically appear and disappear do it better than a few pets? Somehow, some way, you still have to deal with business rules, and you still have all the other computer science issues going on. What advantage does the cattle approach offer?

Let's say you need to "stretch" (I still don't remember the official word) your capacity so you create a bunch of cattle VMs out in, say, AWS or Rackspace or somewhere like that. But the database lives here at corporate. Is there some telecom magic I missed that will give this remote VM the same access to the database as the local VMs? Ah - but wait - put your database inside a Gluster file system and then it will replicate all over the place. Well, OK, maybe. Except Gluster doesn't work well with databases. And you still have a bunch of telecom issues to solve, but now they're with replication. And I don't have a clue how you would deal with stuff like database locking. What if one of the cattle at corporate wants to touch the same record as one of the cattle out at AWS? They're both touching their own copies - how do you keep them from stomping on each other when they replicate back and forth?

So the idea is, with the cattle approach, you don't have to buy all that processing capacity to service peak loads, you grow and shrink it as needed. But by the time you buy all the telecom capacity to copy everything all over the place, and come up with some kind of scheme to keep all the locks straight, how much of a win is it really? Seems to me, you're still buying enough capacity to service a peak workload, just in a different form.

Or maybe I'm missing something. Help me out.

  • Greg

Responses

Hi Greg,

The database itself is not an animal in this parable.
It is the grass: business rules grow, and we cut the grass when the business tells us to change the rules.

A milk cow is a pet that you want to keep as long as possible (a traditional big server), but it keeps eating a lot.

Livestock, genetically adapted to tomorrow's needs, will be slaughtered when it's fat enough to eat.

So it does not matter which animal eats the grass, except when it is all eaten (a "storage crash").

Does this clarify some issues with the parable?

Kind regards,

Jan Gerrit

Hi Jan -

No, it doesn't clarify it for me. Keeping with the metaphor, the prediction is that the pet style of workload will gradually go away in favor of the cattle style of workload. Pets will be obsolete in a few years.

But with enterprise kinds of workloads, where everything revolves around a common database - where all the animals eat the same grass - what does it matter whether pets eat the grass or cattle eat the grass? How is creating/destroying a bunch of temporary VMs better than having more permanent VMs?

The word for "stretchy" finally came into my head overnight. It's "elastic." The idea driving the cattle approach is, your infrastructure can be elastic so it grows and shrinks as demand dictates. Rent capacity to handle the peak periods so you save lots of capital cost.

So stretching the farm animal metaphor to the breaking point - if the cattle in the cow pen are all feeding on the grass and the cow pen is full, you grow new cattle in a different cow pen somewhere else. But here is where the metaphor breaks. Those new cattle in the new cow pen still need to eat the grass in the original cow pen. How do you handle that? I haven't seen an answer to this question yet.

And what about the app that creates/destroys all these cattle - where does that app live and how do you make it resilient? Seems like that app that creates all the cattle is itself a pet. So you haven't gotten rid of pets, just moved them.

Now dropping the metaphor - how would this look? Alice is an end user of the company ERP system. Alice wants to schedule a production run for the assembly line to build a bunch more widgets. So she taps some buttons on her tablet, the tablet uses the company wifi to fire up one or more VMs to access the database and perform the required steps to schedule the production run. The VMs fire up, do their thing, then disappear.

How is that better than the way we do things now? Well, Alice is on a tablet so she's portable, but we don't need to re-architect the whole infrastructure for that.

The VMs on the back end are single-use software appliances. Maybe it's easier to develop the ERP system this way because the VMs fire up, do a job, then disappear. So not as many hassles worrying about things like memory leaks. But creating and destroying those VMs will carry a bunch of overhead. And what's different or better about creating/destroying a bunch of VMs versus creating/destroying a bunch of process threads inside a single VM?
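To make sure I'm even picturing it right, here's roughly what one of those single-use appliances would run - a minimal sketch, with the job queue and job format invented for illustration:

```python
# Sketch of a single-use appliance: the VM boots, runs this once, and
# is destroyed afterwards, so leaks and drift never accumulate.
# The job source and format here are hypothetical.
import json
import subprocess
import sys

def fetch_one_job() -> dict:
    """Pull exactly one job from a shared queue. Stubbed here; a real
    system might use SQS, AMQP, etc."""
    return {"cmd": ["make", "module42"]}   # placeholder job

def main() -> None:
    job = fetch_one_job()
    result = subprocess.run(job["cmd"], capture_output=True)
    print(json.dumps({"rc": result.returncode}))
    sys.exit(result.returncode)   # the VM is torn down after exit

if __name__ == "__main__":
    main()
```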

So I still don't get it.

  • Greg

Staying off the metaphor.. I'm going to come out and say that the 'cattle' approach doesn't suit all workloads, and anyone who attempts to convince you that it does probably has a vested (financial) interest in convincing you. Sure, you could probably wedge it into most situations, but the reality is it doesn't make sense for everything.

To bring it back to the farm:
You could plough a field with a golf cart, but you're better off using a tractor. Use what makes sense.

Greg, the 3rd paragraph might shed some light - no promises - as it points to a presentation on how CERN is evolving their data center along these lines, possibly to allow their internal customers to use OpenStack (especially slides 16-18) via administrators for their 'cattle' computing needs.

According to this article, the whole pets/cattle bit started when Randy Bias of CloudScaling used the metaphor in this presentation (slide #19), in which he credits former Microsoft employee Bill Baker (look just below that slide at the closed-captioning text) as the source.

In CERN's slides, they wrap this bit around the topic of OpenStack (slides 16-18 onward; slide 15 says they want to 'move to the cloud', which seems likely to mean more OpenStack). They are going to have a heavy dose of Puppet and Foreman.

The 'cattle' approach doesn't suit my current customers... I like the metaphor Pixel used with the golf cart/tractor.

Well, it's a good metaphor, but I like Pixel's comment above. I don't see how the cattle approach makes sense for I/O-constrained workloads centered around a database, which includes just about all workloads in the real world.

But some of the top people at Red Hat say the world is trending this way. And this concept of elasticity is a worthy goal. But I don't know how to make it fit anything usable.

Ya know - with the Red Hat Summit coming up, maybe it would be possible to organize some kind of a session around this topic. I'd want to hear how the experts answer the question.

(edited to fix some clumsy sentences)

Greg,

In the last sentence of my previous post, where I said I liked the metaphor Pixel used, I did not mean the cattle metaphor. I agree with you that some of the details of the cattle approach are not clear in the way you describe. I think you raise good points.

I'm not advocating the cattle bit, and doubt my customers would want it.

I merely posted the above in the hope of addressing one source (CERN) that seems to like the pet/cattle bit enough to put it in their discussions (I do not work for CERN).

After reading the two presentations I mentioned - I'd be interested in the answers you speak of.

"Just about all the workloads in the real world" is an interesting take. Sounds like something Oracle would say. :)

I would myself rephrase that as "all the workloads which have - for better or for worse - been architected as highly synchronous, stateful, I/O bound, ACID centric." Many examples, such as ERPs and Trade Settlement, have by necessity been architected as such. Others, such as Salesforce, have illustrated how CRMs need not be architected so.

The world is trending this way, and it fits many usable application models, from Big Data analysis to eCommerce. It doesn't fit all usable application models, yet neither does traditional virtualisation nor bare metal. You can run your own email service, microblogging service, etc. in your home lab if you can and want to do so, but that doesn't make that model generally useful, when most users just want email or microblogging and have better uses for their time than administering their own IMAP/SMTP servers.

The cattle metaphor won't fit all workloads. But not all workloads really need to fit the pets metaphor. One needs to throw out all assumptions and see what works for which models or metaphors. That is the truly hard, yet valuable work.

Hi Alex, nice bit you wrote there. Love the 'for better or worse' in your second paragraph.

While the intelligent thing is to come up with an appropriate design that optimally fits a specific workload's IT characteristics, that's not always (rarely?) how it works.

Bringing up Oracle is a great example: Oracle loves to sell RAC whether or not your data consumers benefit from a parallel database solution. But Oracle makes a LOT more money on licensing and consulting with RAC than they do with failover, and especially more than they do with single instance (especially single instance that gets its HA/DR via a hosting substrate like VMware).

Then again, this all comes back to what I tell other engineers who are lamenting a technically superior proposal being rejected for a less optimally designed one: IT is frequently less about technology than psychology. Thus the use of metaphors - whether they're clear or apt... or whether they're really reflective of wider trends.

Greg,
Sorry, I don't have technical background on that cattle approach, and certainly didn't answer your questions. I posted the slides earlier hoping they might help identify the source and show that CERN uses it, but I recognize you were looking more for why someone would use the approach, with specifics on I/O, etc.

On a separate note, we're looking into Hadoop.

I've been accused of lots of things - but being an Oracle fan? Oh man, that hurts! Remmele, thanks for posting the slide pointer - I saw some slides earlier from some top folks at Red Hat too. I think there's another discussion here with them.

Quick story - I remember back in the early 80s when these toy personal computers hit the market. Nobody would buy those things, or so I thought. Well that was a mistake. So now I try to pay attention when new ideas come around, even if I don't understand them. Which is most of the time.

Ok, so following up with Alex - let's say you were going to build an ERP system using the cattle approach. How would you do it? All the systems right now have a massive database with connections and links all over the place and different programs that touch different parts of one large database. So everything needs to be "close" to the database. But with the cattle approach, you'd have to somehow break all that up so the modules are as independent as you can make them. So you don't have a database anymore, you have a bunch of separate systems and you clone copies of pieces of it when they get busy. I wonder if any app vendors are thinking along these lines?

All,

Oracle being a database and application cloud reseller, besides selling RAC, makes them the "perfect" farmers: leave that type of cattle on the pasture and graze what you need.

So take the approach that suits the needs; do not go for an approach that only sounds nice.

Kind regards,

Jan Gerrit

Hey Greg,

Here's a key distinction that I've been trying to make when talking about cloud vs virtual resources, or cloud native versus cloud hosted. It's not about how you run a workload but how you build a service.

So to your last post, you are getting round to the right POV for architecting a cloud native application. Is there "ERP in a box" built for cloud? Maybe not. Is there ERP SaaS built cloud native? Absolutely.

In your original ERP example, you've got an impedance mismatch. The ERP system is designed for a virtual (or bare metal and shoehorned) environment, so it won't function the "right way" from a cloud native perspective. It wants to scale up with more RAM and more disk on single systems. The ERP system you need to look at to differentiate "pets vs cattle" is one you are designing from the ground up.

The idea of a cloud native (cattle) ERP system is one that wouldn't have a single database but a series of sharded datastores that are appropriately resilient, it would have a group of BRMS engines that could scale up or down as more queries came up, a gaggle of UI engines, etc. All of these would make use of the elastic qualities of cloud infrastructure but to the end user, it's a single seamless application that may not even look different than the current ERP. It's just built with different patterns and building blocks.
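To make the sharding idea concrete, here's a minimal sketch; the store names and the hash choice are placeholders, not a recommendation:

```python
# Sketch of key-based sharding: instead of one authoritative database,
# each record lives in one of several smaller stores, chosen by a
# stable hash of its key. Store names here are invented.
import hashlib

SHARDS = ["erp-db-0", "erp-db-1", "erp-db-2", "erp-db-3"]

def shard_for(key: str) -> str:
    """Route a customer/part key to its datastore deterministically."""
    digest = hashlib.sha1(key.encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

# Any number of stateless "cattle" workers can compute the same route
# locally, so they can come and go without coordination.
print(shard_for("customer-8675309"))   # always lands on the same shard
```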

The design pattern of "single massive authoritative database" == enterprise breaks at an architecture level (IMO, needs to be scrapped in general but that's a different rant) when looking at cloud native application design. Alex has a good point when he characterized the workload. Cloud wasn't designed for high I/O, low latency, ACID compliance. Cloud was designed for highly parallel, loosely coupled, BASE compliance with lots of survivability built into the application logic.

HTH,
Matt

What I dislike about the whole 'cattle' pitch is the implication that the 'pets way' - the historical hosting of software applications - is somehow broken and needs to be fixed, rather than the 'cattle' method being put forward as a complementary alternative. The RHEL servers I work with today are vastly different to the RHEL servers I was building even as recently as 5 years ago, and this is all without the 'cattle' ideology of spinning instances up on demand and elastic compute. Virtualisation provides me the capability to power down virtual instances when they aren't being used (without destroying/rebuilding them), so they are still 'pets'. I can thin provision backend storage, I can oversubscribe memory and compute to increase hardware utilisation.. all without destroying any 'pets'.

I honestly find that production workloads are very rarely constrained by CPU these days; IO is the killer. Gone are the days when my only option to scale was up, into the cost prohibitive midrange space (eg. Sparc / P Series midrange). I am working in environments now that are 95% virtualised, take up about 10% of the footprint they did historically, have massive disk IO thanks to improved SAN technology, and have resources to burn... all of this has nothing to do with 'cloud' or 'elastic compute' and entirely to do with logical progression in the old and mouldy 'pets' server space.

Another argument is that hardware is so cheap now that I would have trouble justifying a 'scale out' to an externally hosted cloud. I feel x86 has destroyed the pricing in the server market, and x86 hardware is so cut-throat now that the compute you get for your dollar is ridiculously competitive. Not only that, due to the lack of demand on capacity I have had customers extend their server maintenance agreements by multiple years because the demand to upgrade just isn't there (which reduces the cost of purchasing in-house hardware). I recently commissioned an x86 server with 1TB of RAM. If a customer can buy that off the shelf and spend $0 on re-architecting an ERP solution, I am not sure why I would recommend that cutting their workload into bite-size chunks and provisioning on demand is the better approach for them.

Lastly, I am fascinated by how anyone maintains security posture in these environments. Having maintained a 'PaaS' service for a software development environment, I am amazed anyone can run a traditional production environment this way once things like backup, security audits, patching and data retention legislation are considered.

Not wanting to sound like an old Unix beard, but I agree with Greg: the 'real world' workloads I have been responsible for don't fit this paradigm particularly well, and example after example showing the benefit of this method usually involves rapid development/test, low-IO high-CPU workloads, or web-based, massively parallel, short-lived workloads.

The "real world" workload is a chicken/egg problem. None of the workloads I've ever supported would work in a cloud, but all of them could. There are many services that, had cloud been available, would have done much better in a elastic design model than the traditional x86 or RISC scale-up architectures we had.

Part of this is us changing our mindsets. DBAs asking for 1TB of RAM for a single database instance, sysadmins relying on HA clusters to reboot servers for 'health' reasons, developers stuffing MBs of data into sticky sessions that don't load balance well.

That also applies to things like backup, security, patching, etc. You don't back up and patch; you use an A/B stack method and replace your whole infrastructure, for example. Data is retained centrally and servers are dumb diskless templates.
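As a sketch of that A/B idea - the provision, switch and destroy calls below are hypothetical stand-ins, not any particular tool's API:

```python
# Sketch of A/B stack replacement: build a fresh "B" stack, repoint the
# load balancer, then destroy "A". All three helpers are hypothetical.

def provision_stack(name: str, template: str) -> list[str]:
    """Build a complete stack of dumb, diskless servers from a template
    (e.g. via Heat, CloudFormation, or Terraform)."""
    raise NotImplementedError

def switch_traffic(lb: str, members: list[str]) -> None:
    """Repoint the load balancer at the new stack's members."""
    raise NotImplementedError

def destroy_stack(members: list[str]) -> None:
    """Discard the old herd; state lives centrally, not on the servers."""
    raise NotImplementedError

def patch_by_replacement(lb: str, old: list[str]) -> list[str]:
    new = provision_stack("stack-b", template="patched-base-image")
    switch_traffic(lb, new)   # cutover is a routing change, not an outage
    destroy_stack(old)        # "patching" becomes replacing
    return new
```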

There is no better or worse, there is a toolset and an adaptation that has to be made to make the best use of it.

Part of this is us changing our mindsets. DBAs asking for 1TB of RAM for a single database instance, sysadmins relying on HA clusters to reboot servers for 'health' reasons, developers stuffing MBs of data into sticky sessions that don't load balance well.

I don't see how switching to the 'cattle' paradigm resolves this. DBAs asking for resources they don't need isn't a product of a 'pets' environment; it's a product of a DBA not adequately understanding their environment. In a 'cattle' environment this problem would manifest itself in a different way: requesting many small instances rather than several large ones.

Rebooting for 'health reasons' would be the same in a 'cattle' environment in that they would just spin up new instances and destroy the old; nothing has been solved. Same goes for sticky sessions: that is a product of a developer, not the environment, and modifying your deployment method doesn't resolve it (other than forcing a behaviour because there is no alternative).

Centralised data management was a product of SAN in the server room, and dumb diskless templates were a product of virtualisation at the edge; I don't see this as a particularly 'cattle' specific concept. Storing data on local instances isn't something I see done much, if at all, anymore.. the world has moved on.

Most of this points to me being a 'cattle' farmer.. but none of the instances I manage are 'short lived', which is where my issue is, because I don't see value in destroying them. Yes, each can be deployed rapidly from a standard template. It has a generic configuration. Storage and core services are all managed centrally. It can be spun up on any number of physical servers which also provide load balancing and redundancy at the hypervisor layer.. so is it 'cattle'?

I personally don't like the metaphor as I've not met many farmers who don't care about their cattle. But that's just semantics.

This is just another technology shift in computers. We went from mainframes to minis to servers, from dumb terminals to thick clients, with various flirtations with thin clients.

The idea behind the cattle is of configuring simple, clean, reliable systems or templates that can be spun up and added to clusters to boost performance. Systems that I don't need to manually deploy. That I don't need to manually configure. That I don't even need to really know that extra systems were added to my web and app clusters overnight due to abnormally high customer visits. I end up not monitoring each node in the cluster, but the cluster itself.

This does all rely on fully automated deployment capabilities and config management. It takes the manual part of admin out of the equation (or drastically reduces it at least).

More effort is required at the front of the problem. The design phase needs more consideration to get right. But this pans out (when you've done your work right) to a substantial reduction in manual admin costs during the lifespan of the project.

/var/log filling up on nodeX? Deploy a new node to the cluster and delete nodeX. (Your log files are centralised, right?)

Just an example, but spans many other situations.
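A toy illustration of that replace-don't-repair loop; the node names and cluster size are invented:

```python
# Toy illustration of "monitor the cluster, not the node": membership
# churns, but the service only cares that enough healthy members exist.
import itertools

node_ids = itertools.count(1)
cluster = {f"web{next(node_ids)}" for _ in range(4)}   # web1..web4

def replace(bad_node: str) -> None:
    cluster.discard(bad_node)             # cull the sick node, no rescue
    cluster.add(f"web{next(node_ids)}")   # identical node from template

replace("web2")            # /var/log filled up? Just swap it out.
assert len(cluster) == 4   # capacity is the invariant, not any one name
print(sorted(cluster))
```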

Your big ERP system may not be engineered to sit well on such a scenario, but that's less your issue than an issue for the people who wrote the ERP system.

I've come across many systems where this elastic scalability perfectly suits the unknown nature of customer loads. You will still have to be smart to mitigate against the overwhelming load of thousands of tweeps following a celeb's link to your site, but more gentle (seasonal?) load variations are very well matched to the cattle scenario.

I just think the metaphor could be better - farmers don't discard sick cattle. Not where I live anyway.

Edit: I have no axe to grind either way. I don't win or lose regardless of how many pets/cattle there are out there. "Cattle" suit many of my business needs. Pets suit the less scalable off-the-shelf enterprise apps (sometimes from companies who should really know better). Example? A large unnamed company's eBusiness Suite - web servers + app servers + db backend makes for a compelling case for cattle. Unless, of course, you're trying to deploy said eBusiness Suite, in which case the consultants demand the very biggest, most unscalable servers to run each part of the application. It's almost like they're trying to sell you more licenses because you put more cores & RAM in the servers?

The idea behind the cattle is of configuring simple, clean, reliable systems or templates that can be spun up and added to clusters to boost performance. Systems that I don't need to manually deploy. That I don't need to manually configure.

I don't see this approach as any different to how 'pets' are deployed. Just because I plan to keep the server long term doesn't mean that I neglect to automate the creation, configuration management and standardisation of systems.

I don't deny that the technology has its place in large scale solutions (eg. Google, Twitter etc.), I couldn't deny it.. the proof is there running these sites day in, day out. I do however believe that there is a crossover point where switching paradigm starts to provide benefit (financial, human resources, etc.). My personal belief is that a large number of sites that are miles from the crossover point are planning to do it anyway. To make a site behave reliably and completely autonomously like you describe takes a large amount of work upfront. Luckily the amount of work required is slowly reducing over time as tools improve, but I still think many underestimate exactly what is involved.

I also think there is a lot to be said for the amount of traffic a 'traditional' webserver configuration can handle (basic clustering + backend DB clustering) and the need to architect for these peak loads just doesn't exist in many sites... but to many it feels technically empowering to do something the same way Google does it.

I have worked closely with developers that have the solid argument that if you architect your application for elasticity from the start you will have the flexibility to expand if required (ie. there will be 0 re-design costs), I completely agree with this approach for new development. I guess my personal gripe is the suggestion that business should re-architect existing solutions to fit the 'cattle' paradigm just because it's where the market is heading.

Although I may seem 'anti' cattle, I am also playing devil's advocate, and I am always looking for opportunities to solve difficult problems in new/unique ways, so I am interested to know (if you are willing to share the info :D ):

  • How/when you are using the cattle approach in your environment and what type of servers/services you are providing.
  • Do any of the services have high IO demands?
  • What percentage of your environment do you have configured as 'cattle'?
  • How much (if any) of you workload is hosted externally with cloud providers? (is this ondemand, or a constant?)
  • Do you have traditional systems that you have moved over, or are they all systems architected with the platform in mind?

Cheers!

For a start, we have a wide range of systems to contend with - from mainframe through to cloud. Generally speaking, the older the system, the less "cloud" it is (regardless of whether that's external or private cloud).

Secondly, much of this is theory - we're nowhere near production readiness here, but I know several others that are already running live cattle ranches.

The types of service that suit cattle best are those that are ripe for scaling - web clusters, app clusters and even some databases (elasticsearch type storage). Generally it's the n-tier designs where there is practically no config on the system specific to its function. The same basic (secure) RHEL build goes on every system, Puppet then uses facter to decide what goes on next - httpd, jboss, elasticsearch. Once configured, the new node joins the relevant cluster.
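As a rough illustration of that classification flow (Python standing in for the Puppet/facter logic, with the fact names and profile contents invented):

```python
# One generic image becomes an httpd, JBoss, or Elasticsearch node: a
# fact gathered at boot picks the profile; everything else is identical.
# Fact names and profiles here are invented for illustration.
FACT_TO_PROFILE = {
    "web":    ["httpd"],
    "app":    ["jboss"],
    "search": ["elasticsearch"],
}

def classify(role_fact: str) -> list[str]:
    """Map a boot-time fact to the components to configure."""
    return FACT_TO_PROFILE.get(role_fact, [])

# A new node reporting role=app gets JBoss and then joins its cluster;
# there is no per-node hand configuration anywhere.
print(classify("app"))
```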

So far this is in development and proof of concept stuff with me. But as I say, many other sites run cattle ranches - and not on anything like the scale of Google or Twitter. Stripping down the OS to a container that runs your application allows for that flexibility. Pets generally have tailored configurations, perhaps bonded NICs, often static IPs (yes, I'm still seeing too many static IP configs these days), manually built routing tables, etc. I'm not saying that's how all pet systems are, but they tend to be more like that than the stripped down cattle approach. If your OS is already streamlined and uniform across all systems, you're closer to cattle than you expect.

Our PoC systems are intended as migration targets for 'traditional' hand built platforms. The old systems don't scale well (or at all?), and routing tables are complex and different across the estate. The PoC demonstrates scalability at the web, app and database layer. Incumbent apps tend to cause the biggest issue: DBAs reluctant to consider anything but Oracle, and Oracle being out of the question due to cluster licensing fees and poor automated installation.

Sorry to be vague, but that's the most detail I can post without getting internet rights revoked :-)

Vague is fine, I understand the sensitivities.

The same basic (secure) RHEL build goes on every system, Puppet then uses facter to decide what goes on next - httpd, jboss, elasticsearch. Once configured, the new node joins the relevant cluster.

This is how I would expect most admins to build their systems, and I am surprised if people are still building bespoke systems by hand, and even more surprised if it's onto bare metal!

Prepare to be surprised. The number of Admins out there who genuinely understand the need for a compact, secure, automated build is shockingly low. There still tends to be a desire to get hands dirty and tinker with a build.

Still have the occasional 'discussion' to ask why the install DVD was used to build a machine in interactive mode and then hand registered to the Satellite. I don't think it's all down to a lack of understanding either. Sometimes people think they just know best. Sometimes there's a potential that they're protecting their jobs.

I try to demand system rebuilds as a part of acceptance into service. To demonstrate no hand configured items and also to show that new systems can be scaled/replaced without having to think too hard.
