
Disaster Recovery

alcollin@redhat.com published on 2016-10-14T03:25:04+00:00, last updated 2016-10-17T16:59:18+00:00

Stability is one of the most important topics in IT. Even a system with “five 9s” availability (up 99.999% of the time) can still suffer a disaster. When disaster strikes, the most important task for an IT team is a proper root cause analysis (RCA). Luckily, Red Hat Enterprise Linux includes a feature to help with failed systems.

Enter kdump

kdump is a feature of the Linux kernel that helps diagnose crashed systems. It works by loading a second "capture" kernel into a region of memory reserved at boot. When the main kernel has crashed, hung, or otherwise become inoperable, the system boots into this second kernel, which dumps the main kernel's memory into a file called a vmcore. The vmcore can later be retrieved and used for RCA.
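Two preconditions follow directly from this design: memory must be reserved for the capture kernel (on RHEL this is done with the `crashkernel=` boot parameter), and the capture kernel must actually be loaded via kexec (reported by `/sys/kernel/kexec_crash_loaded`). The sketch below checks both on a running system. It is a minimal illustration under those assumptions, not a Red Hat Insights rule, and it does not verify the dump target or the rest of the kdump configuration.

```python
from pathlib import Path
from typing import Optional


def crashkernel_param(cmdline: str) -> Optional[str]:
    """Return the value of the crashkernel= boot parameter, or None if absent."""
    for token in cmdline.split():
        if token.startswith("crashkernel="):
            return token.split("=", 1)[1]
    return None


def crash_kernel_loaded(flag_text: str) -> bool:
    """The kexec_crash_loaded file reads "1" once a capture kernel is loaded."""
    return flag_text.strip() == "1"


def kdump_armed(cmdline: str, flag_text: str) -> bool:
    """True only if memory is reserved AND a capture kernel is loaded into it."""
    return crashkernel_param(cmdline) is not None and crash_kernel_loaded(flag_text)


if __name__ == "__main__":
    cmdline_path = Path("/proc/cmdline")
    flag_path = Path("/sys/kernel/kexec_crash_loaded")
    # Fall back to "not configured" values on systems without these files.
    cmdline = cmdline_path.read_text() if cmdline_path.exists() else ""
    flag = flag_path.read_text() if flag_path.exists() else "0"
    print("crashkernel=", crashkernel_param(cmdline))
    print("kdump armed:", kdump_armed(cmdline, flag))
```

The parsing is kept separate from file access so the checks can be exercised against saved `/proc/cmdline` contents as well as a live system.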

Red Hat Insights

The Red Hat Insights team knows that RHEL customers care deeply about properly generating a vmcore at the time of failure. The worst part of a crash is knowing that another one may be right around the corner, bringing more downtime and lost revenue. We have tracked a handful of statistics around kdump to better understand its adoption and use among our customers. One notable statistic is the percentage of unique systems since 2009 that have kdump enabled.

Prescriptive Diagnostics

kdump is a complex program that depends on many parts of a system to function correctly. This complexity leaves significant room for errors, bugs, and other issues that can cause kdump to fail. Red Hat Insights has created many rules around kdump to help ensure that vmcores are generated properly.

We have back-tested a subset of our kdump rules against historical data to see how many systems would have been unable to generate a vmcore at the point of failure.
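Back-testing here means replaying rules over stored system snapshots and counting how many systems would have tripped at least one of them. The sketch below shows that idea in a simplified form. The `Snapshot` record, its fact names, and the two rules are hypothetical stand-ins chosen for illustration; they are not the Red Hat Insights data model or its actual rules.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set


@dataclass
class Snapshot:
    """Hypothetical record of one system's state at one point in time."""
    system_id: str
    month: str                 # e.g. "2016-09"
    facts: Dict[str, object]


# A rule returns True when it detects a problem that would stop kdump
# from writing a vmcore. Both rules below are hypothetical examples.
Rule = Callable[[Snapshot], bool]


def no_crashkernel(s: Snapshot) -> bool:
    return not s.facts.get("crashkernel_reserved", False)


def kdump_service_disabled(s: Snapshot) -> bool:
    return not s.facts.get("kdump_enabled", False)


def backtest(snapshots: List[Snapshot], rules: List[Rule]) -> Dict[str, float]:
    """Per month, the fraction of unique systems that trip at least one rule.

    A system with several snapshots in a month counts once, and counts as
    failing if any of its snapshots trips a rule.
    """
    seen: Dict[str, Set[str]] = {}
    failing: Dict[str, Set[str]] = {}
    for s in snapshots:
        seen.setdefault(s.month, set()).add(s.system_id)
        if any(rule(s) for rule in rules):
            failing.setdefault(s.month, set()).add(s.system_id)
    return {m: len(failing.get(m, set())) / len(ids) for m, ids in sorted(seen.items())}
```

Because a system counts as failing only when some rule in the list fires, running a subset of the rules can only miss problems, never add them, so the resulting percentages are a lower bound.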

Surprisingly, in previous months we have seen as many as one fifth of all systems unable to generate a vmcore properly. Fortunately, the percentage of problematic systems decreases over time, but we still expect it to hover around 5%. Keep in mind that this was tested against only a subset of the Red Hat Insights kdump rules, so the true figure is likely higher.

Risky Business

All systems that run or test production workloads should have kdump properly configured. Failing to do so places unnecessary risk on any company. Red Hat Insights engineers continually review support tickets, historical data, and previous support solutions to better identify what prevents, or could prevent, kdump from generating a vmcore. This gives our customers peace of mind that a vmcore will be properly generated at the time of failure.

Get started with Red Hat Insights now.


About The Author


alcollin@redhat.com

Alex is a software engineer specializing in data science for Red Hat Insights. He holds a degree in computer engineering from The Pennsylvania State University and has held previous engineering roles at IBM and US Airways. Alex currently resides in Raleigh, NC.