Diving into disaster recovery with Red Hat Enterprise Linux


A “disaster” can mean many things, depending on your organization, your culture, and the needs of your users and teams. “Recovery” can also look different from one organization to the next. The first step in defining a disaster recovery plan, then, is setting your expectations.

  • What kinds of events or situations constitute a “disaster”? Is it a system failure or overload, certain service outages, or something else?
  • What do you need to do to recover? Do you need to define a failover process, or is the goal to reduce downtime for primary systems?
  • What event or person can kick off a disaster recovery process?

One way to define disaster recovery is from the perspective of business continuity: what can interrupt your critical business operations, and what do you need to do to mitigate the impact or restore service?

Once you know which disaster and recovery scenarios are relevant for your organization, you can start defining your disaster recovery plan. Most planning efforts share some common steps:

  • Identify your key infrastructure, services, and data.
  • Identify tools for managing specific key applications or services (for example, backup and restore tools for Xen, NFS, or Identity Management in Red Hat Enterprise Linux).
  • Define backup schedules, storage requirements, and restore processes.
  • Determine whether you need hot or cold failover, clustering, or other architectural patterns for different key services or assets.
  • Use monitoring and management tools to detect failures and other performance issues.
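Backup schedules and monitoring meet in one simple question: is the newest backup as recent as the schedule says it should be? The following is a minimal sketch, not a Red Hat tool. It assumes you can already collect backup completion times from your backup software, and the function names, the nightly interval, and the grace period are all hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Sequence


def newest_backup(timestamps: Sequence[datetime]) -> Optional[datetime]:
    """Return the most recent backup completion time, or None if there are none."""
    return max(timestamps) if timestamps else None


def backup_is_current(
    timestamps: Sequence[datetime],
    interval: timedelta,
    grace: timedelta,
    now: datetime,
) -> bool:
    """True if the newest backup is no older than one scheduled interval plus a grace period.

    A False result means at least one scheduled backup appears to have been missed,
    which is the kind of condition a monitoring system could alert on.
    """
    latest = newest_backup(timestamps)
    if latest is None:
        return False  # no backups at all is always a failure
    return now - latest <= interval + grace


if __name__ == "__main__":
    # Hypothetical schedule: nightly backups at 01:00 UTC, with a 2-hour grace period.
    now = datetime(2024, 1, 10, 6, 0, tzinfo=timezone.utc)
    history = [
        datetime(2024, 1, 9, 1, 0, tzinfo=timezone.utc),
        datetime(2024, 1, 10, 1, 0, tzinfo=timezone.utc),
    ]
    print(backup_is_current(history, timedelta(days=1), timedelta(hours=2), now))
```

Here the newest backup is 5 hours old, within the 26-hour allowance, so the check passes. If the January 10 backup were missing, the newest one would be 29 hours old and the check would fail. The grace period prevents a slightly late job from raising a false alarm.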

Red Hat Enterprise Linux includes some general systems management tools to help with the backup and restore process. The primary one is Relax-and-Recover (ReaR), which creates a bootable rescue image and can back up the system so that it can be restored onto bare metal. Red Hat Enterprise Linux also includes tools like kdump, which captures a kernel crash dump to help troubleshoot critical failures. Red Hat Insights can be used to monitor system performance, manage configuration and drift, and help deploy systems.
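A basic ReaR setup is driven by a few variables in `/etc/rear/local.conf`: `OUTPUT` selects the rescue image format, `BACKUP=NETFS` selects ReaR's built-in archive backup, and the `*_URL` variables say where the results are stored. The sketch below renders that file from Python so the same settings can be templated across hosts. It is an illustration only. The NFS server name and export path are hypothetical, and a real configuration will usually need more settings, such as exclusions.

```python
def render_rear_config(nfs_server: str, export_path: str) -> str:
    """Build the text of a minimal /etc/rear/local.conf for an NFS target.

    nfs_server and export_path are placeholders for your own backup share.
    """
    if not export_path.startswith("/"):
        raise ValueError("export_path must be an absolute path, e.g. /srv/rear")
    url = f"nfs://{nfs_server}{export_path}"
    settings = {
        "OUTPUT": "ISO",     # produce the bootable rescue image as an ISO file
        "OUTPUT_URL": url,   # where the rescue ISO is copied
        "BACKUP": "NETFS",   # ReaR's built-in backup method (tar archive by default)
        "BACKUP_URL": url,   # where the backup archive is written
    }
    return "\n".join(f"{key}={value}" for key, value in settings.items()) + "\n"


if __name__ == "__main__":
    print(render_rear_config("backup.example.com", "/srv/rear"), end="")
```

After the output is written to `/etc/rear/local.conf` as root, `rear -v mkbackup` creates both the rescue image and the backup, and `rear recover`, run from the booted rescue image, restores the system.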

These resources are a good way to start the discussion around disaster recovery, at both the business continuity level and technical level:
