Chapter 10. Cloning Subsystems

The Red Hat Certificate System allows a subsystem to be cloned, or duplicated, when a new subsystem instance is first configured. Cloning provides high availability for the Certificate System: the cloned instances run on different machines to avoid a single point of failure, and their databases are synchronized through replication.

10.1. About Cloning

Planning for high availability reduces unplanned outages and other problems by making one or more subsystem clones available. When a host machine goes down, the cloned subsystems can handle requests and perform services, taking over from the master (original) subsystem seamlessly and keeping service uninterrupted.
Using cloned subsystems also allows systems to be taken offline for repair, troubleshooting, or other administrative tasks without interrupting the services of the overall PKI system.

NOTE

All of the subsystems except the TPS and RA can be cloned.
Cloning is one method of providing scalability to the PKI by assigning the same task, such as handling certificate requests, to separate instances on different machines. The internal databases for the master and its clones are replicated between each other, so the information about certificate requests or archived keys on one subsystem is available on all the others.
Typically, master and cloned instances are installed on different machines, and those machines are placed behind a load balancer. The load balancer accepts HTTP and HTTPS requests made to the Certificate System subsystems and directs those requests appropriately between the master and cloned instances. In the event that one machine fails, the load balancer transparently redirects all requests to the machine that is still running until the other machine is brought back online.
Figure 10.1. Cloning Example


The load balancer in front of a Certificate System subsystem is what provides the actual failover support in a high availability system. A load balancer can also provide the following advantages as part of a Certificate System subsystem:
  • DNS round-robin, a feature for managing network congestion that distributes load across several different servers.
  • Sticky SSL, which makes it possible for a user returning to the system to be routed to the same host used previously.

10.1.1. Cloning for CAs

Cloned instances have the exact same private keys as the master, so their certificates are identical. For CAs, that means that the CA signing certificates are identical for the original master CA and its cloned CAs. From the perspectives of clients, these look like a single CA.
Every CA, both cloned and master, can issue certificates and process revocation requests.
The main issue with managing cloned CAs is how to assign serial numbers to the certificates they issue. Different CAs can have different levels of traffic and use serial numbers at different rates, and it is imperative that no two CAs issue certificates with the same serial number. Serial number ranges are therefore assigned and managed dynamically by using a shared, replicated entry that defines the ranges for each CA and the next available range to assign when one CA's range runs low.
The serial number ranges with cloned CAs are fluid. All cloned CAs share a common configuration entry which defines the next available range. When one CA starts running low on available numbers, it checks this configuration entry and claims the next range. The entry is automatically updated, so that the next CA gets a new range.
The ranges are defined in begin*Number and end*Number attributes, with separate ranges defined for requests and certificate serial numbers. For example:
	
 dbs.beginRequestNumber=1
 dbs.beginSerialNumber=1
 dbs.enableSerialManagement=true
 dbs.endRequestNumber=9980000
 dbs.endSerialNumber=ffe0000
 dbs.replicaCloneTransferNumber=5
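The dynamic range assignment described above can be illustrated with a minimal Python sketch. This is not the Certificate System implementation: the class names (SharedRangeEntry, CA), the low_water_mark threshold, and the range size of 5 are all hypothetical, chosen only to show how a shared "next available range" pointer keeps serial numbers from colliding between a master and its clones.

```python
from typing import List, Tuple


class SharedRangeEntry:
    """Hypothetical stand-in for the shared, replicated configuration entry."""

    def __init__(self, next_begin: int, increment: int) -> None:
        self.next_begin = next_begin  # first number of the next unclaimed range
        self.increment = increment    # size of each range handed out

    def claim(self) -> Tuple[int, int]:
        """Return the next unused (begin, end) range and advance the pointer."""
        begin = self.next_begin
        end = begin + self.increment - 1
        self.next_begin = end + 1
        return begin, end


class CA:
    """A master or cloned CA that draws serial numbers from claimed ranges."""

    def __init__(self, shared: SharedRangeEntry, low_water_mark: int = 2) -> None:
        self.shared = shared
        # Hypothetical threshold (must be >= 1): claim a new range when fewer
        # than this many numbers remain after the one being issued.
        self.low_water_mark = low_water_mark
        self.ranges: List[Tuple[int, int]] = [shared.claim()]
        self.next_serial = self.ranges[0][0]

    def issue(self) -> int:
        """Issue one serial number, claiming a follow-up range when running low."""
        _, end = self.ranges[0]
        serial = self.next_serial
        if end - serial < self.low_water_mark and len(self.ranges) == 1:
            self.ranges.append(self.shared.claim())
        if serial == end:
            # Current range is used up; continue in the range claimed earlier.
            self.ranges.pop(0)
            self.next_serial = self.ranges[0][0]
        else:
            self.next_serial = serial + 1
        return serial


shared = SharedRangeEntry(next_begin=1, increment=5)
master = CA(shared)  # claims 1-5
clone = CA(shared)   # claims 6-10
master_serials = [master.issue() for _ in range(6)]
clone_serials = [clone.issue() for _ in range(2)]
```

In this sketch the busier master exhausts its first range and continues in a third range (starting at 11), while the clone keeps drawing from its own range, so the two sets of serial numbers never overlap.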
Cloned CAs do have limits on what operations they can perform. Most important, cloned CAs cannot generate or publish CRLs. Any CRL requests submitted to a cloned CA are immediately redirected to the master CA. Anything related to generating or caching CRLs is disabled in the CS.cfg file for the clone. Clones can revoke, display, import, and download CRLs previously generated by master CAs, but having them generate new CRLs may cause synchronization problems. Only a single CA should generate CRLs, and this task is always left to the master CA, which also maintains the CRL cache.
Master CAs also manage the relationships and information sharing between the cloned CAs by monitoring replication changes to the internal databases.

TIP

If a CA which is a security domain master is cloned, then that cloned CA is also a security domain master. In that case, both the original CA and its clone share the same security domain configuration.

10.1.2. Cloning for DRMs

With DRMs, all keys archived in one DRM are replicated to the internal databases of the other DRMs. This allows a key recovery to be initiated on any clone DRM, regardless of which DRM the key was originally archived on.
After a key recovery is processed, the record of the recovery is stored in the internal database of all of the cloned DRMs.
Although key recovery can be initiated on any clone, once the recovery is initiated, it must be completed on the same single DRM. This is because a recovery operation is recorded in the replicated database only after the appropriate number of approvals have been obtained from the DRM agents. Until then, the DRM on which the recovery is initiated is the only one which knows about the recovery operation.

10.1.3. Cloning for Other Subsystems

There is no real operational difference between masters and clones for TKSs; the information created or maintained on one is replicated among the other servers.
For OCSPs, only the master OCSP receives CRL updates, and then the published CRLs are replicated to the clones.

10.1.4. Cloning and Key Stores

Cloning a subsystem creates two server processes performing the same functions: a new instance of the subsystem is created and configured to use the same keys and certificates as the master to perform its operations. Depending on where the keys are stored for the master, the method for the clone to access the keys is very different.
If the keys and certificates are stored in the internal software token, then the keys must be exported from the master subsystem when it is first configured. When configuring the master instance, there is an option in the Export Keys and Certificates panel to back up the keys and certificates to a PKCS#12 file. Before the clone instance is configured, the PKCS#12 file is copied to the alias/ directory for the clone instance. Then, the PKCS#12 filename is given in the Restore Keys and Certificates screen during the clone's configuration.
If the keys and certificates are stored on a hardware token, then the keys and certificates can be copied or referenced directly in the token:
  • Duplicate all the required keys and certificates, except the SSL server key and certificate, to the clone instance. Keep the nicknames for those certificates the same. Additionally, copy all the necessary trusted root certificates from the master instance to the clone instance, such as chains or cross-pair certificates.
  • If the token is network-based, then the keys and certificates simply need to be available to the token; the keys and certificates do not need to be copied.
  • When using a network-based hardware token, make sure the high-availability feature is enabled on the hardware token to avoid single point of failure.

10.1.5. LDAP and Port Considerations

As mentioned in Section 10.1, “About Cloning”, part of the behavior of cloning is to replicate information between the master and the clone, so that they work from an identical set of data and records. This means that the LDAP servers for the master and clones need to be able to communicate.
If the Directory Server instances are on different hosts, then make sure that there is appropriate firewall access to allow the Directory Server instances to connect with each other.

NOTE

Cloned subsystems and their masters must use separate LDAP servers.
A subsystem can connect to its internal database using either SSL over an LDAPS port or a standard connection over an LDAP port. When a subsystem is cloned, the clone instance uses the same connection method (SSL or standard) as its master for the subsystem-to-database connection. With cloning, there is an additional database connection, though: the master Directory Server database to the clone Directory Server database. For that connection, there are three connection options:
  • If the master uses SSL to connect to its database, then the clone uses SSL, and the master/clone Directory Server databases use SSL connections for replication.
  • If the master uses a standard connection to its database, then the clone must use a standard connection, and the Directory Server databases can use unencrypted connections for replication.
  • If the master uses a standard connection to its database, then the clone must use a standard connection, but there is an option to use Start TLS for the master/clone Directory Server databases for replication. Start TLS opens a secure connection over a standard port.

    NOTE

    To use Start TLS, the Directory Server must still be configured to accept SSL connections, so it must have a server certificate and a CA certificate installed on the Directory Server and SSL must be enabled.
Whatever connection method (secure or standard) is used by the master must also be used by the clone, and it must be properly configured for the Directory Server databases.
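The three connection options above can be summarized as a small lookup. The following Python sketch is only an illustration of the rules as stated in this section; the function name allowed_plans, the ConnectionPlan type, and the string labels are hypothetical and do not correspond to any Certificate System configuration parameter.

```python
from typing import List, NamedTuple


class ConnectionPlan(NamedTuple):
    clone_to_database: str  # how the clone subsystem reaches its own database
    replication: str        # how the master and clone databases replicate


def allowed_plans(master_uses_ssl: bool) -> List[ConnectionPlan]:
    """List the connection combinations permitted for a clone.

    The clone always mirrors the master's subsystem-to-database method;
    only the replication link offers a choice, and only when the master
    uses a standard connection.
    """
    if master_uses_ssl:
        return [ConnectionPlan("ssl", "ssl")]
    return [
        ConnectionPlan("standard", "unencrypted"),
        ConnectionPlan("standard", "starttls"),
    ]


plans_for_ssl_master = allowed_plans(True)
plans_for_standard_master = allowed_plans(False)
```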

IMPORTANT

Even if the clone connects to the master over a secure connection, the standard LDAP port (389 by default) must still be open and enabled on the LDAP server while cloning is configured.
For secure environments, the standard LDAP port can be disabled on the master's Directory Server instance once the clone is configured.
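Before configuring a clone, it can be useful to confirm that the master's standard LDAP port is reachable from the clone host. The following is a minimal sketch using only the Python standard library; the function name ldap_port_reachable and the host name in the usage comment are hypothetical. It checks only that a TCP connection can be opened, not that the Directory Server is correctly configured.

```python
import socket


def ldap_port_reachable(host: str, port: int = 389, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


# Example call (hypothetical host name):
# ldap_port_reachable("master-ds.example.com")
```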

10.1.6. Replica ID Numbers

Cloning is based on setting up a replication agreement between the Directory Server for the master instance and the Directory Server for the cloned instance.
Servers that replicate with each other are in the same replication topology. Every time a subsystem instance is cloned, the clone is added to the overall topology. Directory Server distinguishes between the servers in the topology by their replica ID numbers. Each replica ID must be unique among all of the servers in the topology.
As with the serial number ranges used for requests and certificates (covered in Section 10.1.1, “Cloning for CAs”), every subsystem is assigned a range of allowed replica IDs. When the subsystem is cloned, it assigns one of the replica IDs from its range to the new clone instance.
 dbs.beginReplicaNumber=1
 dbs.endReplicaNumber=95
The replica ID range can be refreshed with new numbers if an instance begins to exhaust its current range.
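As a rough illustration of how a subsystem might hand out replica IDs from its assigned range, consider the following Python sketch. The ReplicaIdRange class is hypothetical and is not part of the Certificate System; it only models the idea that each new clone receives the next unused ID and that a fresh range is needed once the current one is exhausted.

```python
class ReplicaIdRange:
    """Hypothetical model of a subsystem's range of allowed replica IDs."""

    def __init__(self, begin: int, end: int) -> None:
        if begin > end:
            raise ValueError("begin must not exceed end")
        self.next_id = begin
        self.end = end

    def remaining(self) -> int:
        """Number of replica IDs still available in this range."""
        return self.end - self.next_id + 1

    def assign(self) -> int:
        """Assign the next unused replica ID to a new clone."""
        if self.next_id > self.end:
            raise RuntimeError("replica ID range exhausted; a new range is needed")
        replica_id = self.next_id
        self.next_id += 1
        return replica_id


# Values taken from the dbs.beginReplicaNumber / dbs.endReplicaNumber example.
ids = ReplicaIdRange(begin=1, end=95)
first_clone_id = ids.assign()
second_clone_id = ids.assign()
```

Because every ID is handed out at most once, two clones created from the same subsystem never receive the same replica ID in this model.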