NBDE (Network-Bound Disk Encryption) Technology
NBDE introduction
Network-Bound Disk Encryption (NBDE) is tightly coupled with disk encryption. Why should a company or an individual encrypt disks? If you do not want to see your corporate or private data leaked, you should encrypt disks containing confidential data as an additional security measure.
There are two use cases for disk encryption:
- When a hardware device (unsecured servers, laptops, etc.) is stolen or lost, disk encryption mitigates the risks of data leaks. This is not normally a threat to enterprise-grade data centers with physical security.
- The second use case is relevant for enterprise-grade data centers: sooner or later, disks need to be replaced, because they are defective or technologically outdated. Replacements to extend storage are also common in organizations. A data leak is possible at the end of a disk's life cycle, because replaced disks, particularly defective ones, cannot always be wiped, and with the required knowledge, there is a chance to access the data, at least partially. Wiping working disks takes a lot of time, even when they are only overwritten with zeros rather than random data. An encrypted disk can simply be discarded without being wiped or physically destroyed.
The standard for disk encryption in Linux is LUKS (Linux Unified Key Setup). Normal usage of a LUKS-encrypted device implies typing a passphrase to decrypt the disk. For this reason, LUKS on its own does not scale: the passphrase must be entered manually on system startup, which is a no-go for data centers.
NBDE enables automated disk unlocking on system startup, removing the need for manual intervention to enter a passphrase. This allows encrypting volumes of hard drives on physical and virtual machines without having to enter a passphrase manually when machines restart. This makes NBDE well suited to scaling LUKS, above all for organizations that use encrypted disks on different kinds of devices and want to automate the system boot process. This is a critical requirement in large environments, such as data centers.
NBDE technology
NBDE binds the encryption key to an external server or set of servers securely and anonymously across the network. This is not key escrow, in that the clients do not store the encryption key or transfer it over the network, but otherwise NBDE behaves similarly.
NBDE follows a client-server architecture, based on two main components:
- Tang: software that runs on the server. It exposes a JSON-over-HTTP interface. It is assumed to run in a controlled environment (typically a private network inside a data center), and provides an HTTP endpoint that Clevis (the client) connects to in order to obtain the data it needs to derive its keys. It is based on the McCallum-Relyea key exchange, which is characterized by the following items:
  - It is based on the Diffie-Hellman algorithm + Integrated Encryption Scheme.
  - The encryption key does not leave the client (it acts as a private key).
- Clevis: software that acts as the client. It runs on the device whose disks need to be encrypted and automatically unlocked. Automatic decryption in Clevis is based on different “pins”, which are plugins that each implement automatic decryption with a different technology. The available pins are:
  - tang: provides real NBDE by connecting to the Tang server via HTTP.
  - tpm2: binds decryption to the secure cryptoprocessor (TPM 2.0) on the machine.
  - sss: Shamir's Secret Sharing, for composed configurations (example: achieving high availability with two or more Tang servers).
Clevis and Tang are generic client and server components that provide NBDE. Red Hat Enterprise Linux uses these components in conjunction with Linux Unified Key Setup-on-disk-format (LUKS) to encrypt and decrypt root and non-root storage volumes and accomplish Network-Bound Disk Encryption.
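As a rough illustration of how the tang and sss pins are configured, the following Python sketch builds the JSON configuration that Clevis expects for each pin and assembles, without running it, a `clevis luks bind` command line. The device path and server URLs are hypothetical placeholders, and the helper names are introduced only for this example.

```python
import json
import shlex


def tang_pin(url):
    """Configuration for a single tang pin."""
    return {"url": url}


def sss_pin(threshold, tang_urls):
    """Configuration for an sss pin that wraps several tang pins.

    `threshold` is the number of pins that must succeed to unlock.
    """
    return {"t": threshold, "pins": {"tang": [tang_pin(u) for u in tang_urls]}}


def bind_command(device, pin, config):
    """Build (but do not execute) a `clevis luks bind` command line."""
    return ["clevis", "luks", "bind", "-d", device, pin, json.dumps(config)]


# Hypothetical device and server names, for illustration only.
single = bind_command("/dev/vda2", "tang", tang_pin("http://tang1.example.com"))
ha = bind_command(
    "/dev/vda2",
    "sss",
    sss_pin(1, ["http://tang1.example.com", "http://tang2.example.com"]),
)

print(shlex.join(single))
print(shlex.join(ha))
```

With a threshold of 1, either Tang server is enough to unlock the volume, which is the high-availability configuration mentioned for the sss pin.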
When a client starts, it attempts to contact a predefined set of Tang servers by performing a cryptographic handshake. If it can reach the Tang server, the node can construct its disk decryption key and unlock the disks to continue booting. If the node cannot access a Tang server due to a network outage or server unavailability, the node cannot boot and continues retrying indefinitely until the Tang servers become available again. Because the key is effectively bound to the node’s presence in a network, an attacker attempting to gain access to the data at rest must be able to obtain both the disks on the node and network access to the Tang server as well.
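The retry behavior described above can be sketched as a small polling loop. This is only a model of the control flow, not how Clevis is implemented: it checks whether a server answers on Tang's advertisement endpoint (`/adv`) and does not perform the cryptographic handshake. The function names, the delay, and the `max_rounds` parameter are assumptions made for the example.

```python
import time
import urllib.error
import urllib.request


def tang_reachable(url, timeout=5.0):
    """Return True if the Tang server at `url` serves its advertisement."""
    try:
        with urllib.request.urlopen(f"{url}/adv", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_for_tang(servers, probe=tang_reachable, delay=5.0,
                  max_rounds=None, sleep=time.sleep):
    """Poll the predefined servers in order until one answers.

    With max_rounds=None the loop retries indefinitely, mirroring a node
    that cannot boot until a Tang server becomes available again.
    """
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        for url in servers:
            if probe(url):
                return url
        rounds += 1
        sleep(delay)
    return None
```

Passing `probe` and `sleep` as parameters keeps the loop testable without a network; a real boot path would leave `max_rounds` unset.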
As remarked previously, one of the most important characteristics of NBDE is its use of the McCallum-Relyea key exchange. The next section describes in more detail how this exchange works.
McCallum-Relyea key exchange
McCallum-Relyea key exchange is an alternative to key escrow that allows a decryption key to be regenerated without being retrieved. The algorithm extends the Diffie-Hellman key exchange: the first half works just like a Diffie-Hellman exchange, but the shared key is only used as input to a further computation that involves additional random values. The client stays completely anonymous to the server, and no encryption is needed when this random data is transferred between the client and the server.
McCallum-Relyea key exchange is performed in two steps:
- Provisioning: When a node containing encrypted disks is configured through Clevis software to be unlocked using a Tang server, a key exchange is performed between client and server without the secret client key leaving the node.
The server generates a key pair with private key S and public key s. It then advertises the public key s. The client also creates a key pair with private key C and public key c.
After that, the client creates a symmetric key K using the server public key s and its own private key C.
In this case, though, the client does not advertise its public key c, which means that only the client can derive K. The client enrolls K in one of the LUKS slots. Once K is enrolled, the client keeps the public values s and c but discards both K and its private key C, which means that the client can no longer derive K without the help of the server.
The following diagram illustrates this process:
- Recovery: When a device containing encrypted disks boots or is mounted, the Clevis client must regenerate the secret key with information recovered from the server. The client generates a new ephemeral key pair with private key E and public key e, combines e with its stored public key c into a message x, and sends x to the server. The server applies its private key S to x and returns the result y. From y, the server public key s, and its own private key E, the client can derive K; the server never sees c or K. Once K is derived, Clevis passes this key to the normal disk-unlocking process (dm-crypt) to mount the volume without waiting for a manually entered passphrase.
The following diagram illustrates the secret key regeneration process:
According to the previous provisioning and recovery processes, note that:
- During provisioning, only the server public key is needed. Clients do not have to fetch this public key over the network at provisioning time: Clevis can use a copy of it ‘off-line’, for example during a provisioning process where the Tang server is unreachable.
- There is no state on the server. No decryption keys are transferred, meaning no escrow is involved.
- All transferred data (s, x, y) are either public or meaningless to an eavesdropper, so no TLS or other encryption of the channel is required.
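The provisioning and recovery steps can be made concrete with a toy calculation. The sketch below follows the published description of the exchange, where x is the client's stored public key c blinded with an ephemeral public key e, and y is the server's response to x. As an assumption for readability, it uses integers modulo a large prime instead of the elliptic-curve groups a real deployment would use, so combining public keys becomes multiplication and removing the blinding becomes multiplication by a modular inverse. It is not the Tang or Clevis implementation, and the function names are introduced only for this example.

```python
import secrets

# Toy group (assumption): integers modulo the Mersenne prime 2**127 - 1.
P = 2**127 - 1
G = 3


def keypair():
    """Return (private, public) with public = G**private mod P."""
    private = secrets.randbelow(P - 2) + 1
    return private, pow(G, private, P)


def provision(s):
    """Client: derive K from the server public key s and keep only c."""
    C, c = keypair()
    K = pow(s, C, P)      # K = s^C = G^(S*C)
    return K, c           # the caller enrolls K and then forgets it; C is dropped here


def recovery_request(c):
    """Client: blind the stored public key c with an ephemeral key pair."""
    E, e = keypair()
    x = (c * e) % P       # x = G^(C+E); an eavesdropper learns neither c nor K
    return E, x


def server_reply(S, x):
    """Server: apply the private key S to the blinded value x."""
    return pow(x, S, P)   # y = G^(S*C) * G^(S*E)


def recover(s, E, y):
    """Client: strip the blinding factor s^E from y to obtain K."""
    z = pow(s, E, P)      # z = G^(S*E)
    return (y * pow(z, -1, P)) % P


S, s = keypair()          # server key pair; only s is advertised
K, c = provision(s)
E, x = recovery_request(c)
y = server_reply(S, x)
assert recover(s, E, y) == K
print("recovered K matches provisioned K")
```

Only s, x and y cross the network in this sketch, and the server keeps no per-client state between calls, which matches the properties listed above.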
NBDE vs. key escrow comparison
The previous section detailed how the McCallum-Relyea key exchange works, and the characteristics it provides. Because the McCallum-Relyea key exchange is the core of NBDE, the following table compares NBDE directly against the key escrow method:
Functionality | Key escrow | NBDE |
---|---|---|
Protects against single-disk theft | ✅ | ✅ |
Protects against entire-server theft | ✅ | ✅ |
Encryption key is never transmitted over a network | ❌ | ✅ |
Client-Server transport encryption unnecessary | ❌ | ✅ |
Red Hat support | ❌ | ✅ |
Automation using Ansible roles | ❌ | ✅ |
Supported by OpenShift | ❌ | ✅ |
Supported by Libguestfs | ❌ | ✅ |
Supported by Stratis | ❌ | ✅ |
The usage of NBDE is more suitable for automatic remote disk unlocking because:
- McCallum-Relyea key exchange is a more secure mechanism (the encryption key is never transmitted over the network).
- McCallum-Relyea key exchange simplifies the deployment of this kind of scenario because it does not require encryption of traffic between the client and the server for disk unlocking.
- There are different technologies (OpenShift, Ansible, Libguestfs or Stratis) where the usage of NBDE has been integrated. These kinds of technologies are normally not available for key escrow or, if they are, their implementation is much more complicated.
NBDE configuration
This document is not intended to give a detailed description of how NBDE is configured. See the following RHEL product documentation for detailed guidance:
- Installation of client software, Clevis. Clevis client software must be installed appropriately on every system with encrypted disks that need to be automatically unlocked.
- Installation of server software, Tang. Tang server must be installed and started on every system that will be part of the deployment.
- Configuration of a client to bind to Tang server(s). Each of the clients must be appropriately configured to use one or multiple Tang servers for key retrieval during the startup process.
In case of large deployments, with multiple clients and servers involved, prefer the usage of NBDE Ansible roles.
Key rotation
Key rotation is a mechanism to preserve the security of an NBDE environment over the long run. Key rotation is recommended whenever a data leak might have occurred, such as after the theft of a device. Rotation should also be performed periodically, at intervals that depend on different aspects, such as:
- security constraints on a particular deployment
- key sizes
- institution internal policy
Key rotation involves three operations:
- Generating new keys on the server, rotating the existing active ones. The newly generated keys will be the ones to be advertised.
- Rebinding clients to newly generated keys. Clients will continue to work with hidden rotated keys, but it is strongly recommended to do the rebinding to use the newly generated keys.
- Deletion of old keys on the server. Once all Clevis clients have been re-keyed with new keys, old keys can be removed from the server. Do this with caution, because deleting the old key before all NBDE-encrypted nodes have completed their rekeying causes those nodes to become dependent on any other configured Tang servers. If no other servers exist, automatic unlock will not be possible. In summary, rotated keys can only be removed when all clients have been rebound to the new keys. Otherwise, data loss might occur.
The following scheme shows the different phases involved in the key-rotation operation:
The key-rotation operation involves several steps that are largely manual and, consequently, error-prone. However, the NBDE Ansible roles can automate each of the previous steps and ensure that all required actions are performed successfully and in the correct order.
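The ordering constraint between the three rotation operations can be illustrated with a small in-memory model. This is a hypothetical sketch of the bookkeeping only: the class, its methods, and the key names are invented for the example and do not correspond to Tang or Clevis commands. It shows why rotated keys must not be deleted while any client is still bound to them.

```python
from dataclasses import dataclass, field


@dataclass
class TangServerModel:
    """Toy model of a Tang server's advertised and hidden (rotated) keys."""
    advertised: set = field(default_factory=set)
    hidden: set = field(default_factory=set)
    _counter: int = 0

    def rotate(self):
        """Operation 1: hide the current keys and advertise a new one."""
        self.hidden |= self.advertised
        self._counter += 1
        new_key = f"key-{self._counter}"
        self.advertised = {new_key}
        return new_key

    def delete_hidden(self, client_bindings):
        """Operation 3: delete rotated keys only if no client depends on them."""
        stale = sorted(name for name, key in client_bindings.items()
                       if key in self.hidden)
        if stale:
            raise RuntimeError(f"clients still bound to rotated keys: {stale}")
        self.hidden.clear()


server = TangServerModel()
old = server.rotate()
clients = {"node-a": old, "node-b": old}   # both nodes bound to the first key

new = server.rotate()                       # operation 1
clients["node-a"] = new                     # operation 2, only partly done

try:
    server.delete_hidden(clients)           # refused: node-b would lose its key
except RuntimeError as exc:
    print(exc)

clients["node-b"] = new                     # finish rebinding
server.delete_hidden(clients)               # now safe
print(server.hidden)
```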
NBDE scenarios
NBDE scenarios can be classified by the requirements that a deployment must meet, which are:
- Perimeter security
- Load Balancing
- Geographic redundancy
Simple NBDE scenario
A very simple scenario includes one or only a few clients and a Tang server:
This kind of scenario is very limited and only recommended for very small test deployments or proofs of concept, as it is subject to different failure points:
- Internal network outage
- Tang server outage
NBDE scenario with load balancing
A more advanced scenario compared to the previous one contains one or a few clients and more than one Tang server with load balancing to distribute traffic among them:
This kind of scenario is not as limited as the one before and could be a good entry point for small organizations with no possibility to deploy a geographically redundant network. The possible failure points could be:
- Internal network outage
This kind of scenario would involve configuring Clevis against different Tang servers by using an SSS pin.
NBDE scenario with redundant network segments
The scenario with redundant network segments removes the internal-network failure point of the previous deployment by adding a duplicated network segment:
This kind of scenario also involves configuring Clevis against all Tang servers. The network designer configures load balancing between Clevis clients and all Tang servers through the use of an SSS pin.
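The sss pin takes its name from Shamir's Secret Sharing, in which a secret is split into shares so that any t of them can reconstruct it. The following toy implementation over a prime field illustrates the idea behind the threshold; the prime, the function names, and the share layout are assumptions for this example, and it is not the code Clevis uses.

```python
import secrets

PRIME = 2**127 - 1  # toy field modulus (assumption)


def split(secret, threshold, shares):
    """Split `secret` into `shares` points; any `threshold` of them recover it."""
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def poly(x):
        acc = 0
        for coeff in reversed(coeffs):   # Horner evaluation of the polynomial
            acc = (acc * x + coeff) % PRIME
        return acc

    return [(x, poly(x)) for x in range(1, shares + 1)]


def combine(points):
    """Recover the secret by Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


# Three hypothetical Tang servers, any two of which are enough to unlock.
parts = split(424242, threshold=2, shares=3)
assert combine(parts[:2]) == 424242
assert combine(parts[1:]) == 424242
```

With a threshold of 1, every share equals the secret, so any single reachable server is sufficient; higher thresholds require several servers to be reachable at the same time.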
NBDE deployments can be configured with multiple network topologies, depending on the kind of usage and organization involved. Different requirements (such as load balancing or backup) can be analyzed and adapted to particular deployment requirements.
NBDE disaster recovery considerations
This section describes several potential disaster situations that could take place on an NBDE deployment and the procedures to respond to each of them:
- Loss of a client machine:
  The loss of a cluster node that uses a Tang server to decrypt its disk partition is not a disaster. Whether the machine was stolen, suffered hardware failure or another loss scenario is not important: the disks are encrypted and considered unrecoverable.
  However, in the event of theft, a precautionary rotation of the Tang server’s keys and rekeying of all remaining nodes would be prudent to ensure the disks remain unrecoverable even if the thieves subsequently gain access to the Tang servers.
  To recover from this situation, either reinstall or replace the node.
- Loss of client network connectivity:
  An individual node that loses network connectivity cannot boot automatically.
  If you anticipate a loss of network connectivity in a deployment, you can reveal the passphrase for an onsite operator to use manually, and then rotate the keys afterward to invalidate it.
  The lack of network access at the node can reasonably be expected to impact that node’s ability to function as well as its ability to boot. Even if the node can boot through manual intervention, the lack of network access would make it effectively useless.
- Loss of a network segment:
  In a scenario with multiple network segments, each of them containing one or more Tang servers, the loss of a network segment that makes a Tang server temporarily unavailable has the following consequences:
  - Configured nodes continue to boot normally, provided other servers are available.
  - New nodes cannot establish their encryption keys until the network segment is restored. In this case, ensure connectivity to remote geographic locations for high availability and redundancy. This is because when you are installing a new node or rekeying an existing node, all of the Tang servers you are referencing in that operation must be available, or a copy of the Tang server public key must be available during provisioning time along with the IP address of the Tang server(s).
- Loss of a Tang server:
  The loss of an individual Tang server within a load-balanced set of servers with identical key material is completely transparent to clients.
  The temporary failure of all Tang servers associated with the same URL, that is, the entire load-balanced set, can be considered the same as the loss of a network segment. Existing clients are able to decrypt their disk partitions so long as another preconfigured Tang server is available. New clients cannot enroll until at least one of these servers comes back online.
  You can mitigate the physical loss of a Tang server by either reinstalling the server or restoring the server from backups. Ensure that the backup and restore processes of the key material are adequately protected from unauthorized access.
- Compromise of key material:
  The compromise of individual key material on a Tang server, such as the physical theft of a Tang server or associated data, requires an immediate rotation of keys. Specifically, perform the following actions:
  - Rekey any Tang server holding the affected material.
  - Rekey all clients using the Tang server.
  - Destroy the original key material.
  Carefully assess any compromise of key material that might have led to the compromise of the master encryption key on any given node. Ideally, take the server offline and perform a full re-encryption of its disk. Reformatting and reinstalling on the same physical hardware, although it takes longer in clock time, is easier and can be more rigorously automated and tested.
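The availability rules that run through these scenarios can be condensed into two small predicates. This is a simplified model under stated assumptions: servers are identified by hypothetical names, `threshold` stands for the sss threshold of the client's binding, and cached public keys stand for the off-line provisioning case.

```python
def can_unlock(bound_servers, reachable_servers, threshold=1):
    """An existing client boots if enough of its bound servers are reachable."""
    usable = set(bound_servers) & set(reachable_servers)
    return len(usable) >= threshold


def can_provision(referenced_servers, reachable_servers, cached_public_keys=()):
    """Provisioning or rekeying needs every referenced server to be reachable,
    or a copy of its public key to be available locally."""
    available = set(reachable_servers) | set(cached_public_keys)
    return set(referenced_servers) <= available


bound = ["tang-a", "tang-b"]              # hypothetical servers in two segments

# Loss of one network segment: existing nodes still boot, new nodes cannot enroll.
print(can_unlock(bound, ["tang-b"]))      # True
print(can_provision(bound, ["tang-b"]))   # False

# Loss of client network connectivity: no server is reachable at all.
print(can_unlock(bound, []))              # False
```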
Conclusion
This article provided a detailed description of NBDE technology together with the key exchange algorithm that it uses: McCallum-Relyea. It also detailed why NBDE is a better key-retrieval solution for the automatic unlocking of encrypted disks when compared to other solutions, such as key escrow.