Poor Clevis and Tang Documentation

After following the setup instructions for a RHEL 8 tang/clevis infrastructure (https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/system_design_guide/configuring-automated-unlocking-of-encrypted-volumes-using-policy-based-decryption_system-design-guide) and getting everything working, I found that nearly all documentation for both tang and clevis (online, man pages, and help texts) is extremely bare-bones, covers only initial setup, is sometimes incorrect or vague, and doesn't ever include how to verify settings.

For example, once I've bound a LUKS volume to clevis using tang as a pin, there is no [easy] way to retrieve a human-readable policy. (That is, to answer the questions: how many pins are required to unlock this device, and of what kind? If it is a tang pin, which URLs does it use?) As such, I can show that the LUKS keyslots are in use and that those keyslots have clevis metadata, but I have no ability to decode the clevis metadata for inspection, though it must be accessible or clevis/jose would not know how to unlock the device.
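To illustrate the kind of inspection meant here, below is a minimal sketch that decodes the protected header of a compact JWE using only coreutils. It assumes the JWE has already been extracted from the keyslot metadata as a single line (that extraction step is not shown), and that the clevis policy sits in the first dot-separated, base64url-encoded segment. The function name jwe_header is hypothetical and not part of clevis or jose.

```shell
#!/bin/sh
# Sketch only: print the JSON protected header of a compact JWE read on stdin.
# Assumption: the header is the first dot-separated segment, base64url-encoded.
jwe_header() {
    # Take the first segment and map the base64url alphabet back to base64.
    seg=$(cut -d. -f1 | tr -d '\n' | tr '_-' '/+')
    # base64url drops the '=' padding; restore it so "base64 -d" accepts it.
    case $(( ${#seg} % 4 )) in
        2) seg="${seg}==" ;;
        3) seg="${seg}=" ;;
    esac
    printf '%s' "$seg" | base64 -d
}
```

Used as `some-extraction-step | jwe_header`, this would print JSON in which a "clevis" object, if present, names the pin and its configuration. It is a reading aid, not a replacement for a supported reporting command.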

Additionally, and maybe related since this all may be a documentation issue, the clevis, tang, and jose man pages are woefully slim, do not define acronyms for new users ("JWE", "PT", etc.), contain actual errors (clevis-decrypt(1) says that there are no parameters even though the SYNOPSIS shows one, and it references itself in the SEE ALSO section), and are generally disorganized with small typos. (In jose-fmt(1) it is nearly impossible to find parameters, as they are neither alphabetized nor ordered within what I assume is their grouped functionality, and there are extra spaces in the Overview, etc.)

Responses

Hello Nathan,

Thank you for your feedback. Because I am the current maintainer of the RHEL 7 and RHEL 8 Clevis+Tang-related documentation on the Customer Portal, I am very interested in specific feedback on that part of the Clevis and Tang docs: which parts are not clear, and what exactly have you found to be incorrect? You can either report them here in this conversation or use the Direct Documentation Feedback (DDF) feature available for logged-in users on the Customer Portal (you can highlight a problematic part of the documentation, provide a comment, and the system creates a bug in Red Hat Bugzilla automatically).

Regarding the last paragraph, I am also going to notify the current maintainer of the mentioned packages. Your specific feedback helps us to fix and improve the man pages.

Thank you.

Hi Nathan, and thanks for the feedback on the man pages. Also, thanks Mirek for pointing me to this feedback.

I just looked into the man pages of jose/jose-fmt/clevis-decrypt and I agree we can do better. The man page for jose mentions some (not all) of the used acronyms in the overview section, when listing the standards and the associated RFC documents, but as you reported, clevis man pages just throw the acronyms without much context, if any. Thanks also for catching the errors in clevis-decrypt. As for jose-fmt, I will also look into how we can reorganize the options so that they make sense/become easier to find.

Best Regards and thanks again.

Thanks Mirek and Sergio for looking into some of these.

A couple of other updates that are likely needed, one of which could be a predecessor to production outages:

1:

I started another discussion regarding potential future production outages due to a conflict of documentation here: https://access.redhat.com/discussions/4344661. In short: the Tang server setup instructions for root-device unlocking cat the file /etc/systemd/system/multi-user.target.wants/tangd.socket. That is fine for inspecting the file, but it is very dangerous to write to that path, since it is a symlink to whichever /lib or /etc unit file is currently active. If a user never copied the file to /etc and re-enabled the unit, a yum update may overwrite the custom port settings, which could cause production tang-server outages just from normal patching.
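As a rough illustration of the concern, the sketch below resolves a unit symlink and reports whether the real file lives in a package-owned directory or under /etc. The function names and the vendor/local/other labels are hypothetical, and the directory prefixes are the usual systemd locations rather than anything taken from the Tang documentation.

```shell
#!/bin/sh
# Sketch only: classify an already-resolved unit-file path.
#   vendor = package-owned, may be replaced by an update
#   local  = administrator-owned copy under /etc
classify_unit_path() {
    case "$1" in
        /usr/lib/systemd/system/*|/lib/systemd/system/*) echo vendor ;;
        /etc/systemd/system/*) echo local ;;
        *) echo other ;;
    esac
}

# Resolve a (possibly symlinked) unit path and classify the real file.
unit_origin() {
    classify_unit_path "$(readlink -f "$1")"
}

# Example, on a Tang server:
#   unit_origin /etc/systemd/system/multi-user.target.wants/tangd.socket
```

If this reports vendor, edits made through the symlink land in the packaged file and are at risk during patching.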

2:

This may be an actual missing feature, or it could just be documentation, but I have found no way to actually follow some of the recommendations in the Tang/Clevis online documentation. For example, step #3 of section 36.4, Rotating Tang keys, instructs the following:

At this point, new client bindings pick up the new keys and old clients can continue to utilize the old keys. When you are sure that all old clients use the new keys, you can remove the old keys. 

Warning

Be aware that removing the old keys while clients are still using them can result in data loss. 

However, there is no way, at least that I've found, to actually check a client to see which servers and keys it is bound to. (I put together a large pipeline of jose-fmt commands in tandem with a direct LUKS metadata dump and was able to get partial information out, but it could hardly be considered "viable" for production usage.) So, I can never reliably verify that a client is not still bound to an old key or to an old server.
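For illustration only, here is a sketch of the last step of such a pipeline: pulling a string field out of an already-decoded JWE header. It assumes the header is compact JSON (no whitespace around colons), that a tang binding stores its server under a "url" key, and that the header's "kid" identifies the key in use. These are assumptions about the metadata layout, not documented guarantees, and the function name json_string_field is hypothetical.

```shell
#!/bin/sh
# Sketch only: print every value of the string field named $1 found in the
# compact JSON read on stdin. This is plain pattern matching, not a JSON
# parser, so it will not handle escaped quotes or pretty-printed input.
json_string_field() {
    grep -o "\"$1\":\"[^\"]*\"" | sed "s/^\"$1\":\"//; s/\"\$//"
}

# Example with a decoded header held in the (hypothetical) variable $header:
#   printf '%s' "$header" | json_string_field url
#   printf '%s' "$header" | json_string_field kid
```

Comparing the reported values against the keys a server currently advertises would still be a manual step, which is the gap described above.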

Note that these may also apply to RHEL 7 as well, but I have not verified that documentation.

Hi again, Nathan,

Here are the answers to your inquiries:

1: You are correct; to specify a custom port for tang, one should follow the approaches listed in the "Modifying existing unit files" section, in the documentation.

One way to use a custom port with tang would be the following:

1) Create an override file with

systemctl edit tangd.socket

You will be presented with an empty file (located at /etc/systemd/system/tangd.socket.d/override.conf), to which you can then add the following content:

[Socket]
ListenStream=
ListenStream=7500

2) Save the file and issue the following command:

systemctl daemon-reload

3) Check the port configured to listen, to make sure it is set correctly, with

systemctl show tangd.socket -p Listen

You should see the following output:

Listen=[::]:7500 (Stream)

Now you can go ahead and enable/start the unit. I will be working with Mirek to have the documentation updated -- thanks once more for the feedback.
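For anyone scripting this, a non-interactive sketch of the same steps follows. It writes the drop-in file directly instead of opening an editor and parses the Listen line shown above. The function names and the root prefix argument (intended for dry runs in a scratch directory) are hypothetical; the file path and contents are the ones given in the steps above.

```shell
#!/bin/sh
# Sketch only: write the tangd.socket drop-in under an optional root prefix.
#   $1 = root prefix ("" on a real host, a scratch directory for a dry run)
#   $2 = port number
write_tangd_port_override() {
    root=$1
    port=$2
    dir="$root/etc/systemd/system/tangd.socket.d"
    mkdir -p "$dir" &&
    # The empty ListenStream= clears the packaged default before the new port.
    printf '[Socket]\nListenStream=\nListenStream=%s\n' "$port" > "$dir/override.conf"
}

# Extract the port from a line such as "Listen=[::]:7500 (Stream)" on stdin.
listen_port() {
    sed -n 's/^Listen=.*:\([0-9][0-9]*\) (Stream)$/\1/p'
}
```

On a real server this might be used as `write_tangd_port_override "" 7500 && systemctl daemon-reload`, followed by `systemctl show tangd.socket -p Listen | listen_port` to confirm the result.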

2: Regarding the rotation of keys in tang, there is work ongoing upstream (https://github.com/latchset/clevis/pull/113) to provide a simple way to check whether there are rotated keys and also to regenerate such rotated keys. Feel free to follow the development in there and also feel free to report issues in there directly.

Best Regards.

Sergio, thanks for the thoroughness of your response. I especially appreciate the note on using the "systemctl edit" command to create that override file, which will be even more future/update-proof than my suggested copy/edit of the whole file. That pull request looks really good for the reporting and re-binding portion of clevis. I hope that Mr. McCallum gets around to reviewing it pretty soon, since it would also be the starting point for tools like Puppet/Ansible to key off of; it is really hard to do idempotent runs/state-based configurations without knowing the current state. Is there any chance to notify Mr. McCallum via internal Red Hat channels so it doesn't just get stuck on a back-burner for years in pull-request limbo? I know there are other Bugzilla tickets/RFEs for things like clevis Ansible support, and this would be a prime enabler for that.

Hi Nathan,

We've also updated the Deploying a Tang server procedure in the RHEL 8 Security hardening title.

Thank you again for your helpful report.