RHEL 6 on VMware /dev/random entropy issue

Hi

I was wondering if anyone has had issues with the /dev/random RNG device being extremely slow on RHEL 6.5 when running as a VMware guest? When ssh-ing between servers, the ssh login process hangs for tens of seconds (sometimes minutes), and key generation with ssh-keygen also hangs for ages. With some debugging I have narrowed the issue down to the processes waiting for /dev/random. Linking /dev/random to /dev/urandom (as per http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1036980) makes the processes run at the expected speeds.

I have an open case regarding the issue and I suggested that the link might be a good work-around (based on research into the differences between /dev/random and /dev/urandom, e.g. http://www.2uo.de/myths-about-urandom/), but RH support are suggesting a hardware random number generator. I've never used a hardware random number generator, but I suspect that in a non-VM environment it is straightforward to set up. I am, however, wondering if anyone has had experience in using one in a VMware environment with many guests per physical host? Can one hardware random number generator feed multiple VMs?

(FYI: The RHEL 6 build in question has had the cc-eal4-config-rhel62 configuration applied which includes setting "SSH_USE_STRONG_RNG=12" in /etc/sysconfig/sshd, among other things!)

Thanks in advance,

Aidan

Responses

FWIW: the /dev/random being "slow" thing isn't a Red Hat-only issue. I've administered a number of different commercial and OSS-based systems, and /dev/random is always slow in comparison to /dev/urandom (blocking versus non-blocking entropy service).
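
A minimal sketch of how the blocking behaviour can be checked from the guest: on Linux the kernel publishes its current entropy estimate (in bits) in /proc/sys/kernel/random/entropy_avail, and a persistently low value suggests that reads from /dev/random will stall. The function names and the threshold of 200 bits are hypothetical and chosen only for illustration.

```python
from pathlib import Path

# Standard procfs location on Linux. The threshold used below is an
# arbitrary illustrative value, not a vendor recommendation.
ENTROPY_AVAIL = "/proc/sys/kernel/random/entropy_avail"


def read_entropy_avail(path=ENTROPY_AVAIL):
    """Return the kernel's entropy estimate in bits, or None if unreadable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def likely_to_block(bits, threshold=200):
    """Rough heuristic: a low estimate suggests /dev/random reads may stall."""
    return bits is not None and bits < threshold


if __name__ == "__main__":
    bits = read_entropy_avail()
    print("entropy_avail:", bits, "| low:", likely_to_block(bits))
```

Running this a few times while an ssh login is hanging should show whether the pool is actually starved at that moment.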

Can't really speak to how well ESX bubbles up its hardware-based entropy subsystems. That said...

Typically, if /dev/random proves too slow and /dev/urandom is deemed to suffer from insufficient randomness, you'd look at a solution like PRNG. However, PRNG is best-suited to graphical (physical) workstations rather than headless servers. You might want to look at the "haveged" service as an alternative: you can get it via EPEL.

That 2uo page you linked to is fantastic. Finally someone who collects all the myths about randomness and debunks them.

I have done a bit of research into this, though I have little actual configuration experience.

Older Linux kernels used to read from many IRQ sources like network interfaces, following the theory that with all the traffic happening on a LAN which a system sees, the NIC's behaviour at supplying that traffic would be unpredictable. At some point in the past, someone decided it was possible to game that if you could control traffic on the LAN. Nowadays the only sources of entropy are human interfaces like keyboards and mice, and actual hardware entropy generators.

VMware doesn't provide hardware entropy to guests at all, which is why /dev/random starves.

My first advice would be to actually test /dev/urandom. In my own tests, inside a KVM VM with no source of hardware randomness, /dev/urandom could supply literally megabytes of randomness before numbers became predictable enough to fail a FIPS-140 test.

You can test any entropy source for FIPS compliance with rngtest, which is supplied in the rng-tools package. See man rngtest for the correct syntax; many examples are also available with a quick web search.
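
To give a feel for what such a test does, here is a rough sketch of just one of the FIPS 140-2 statistical checks, the monobit test, which counts the 1 bits in a 20,000-bit block and requires the count to fall strictly between 9725 and 10275. This is not rngtest itself, which runs several further checks; the function names are hypothetical, and os.urandom is used as a stand-in for reading /dev/urandom.

```python
import os

BLOCK_BYTES = 2500  # 20,000 bits, the block size used by the FIPS 140-2 tests


def count_ones(block):
    """Count the 1 bits in a bytes object."""
    return sum(bin(byte).count("1") for byte in block)


def monobit_ok(block):
    """FIPS 140-2 monobit check: 9725 < ones < 10275 in a 20,000-bit block."""
    if len(block) != BLOCK_BYTES:
        raise ValueError("expected exactly 2500 bytes")
    return 9725 < count_ones(block) < 10275


def check_source(read_block, blocks=100):
    """Return (passes, failures) over `blocks` blocks from read_block(n)."""
    results = [monobit_ok(read_block(BLOCK_BYTES)) for _ in range(blocks)]
    return results.count(True), results.count(False)


if __name__ == "__main__":
    # os.urandom is backed by the kernel's non-blocking generator on Linux.
    print("passes, failures:", check_source(os.urandom))
```

An occasional failure is expected even from a good source, since the bounds are statistical; what matters is the failure rate over many blocks.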

If you control the software in question, one option is to use the small amount of randomness which /dev/random supplies to seed your own PRNG algorithm inside your application. As the page you linked to says, even 256 bytes (2048 bits) should be a large enough seed.
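
As a sketch of that idea, assuming the application only needs a stream of bytes: read one small seed (which may block once on /dev/random), then expand it with HMAC-SHA256 in counter mode. The class and helper names are hypothetical, and this is a toy construction for illustration rather than a vetted DRBG; a real application would normally use the generator from an established crypto library.

```python
import hashlib
import hmac


class SeededGenerator:
    """Toy HMAC-SHA256 counter-mode expander (illustrative, not a vetted DRBG)."""

    def __init__(self, seed):
        if len(seed) < 32:
            raise ValueError("seed should be at least 32 bytes")
        self._key = hashlib.sha256(seed).digest()
        self._counter = 0

    def read(self, n):
        """Return n bytes derived from the seed and an increasing counter."""
        out = bytearray()
        while len(out) < n:
            message = self._counter.to_bytes(16, "big")
            out.extend(hmac.new(self._key, message, hashlib.sha256).digest())
            self._counter += 1
        return bytes(out[:n])


def read_seed(path="/dev/random", size=256):
    """Read `size` bytes from the device; may block until the kernel has them."""
    seed = b""
    with open(path, "rb", buffering=0) as device:
        while len(seed) < size:
            seed += device.read(size - len(seed))
    return seed


if __name__ == "__main__":
    generator = SeededGenerator(read_seed())
    print(generator.read(16).hex())
```

The blocking read happens once at start-up instead of on every request for random data.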

Like Tom said, you can run a PRNG entropy daemon like haveged. This reads randomness from tiny fluctuations in the processor's TSC timer (also emulated in VMware) to seed random numbers. The theory is explained in-depth on the haveged website.

It appears you've read about hardware randomness generators. Usually these are little USB dongles which collect some sort of environmental thermal/photoelectric/quantum noise from around them and use it to seed a hardware RNG. Wikipedia has a comparison of some. As far as I understand, one device can only feed one system. This would imply you'd need one hardware dongle per VM (and a lot of USB ports!), which leads into our next option.

You can also use an "entropy broker". This is a daemon which runs on a system with a true source of hardware randomness and uses a network client/server model to send that randomness to multiple systems.

IIUC you could have a system (physical or virtual) with a USB RNG running a broker which supplies hardware randomness to several other machines. I guess your entropy requirements would dictate if/when you'd starve one RNG enough to require more generators and brokers.
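
The client/server shape of that idea can be sketched in a few lines. This is not the protocol of any real entropy broker: the 2-byte length header, the class and function names, and the use of os.urandom as a stand-in for the hardware device are all assumptions made for illustration, and there is no authentication or encryption here, which a real deployment would need.

```python
import os
import socket
import socketserver
import threading


class EntropyHandler(socketserver.StreamRequestHandler):
    """Reads a 2-byte big-endian length, replies with that many random bytes."""

    def handle(self):
        header = self.rfile.read(2)
        if len(header) == 2:
            count = int.from_bytes(header, "big")
            # os.urandom stands in for a real hardware RNG device here.
            self.wfile.write(os.urandom(count))


def fetch_entropy(host, port, count):
    """Client side: request `count` bytes (at most 65535) from the broker."""
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(count.to_bytes(2, "big"))
        chunks = []
        remaining = count
        while remaining:
            chunk = conn.recv(remaining)
            if not chunk:
                break
            chunks.append(chunk)
            remaining -= len(chunk)
        return b"".join(chunks)


if __name__ == "__main__":
    # Port 0 asks the OS for any free port; everything runs on localhost.
    with socketserver.ThreadingTCPServer(("127.0.0.1", 0), EntropyHandler) as server:
        threading.Thread(target=server.serve_forever, daemon=True).start()
        host, port = server.server_address
        print(len(fetch_entropy(host, port, 64)), "bytes received")
        server.shutdown()
```

On the receiving guests, a real client would still have to feed the bytes into the kernel pool (rngd does this with the RNDADDENTROPY ioctl, which requires root); that step is left out of the sketch.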

Looping back over all these options, the first thing I would do is test /dev/urandom with rngtest. Test it with the amount of entropy you actually need and see if it stays sufficiently random over time. In my (admittedly limited) experience, you'll be able to get a lot of unpredictable data before resorting to any of the above methods.

As a final note, RHEL 6.6 beta is testing virtio-rng, a paravirtualized hardware random number generator which feeds KVM hypervisor entropy to KVM guests. Check the Release Notes after 6.6 comes out to see if this was included in the GA release, and whether it's Tech Preview or fully supported.

Update: RHEL 6.6 and later support virtio-rng, described at Does the Red Hat Enterprise Linux KVM Hypervisor support the VirtIO Random Number Generator?.

That 2uo article was new to me. Masterfully-done. Wow.
I gave my own shot at writing something similar a few years ago when I started at Red Hat: Entropy & the Linux kernel: /dev/random versus /dev/urandom

This is also great.

Thank you Tom and Jamie for your helpful replies.

As I mentioned, I have calls open with Red Hat and VMware on this issue.

Red Hat have replied that "linking /dev/random to /dev/urandom...is not a process recommended by Red Hat and we cannot guarantee the cryptographical security of this operation".

VMware have replied that they don't support hardware random number generators, but I am double-checking that with them.

I looked into haveged and, like the urandom discussions, there seems to be some disagreement regarding the quality of the randomness it produces. There were, in the same discussions, also comments on how the FIPS-140 randomness test can apparently succeed even with badly seeded data!
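
A small illustration of why a passing statistical test says little about unpredictability, using only the FIPS 140-2 monobit check (ones count strictly between 9725 and 10275 in a 20,000-bit block) as an example. The function name is hypothetical, and the repeated byte is a deliberately extreme stand-in for badly seeded data.

```python
def monobit_ok(block):
    """FIPS 140-2 monobit check on a 2500-byte (20,000-bit) block."""
    ones = sum(bin(byte).count("1") for byte in block)
    return 9725 < ones < 10275


# An entirely predictable stream: the byte 0xAA (bits 10101010) repeated.
predictable = bytes([0xAA]) * 2500

if __name__ == "__main__":
    print("predictable block passes monobit:", monobit_ok(predictable))
```

The block contains exactly 10,000 one bits, so it passes even though every bit is known in advance. The other checks that rngtest runs would catch a pattern this crude, but the output of a deterministic generator started from a guessable seed would be expected to look statistically fine to all of them, which is the concern raised in those discussions.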

I've basically got to come up with a solution that will pass accreditation for a military grade system. From what I have read I suspect that both the urandom link and haveged would be more than adequate to keep the vast majority (if not all) miscreants busy for a long time trying to crack the encryption.

I think my best bet is to test both the urandom link and haveged against the FIPS-140 randomness test and then present these results to determine the way forward.

Thanks again for the input. We've run RHEL5 on VMs for many years and not had any issues with /dev/random with the same software so clearly something has changed for RHEL6 which is limiting the entropy available.

Aidan

Here are Red Hat's recommendations for devices that are unable to maintain a sufficient entropy pool size: How to increase the entropy pool without using a keyboard or mouse

Can confirm this is neither a VMware issue nor a RHEL issue (it is not specific to them). I ran into the same issue with physical nodes. Fortunately we were able to redirect our application to use /dev/urandom. There are some implications that need to be reviewed, however.

I haven't seen this issue for a while (have seen it on VM webservers before). I find now, a lot of places have moved the SSL termination off to hardware load balancers in front of the webservers, which is potentially another option if you are having this issue specifically with SSL (not a Red Hat solution I know!).

Nice - thanks for linking this

Why is the "best response" to a question about entropy on VMware a link to a page that discusses how to resolve the problem on KVM? In my opinion the best responses are Tom's and Jamie's, which actually address the original question.

Agreed. Fixed.