First a brief history of Diffie-Hellman for those not familiar with it
The short version of Diffie-Hellman is that two parties (Alice and Bob) want to share a secret so they can encrypt their communications and talk securely without an eavesdropper (Eve) listening in. So Alice and Bob first share a public prime number and modulus (which Eve can see). Alice and Bob then each choose a large random number (referred to as their private key) and apply some modular arithmetic using the shared prime and modulus. If everything goes as planned Alice sends her answer to Bob, and Bob sends his answer to Alice. They each take the number sent by the other party, and using modular arithmetic, and their respective private keys are able to derive a number that will be the same for both of them, known as the shared secret. Even if Eve listens in she cannot easily derive the shared secret.
However if Eve has sufficiently advanced cryptographic experts, and sufficient computing power it is conceivable that she can derive the private keys for exchanges when a sufficiently small modulus is used. Most keys, today, are in the range of 1024 bit or larger meaning that the modulus is at least several hundred digits long.
Essentially if Alice and Bob agree to a poorly chosen modulus number then Eve will have a much easier time deriving the secret keys and listening in on their conversation. Poor examples of modulus numbers include numbers that are not actually prime, and modulus numbers that aren’t sufficiently large (e.g. a 1024 bit modulus provides vastly less protection than a 2048 bit modulus).
Why you need a good prime for your modulus
A prime number is needed for your modulus. For this example we'll use 23. Is 23 prime? 23 is small enough that you can easily walk through all the possible factors (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12), divide 23 by them and see if there is a remainder. But much larger prime numbers, such as ones that are in the mid to high hundreds of digits long, are essentially impossible to factor unless you have a lot of computational resources, and some really efficient ways to try and factor prime numbers. Fortunately there is a simple solution to this, just use an even larger prime number, such as a 2048, 4096 or even 16384 bit prime number. But when picking such a number how can you be sure it’s a prime and not easily factored? Ignoring the obvious give-aways (like all numbers ending in 0, 2, 4, 5, 6 and 8) there are several clever mathematical algorithms for testing the primality of numbers and for generating prime numbers.
Miller-Rabin, Shawe-Taylor and FIPS 186-4
The Miller-Rabin primality test was first proposed in 1976, and the Shawe-Taylor strong prime generation was first proposed in 1986. One thing that is important to remember, is that back when these algorithms were made public the amount of computing power available to generate/factorize prime numbers was much smaller than is now available. The Miller-Rabin test is a probabilistic primality test, you cannot conclusively prove a number is prime, but by running the Miller-Rabin test multiple times with different parameters you can be reasonably certain that the number in question is probably prime and with enough tests your confidence can approach almost 100%. Shawe-Taylor is also probabilistic, you're not 100% guaranteed to get a good prime, but the chances of something going wrong and getting a non-prime number are very small.
FIPS 186-4 covers the math and usage of both Miller-Rabin and Shawe-Taylor, and gives specific information on how to use them securely (e.g. how many rounds of Miller-Rabin you’ll need to use, etc.). The main difference between Rabin-Miller and Shawe-Taylor is that Shawe-Taylor generates something that is probably a prime, whereas with Miller-Rabin you generate a number that might be prime, and then test it. As such you may immediately generate a good number, or it may take several tries. In testing on a 3Ghz CPU, using a single core it took me between less than a second and over 10 minutes to generate a 2048 bit prime using the Miller-Rabin method.
Generating primes in advance
The easiest way to deal with the time and computing resources needed to generate primes is to generate them in advance. Also because they are shared publicly during the exchange you can even distribute them in advance, which is what has happened for example with OpenSSL. Unfortunately many of the primes currently in use were generated a long time ago when the attacks available were not as well understood, and thus are not very large. Additionally, there are relatively few primes in use and it appears that there may be a literal handful of primes in wide use (especially the default ones in OpenSSL and OpenSSH for example). There is now public research to indicate that at least one large organization may have successfully attacked several of these high value prime numbers, and as computational power increases this becomes more and more likely.
To generate Diffie-Hellman primes in advance is easy, for example with OpenSSL:
openssl dhparam [bit size] -text > filename
so to generate a 2048 bit prime:
openssl dhparam 2048 -text
Or for example with OpenSSH you first generate a list of candidate primes and then test them:
ssh-keygen -G candidates -b 2048 ssh-keygen -T moduli -f candidates
Please note that OpenSSH uses a list of multiple primes so generation can take some time, especially with larger key sizes.
Defending - larger primes
The best defense against someone factoring your primes is to use really large primes. In theory every time you add a single bit to the prime you are increasing the workload significantly (assuming no major advances in math/quantum computing that we don’t know about that make factorization much easier). As such moving from a 1024 bit to 2048 bit prime is a huge improvement, and moving to something like a 4096 bit prime should be safe for a decade or more (or maybe not, I could be wrong). So why don’t we simply use very large primes? CPU power and battery power are still finite resources, and very large primes take much more computational power to use, so much so that very large primes like 16384 bit primes become impractical to use, introducing noticeable delays in connections. The best thing we can do here is set a minimum prime size such as 2048 bits now, and hopefully move to 4096 bit primes within the next few years.
Defending - diversity of primes
But what happens if you cannot use larger primes? The next best thing is to use custom generated prime numbers, this means that an attacker will have to factor your specific prime(s), increasing their workload. Please note that even if you can use large primes, prime diversity is still a good idea, but prime diversity increases the amount of work for an attacker at a much slower rate than using larger prime does. The following apps and locations contain primes you may want to replace:
Apache with mod_ssl: “SSLOpenSSLCondCmd DHParameters [filename]” or append the DH param to the SSLCertificateFile
Some excellent articles on securing a variety of other services and clients are:
Current and future state
So what should we do?
I think the best plan for dealing with this in the short term is deploying larger primes (2048 bits minimum, ideally 4096 bits) right now wherever possible. For systems that cannot have larger primes (e.g. some are limited to 768, 2013 bits or other related small sizes) we should ensure that default primes are not used and instead custom primes are used, ideally for limited periods of time, replacing the primes as often as possible (which is easier since they are small and quick to generate).
In the medium term we need to ensure as many systems as possible can handle larger prime sizes, and we need to make default primes much larger, or at least provide easy mechanisms (such as firstboot scripts) to replace them.
Longer term we need to understand the size of primes needed to avoid decryption due to advances in math and quantum computing. We also need to ensure software has manageable entry points for these primes so that they can easily be replaced and rotated as needed.
Why not huge primes?
Why not simply use really large primes? Because computation is expensive, battery life matters more than ever and latency will become problems that users will not tolerate. Additionally the computation time and effort needed to find huge primes (say 16k) is difficult at best for many users and not possible for many (anyone using a system on a chip for example).
Why not test all the primes?
Why not test the DH params passed by the remote end, and refuse the connection if the primes used are too small? There is at least one program (wvstreams [wvstreams]) that tests the DH params passed to it, however it does not check for a minimum size, it simply tests the DH params for correctness. The problem with this is twofold, one there would be a significant performance impact (adding time to each connection) and two, most protocols and programs don’t really support error messages from the remote end related to the DH params, so apart from dropping the connection there is not a lot you can do.
As bad as things sound there is some good news. Fixing this issue is pretty trivial, and mostly requires some simple operational changes such as using moderately sized DH Parameters (e.g. 2048 bits now, 4096 within a few years). The second main fix for this issue is to ensure any software in use that handles DH Parameters can handle larger key sizes, if this is not possible then you will need to place a proxy in front that can handle proper key sizes (so all your old Java apps will need proxies). This also has the benefit of decoupling client-server encryption from old software which will allow you to solve future problems more easily as well.