Cryptographic Right Answers
Thanks to my background as FreeBSD Security Officer, as a cryptographic researcher, and as the author of the Tarsnap secure online backup system, I am frequently asked for advice on using cryptography as a component in secure systems. While some people argue that you should never use cryptographic primitives directly and that trying to teach people cryptography just makes them more likely to shoot themselves in their proverbial feet, I come from a proud academic background and am sufficiently optimistic about humankind that I think it's a good idea to spread some knowledge around. In light of this, I've put together a list of "Cryptographically Right Answers" -- which is to say, a list of recommendations for using cryptography which, if followed, will make sure you get things right in the vast majority of situations.
Encrypting data: Use AES in CTR (Counter) mode, and append an HMAC.
AES is about as standard as you can get, and has done a good job of resisting
cryptologic attacks over the past decade. Using CTR mode avoids the
weakness of ECB mode, the complex (and bug-prone) process of padding and
unpadding of partial blocks (or ciphertext stealing), and vastly reduces
the risk of side channel attacks thanks to the fact that the data being
input to AES is not sensitive. However, because CTR mode is malleable,
you should always add an HMAC to confirm that the encrypted data has not
been tampered with.
In some situations it may be preferable for performance reasons to use a
cipher mode such as GCM which combines encryption and authentication; but
the performance benefit is small (HMACs are fast), and such modes increase
the risk of side channel attacks (because attacker-supplied input is processed).
UPDATE: I've posted a more detailed explanation about why I recommend using the Encrypt-then-MAC composition.
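As a rough illustration of the "append an HMAC" half of this recommendation, here is a minimal Python sketch. It is not taken from any code mentioned in this post. Python's standard library has no AES, so the AES-CTR output is treated as opaque bytes produced elsewhere; the function names, the 16-byte nonce length, and the use of a MAC key separate from the encryption key are all assumptions made for the example.

```python
import hashlib
import hmac
from typing import Tuple

TAG_LEN = 32  # HMAC-SHA256 produces a 32-byte tag


def seal(mac_key: bytes, nonce: bytes, ciphertext: bytes) -> bytes:
    """Return nonce || ciphertext || HMAC-SHA256(mac_key, nonce || ciphertext)."""
    body = nonce + ciphertext
    tag = hmac.new(mac_key, body, hashlib.sha256).digest()
    return body + tag


def open_sealed(mac_key: bytes, message: bytes, nonce_len: int = 16) -> Tuple[bytes, bytes]:
    """Check the tag first; only then hand back (nonce, ciphertext) for decryption."""
    if len(message) < nonce_len + TAG_LEN:
        raise ValueError("message too short")
    body, tag = message[:-TAG_LEN], message[-TAG_LEN:]
    expected = hmac.new(mac_key, body, hashlib.sha256).digest()
    # compare_digest avoids leaking where the first mismatching byte is
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed")
    return body[:nonce_len], body[nonce_len:]
```

The point of the ordering is that the receiver rejects a modified message before any of it reaches the decryption code.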
AES key length: Use 256-bit AES keys.
Theoretically speaking, 128-bit AES keys should be enough for the
foreseeable future; but for most applications the increased cost of using
256-bit keys instead of 128-bit keys is insignificant, and the increased
key length provides a margin of security in case a side channel attack
leaks some but not all of the key bits.
Symmetric signatures (added 2009-09-28): Use an HMAC.
I didn't think it was necessary to point this out, but I realize now
(2009-09-28, three months after first writing this list) that there are
some people for whom it should be spelled out. Do not design your own
way of generating symmetric signatures (e.g., for API requests);
especially avoid the common "concatenate key and data, then input to
a hash function" approach.
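For API requests, a minimal sketch using Python's standard `hmac` module might look like the following. The function names and the idea of signing the raw request bytes are assumptions for illustration. The closing comment shows the "concatenate key and data" construction to avoid; with SHA256 that construction is open to length-extension attacks.

```python
import hashlib
import hmac


def sign_request(key: bytes, request: bytes) -> bytes:
    """Return the HMAC-SHA256 tag for a request body."""
    return hmac.new(key, request, hashlib.sha256).digest()


def verify_request(key: bytes, request: bytes, signature: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    expected = sign_request(key, request)
    return hmac.compare_digest(expected, signature)


# Anti-pattern, shown only for contrast; do not use:
#   hashlib.sha256(key + request).digest()
```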
Hash / HMAC algorithm: Use SHA256 / HMAC-SHA256 for now, but plan on upgrading to the upcoming SHA-3 hash within the next 5-10 years.
Given the recent attacks on MD5 and SHA1, I would not be surprised if
SHA256 is "broken" within the next few years; but moving from a theoretical
break (i.e., an algorithm for finding a collision in less than 2^128 time)
to a practical weakness is likely to take several years (based on the
history of past hash algorithms). While SHA512 could be used as a
"stop-gap" measure in the event that SHA256 is broken before SHA-3 is
available (realizing, of course, that the similarities between the
designs of SHA256 and SHA512 make it likely that any weakness in SHA256
would translate to a corresponding weakness in SHA512), the fact that
SHA512 uses 64-bit arithmetic makes it more likely that implementations
on 32-bit systems will be vulnerable to side channel attacks.
Random IDs: Use 256-bit random numbers.
The "birthday paradox" implies that in order to avoid collisions you
need random values with twice as many bits as the base-2 logarithm of
the number of values you will be selecting (128-bit values become likely
to collide after about 2^64 selections). I doubt any application thus far has
come close to selecting 2^64 random values; but if computers continue
to scale exponentially, this could occur in the upcoming decade. In
most applications, using 256-bit random values instead of 128-bit
random values carries no significant increase in cost; but it puts
randomly finding a collision safely into the realm of "not going to
happen with all the computers on Earth in the lifetime of the solar
system" problems.
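The arithmetic behind this can be checked with a short Python sketch. The helper names are hypothetical; `collision_bound` computes the standard union bound n(n-1)/2^(bits+1), which is an upper limit on the collision probability rather than its exact value.

```python
import secrets
from fractions import Fraction


def new_id() -> str:
    """Return a 256-bit random identifier as 64 hex characters."""
    return secrets.token_hex(32)


def collision_bound(num_ids: int, bits: int) -> Fraction:
    """Upper bound on the chance that any two of num_ids random values collide."""
    return Fraction(num_ids * (num_ids - 1), 2 ** (bits + 1))


# With 2^64 values drawn, the bound for 128-bit IDs is just under 1/2,
# while the bound for 256-bit IDs is below 2^-128.
bound_128 = collision_bound(2 ** 64, 128)
bound_256 = collision_bound(2 ** 64, 256)
```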
Password handling: As soon as you receive a password, hash it using scrypt or PBKDF2 and erase the plaintext password from memory.
Do NOT store users' passwords. Do NOT hash them with MD5. Use a real
key derivation algorithm. PBKDF2 is the most official standard; but
scrypt is stronger.
Please keep in mind that even if YOUR application isn't particularly
sensitive, your users are probably re-using passwords which they have
used on other, more sensitive, websites -- so if you screw up how you
store your users' passwords, you might end up doing them a lot of
harm.
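A minimal sketch of this flow in Python follows, under several assumptions. It uses PBKDF2-HMAC-SHA256 because `hashlib.pbkdf2_hmac` is always present; where Python is built against OpenSSL 1.1 or later, `hashlib.scrypt` can be substituted. The iteration count, salt length, and function names are illustrative choices, not recommendations from this post. Zeroing the `bytearray` is only a best-effort erase, since the Python runtime may hold other copies.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 600_000  # assumption: tune to what your hardware can afford


def _wipe(buf: bytearray) -> None:
    """Overwrite the caller's password buffer with zeros (best effort only)."""
    for i in range(len(buf)):
        buf[i] = 0


def hash_password(password: bytearray, iterations: int = ITERATIONS):
    """Derive a key from the password, wipe the input, return (salt, iterations, key)."""
    salt = secrets.token_bytes(16)
    derived = hashlib.pbkdf2_hmac("sha256", password, salt, iterations)
    _wipe(password)
    return salt, iterations, derived


def verify_password(password: bytearray, salt: bytes, iterations: int, expected: bytes) -> bool:
    """Re-derive with the stored salt and iteration count, then compare in constant time."""
    derived = hashlib.pbkdf2_hmac("sha256", password, salt, iterations)
    _wipe(password)
    return hmac.compare_digest(derived, expected)
```

Only the salt, the iteration count, and the derived key are stored; the password itself never is.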
Asymmetric encryption: Use RSAES-OAEP with SHA256 as the hash function, MGF1+SHA256 as the mask generation function, and a public exponent of 65537. Make sure that you follow the decryption algorithm to the letter in order to avoid side channel attacks.
Many applications use PKCS #1 v1.5 encryption; this algorithm has
known weaknesses and should be avoided (there are workarounds for said
weaknesses -- but there might also be other undiscovered weaknesses).
In contrast, RSAES-OAEP has been proven to be secure under fairly
reasonable assumptions.
I recommend using SHA256 here mostly for consistency. Many people use
SHA1, and in this context it is perfectly adequate -- but why use two
different hashes if you can get away with only using one?
Using a public exponent of 65537 is not absolutely necessary -- a
public exponent of 3 is theoretically just as secure -- but there have
been a couple of attacks (against PKCS #1 v1.5 padding, and my cache
based side channel attack) which were much easier given a small public
exponent, so it's possible that using a public exponent of 65537 will
help defend against weaknesses discovered in the future.
If you are not careful about how you decrypt RSAES-OAEP-encrypted data,
you will leak information which can be used to steal your key; if you
follow the algorithm precisely you will be safe, but it is very easy
to get this wrong.
Asymmetric signatures: Use RSASSA-PSS with SHA256 as the hash function, MGF1+SHA256 as the mask generation function, and a public exponent of 65537.
The RSASSA-PSS signature scheme has been proven to be secure under
reasonable assumptions; there is really no reason to use anything
else. Unlike RSAES-OAEP, the choice of hash function is important
here; I recommend SHA256, but (as discussed above) switching to
SHA-3 will be advisable once that standard is released. As with
RSAES-OAEP, using a public exponent of 65537 is not strictly
necessary, but might help prevent some attacks.
Many people recommend using DSA or elliptic curve based signature
schemes which are faster and/or produce smaller signatures than
RSASSA-PSS; in some situations these advantages are important, but
in most cases I prefer RSASSA-PSS because its relative simplicity
makes implementation errors and side channel attacks less likely.
Diffie-Hellman: Operate over the 2048-bit Group #14 with a generator of 2. Be careful about side channel attacks.
This group is large enough that it should be secure for the near
future; and as it is defined based on the binary digits of Pi, it
clearly was not chosen to have any specific weaknesses. There is
absolutely no excuse for allowing Diffie-Hellman groups to be
defined at run-time; this adds a great deal of complexity and
potential for cryptographic weaknesses, and serves no purpose
whatsoever.
Because Diffie-Hellman requires operating on attacker-supplied input,
there is a significant danger of side channel attacks; using some
form of base or exponent blinding may be required.
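To make the fixed-group idea concrete, here is a hedged Python sketch with the Group #14 prime (as published in RFC 3526) compiled in rather than negotiated. The function names, the range check on the peer's value, and hashing the shared secret with SHA256 are assumptions added for the example. Python's built-in `pow` is not constant-time and no blinding is shown, so this illustrates the arithmetic only and does nothing about the side channel concern above.

```python
import hashlib
import secrets

# 2048-bit MODP prime of Group #14 (RFC 3526), fixed at build time.
P = int(
    "FFFFFFFFFFFFFFFFC90FDAA22168C234C4C6628B80DC1CD1"
    "29024E088A67CC74020BBEA63B139B22514A08798E3404DD"
    "EF9519B3CD3A431B302B0A6DF25F14374FE1356D6D51C245"
    "E485B576625E7EC6F44C42E9A637ED6B0BFF5CB6F406B7ED"
    "EE386BFB5A899FA5AE9F24117C4B1FE649286651ECE45B3D"
    "C2007CB8A163BF0598DA48361C55D39A69163FA8FD24CF5F"
    "83655D23DCA3AD961C62F356208552BB9ED529077096966D"
    "670C354E4ABC9804F1746C08CA18217C32905E462E36CE3B"
    "E39E772C180E86039B2783A2EC07A28FB5C55DF06F4C52C9"
    "DE2BCBF6955817183995497CEA956AE515D2261898FA0510"
    "15728E5A8AACAA68FFFFFFFFFFFFFFFF",
    16,
)
G = 2


def generate_keypair():
    """Pick a random private exponent in [2, P-2] and return (private, public)."""
    private = secrets.randbelow(P - 3) + 2
    return private, pow(G, private, P)


def shared_key(private: int, peer_public: int) -> bytes:
    """Reject out-of-range peer values, then hash the shared secret into a 256-bit key."""
    if not 2 <= peer_public <= P - 2:
        raise ValueError("peer public value out of range")
    secret = pow(peer_public, private, P)
    return hashlib.sha256(secret.to_bytes(256, "big")).digest()
```

Both sides call `generate_keypair()`, exchange the public halves, and arrive at the same output from `shared_key()`.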
Website security: Use OpenSSL.
OpenSSL has a horrible track record for security; but it has the saving grace that because it
is so widely used, vendors tend to be very good at making sure that
OpenSSL vulnerabilities get fixed promptly. I wish there was a better
alternative, but for now at least OpenSSL is the best option available.
UPDATE: For added security, terminate SSL connections in a restricted environment and pass the raw HTTP over a loopback connection to your web server.
Client-server application security: Distribute the server's public
RSA key with the client code, and do not use SSL.
One of the reasons OpenSSL has such a poor track record is that the
SSL protocol itself is highly complex. Certificate chains, revocation
lists, ASN.1, multiple different hashing and encryption schemes... when
you have over a hundred thousand lines of code, it's no wonder that bugs
creep in.
If you're distributing client code which speaks to a server you operate,
there is no need to use SSL; instead, you can distribute the server's
public RSA key (or its hash) along with the client code, and "bootstrap"
the security process that way. I do this in FreeBSD for the FreeBSD
Update and Portsnap services, and I also do this in Tarsnap. It's
simple; it works; and it's secure.
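The "or its hash" variant can be sketched in a few lines of Python. This is an illustration under assumptions, not the code used by FreeBSD Update, Portsnap, or Tarsnap: the function names are hypothetical, and the public key is treated as an opaque byte string in whatever encoding the client and server agree on.

```python
import hashlib
import hmac


def pin_for(public_key_bytes: bytes) -> bytes:
    """Run at build time: compute the SHA256 hash to embed in the client."""
    return hashlib.sha256(public_key_bytes).digest()


def key_matches_pin(received_key: bytes, pinned_hash: bytes) -> bool:
    """Run in the client: accept the server's key only if it hashes to the pin."""
    return hmac.compare_digest(hashlib.sha256(received_key).digest(), pinned_hash)
```

A client would refuse to continue if `key_matches_pin` returns False, and would use the verified key for everything that follows.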
Online backups: Use Tarsnap.
Ok, I have a slight bias here! -- but in all honesty, I do trust
Tarsnap's security far more than that of any other backup system, and
not just because I wrote tarsnap. Most backup systems are written by
people who are interested in backups, and have security more or less
as an afterthought; in contrast, I know very little about backups (I
would never have gotten started with tarsnap if it hadn't been for
Tim Kientzle writing his excellent libarchive library and effectively
dropping a free tar implementation in my lap),
but I do know a lot about cryptography and security; and I wrote
tarsnap from that perspective.
While I agree that most of my readers should never write any cryptographic code, I hope that those of you who do end up writing cryptographic code will end up writing better and more secure code thanks to this -- and more importantly, I hope those of you who never write any cryptographic code will learn enough from this that you will be able to recognize when people are doing things wrong. Bugs become shallow given enough eyeballs -- but only when those eyeballs know enough about the relevant code to be able to recognize bugs when they are spotted.