Introducing configinit

I have been working on bringing FreeBSD to the Amazon EC2 platform since 2006, and for the past three years I've been blogging about my progress: first FreeBSD on t1.micro instances, then cluster compute instances, then "m1" and "m2" family large and xlarge instances, until in early 2012 FreeBSD could finally run on all EC2 instance types. Once I had a hacked-up version of FreeBSD which ran smoothly, I turned my attention towards polishing it: first moving my EC2 scripts into the ports tree, then using binaries from the release ISOs for the FreeBSD world, and finally in early October (with FreeBSD 10.0-ALPHA4) merging all the necessary bits to make it possible for me to build EC2 images completely (including the kernel) with "straight off the ISO" binaries. Next on my agenda was taking my images from "pure FreeBSD" to "FreeBSD set up to be used in the cloud", and to that end I'm happy to announce that starting from 10.0-RC1, my FreeBSD AMIs have a new feature: configinit.
Anyone who has been around the world of cloud computing for long is likely to have heard of CloudInit. It is a system originally written for Ubuntu which performs configuration of a system at boot-time based on "user-data" provided via EC2 or from a similar environment (e.g., OpenStack). CloudInit works well for its original purpose, but is less than ideal for FreeBSD systems, for two reasons: First, it relies on Python, which is not part of the FreeBSD base system; and second, it is designed around a concept of configuring a system by running commands rather than editing configuration files.
Now, there are merits to both approaches: most notably, configuring a system by running commands is easier to script, while configuring a system by editing text files has the advantage that, given a working configuration, there is no doubt about how to reproduce it. But BSD systems are far more edit-configuration-files oriented (to the point that /etc/rc.conf might be the only configuration file which needs to be edited on some systems), and thus CloudInit is less than optimal for configuring FreeBSD systems.
Enter configinit. Rather than providing instructions such as "tell apt to use this mirror" or "run this python code", configinit handles four types of input:
- If the configuration data starts with ">/path/to/file", then the data, minus the first line, will be written to the specified location.
- If the configuration data starts with ">>/path/to/file", then the data, minus the first line, will be appended to the specified location.
- If the configuration data starts with "#!", it will be executed (in most cases this would be a shell script).
- For any other input, configinit attempts to extract the data as an archive, and (if extraction was successful) runs configinit on each part in turn. The extraction is performed using bsdtar, so archives in tar, pax, cpio, zip, jar, ar, xar, rpm, and ISO 9660 formats, optionally compressed using gzip, bzip2, xz, or compress, can all be used.
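The four rules above can be sketched as a few lines of sh. This is an illustration of the dispatch logic, not the real configinit script; the function and file names here are made up.

```shell
#!/bin/sh
# Sketch of configinit-style dispatch: look at the first line of a
# configuration part and decide what to do with the rest of it.
process_part() {
	part="$1"
	first=$(head -n 1 "$part")
	case "$first" in
	">>"*)	# ">>/path" on the first line: append the rest to that path
		tail -n +2 "$part" >> "${first#>>}" ;;
	">"*)	# ">/path" on the first line: overwrite that path with the rest
		tail -n +2 "$part" > "${first#>}" ;;
	"#!"*)	# shebang: treat the part as a script and run it
		sh "$part" ;;
	*)	# anything else: try to unpack it as an archive and recurse
		dir=$(mktemp -d)
		tar -xf "$part" -C "$dir" 2>/dev/null || return 1
		for f in "$dir"/*; do process_part "$f"; done ;;
	esac
}

# Demo: a part which appends one line to a (temporary) rc.conf
rcfile=$(mktemp)
partfile=$(mktemp)
printf '>>%s\nfirstboot_pkgs_list="apache22"\n' "$rcfile" > "$partfile"
process_part "$partfile"
cat "$rcfile"
```

The real script uses bsdtar for the archive case, which is what makes the long list of archive and compression formats come for free.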
This is much simpler than CloudInit, but in combination with other tools which are already available on FreeBSD, such as my firstboot-pkgs port, it provides very powerful yet easy-to-use functionality. For example, launching a FreeBSD 10.0-RC1 EC2 instance with the following user-data:

>>/etc/rc.conf
firstboot_pkgs_list="apache22"
apache22_enable="YES"

will provide a system with Apache 2.2 installed and running (in my test, within 150 seconds of when I clicked "launch" in the EC2 Management Console), in addition to performing the other default system initialization behaviours of my EC2 images: checking for updates to the FreeBSD base system, downloading an SSH public key to allow SSH logins as ec2-user, logging SSH host keys to the EC2 console, and autoconfiguring swap space using EC2 ephemeral disks.
I know for my purposes this will be very useful — for example, while I have the process of configuring a new Portsnap mirror mostly scripted now, using configinit I could have it entirely scripted and avoid the need to ever SSH into the mirrors — and from what other FreeBSD users have told me, I don't think I will be alone. Is there anything else I could do to make FreeBSD even more usable in EC2? Quite likely — but I don't know what. If I'm missing something important, please let me know!
Automated FreeBSD panic reporting

It is now very common for software to have built-in mechanisms for reporting crashes. Windows, OS X, Ubuntu, Android, KDE, Mozilla... there are few large codebases which don't have any such functionality. Until a few days ago, FreeBSD was an exception: the instructions on Kernel Debugging are hidden away in the "Developer's Handbook", and for users who are not in a position to diagnose the cause of a kernel panic themselves, all that could be done was to submit a bug report via the "send-pr" utility, at which point it would join the other 500+ panic reports sitting in FreeBSD's mostly-ignored GNATS repository. A couple of weeks ago, I decided it was time to do something about this.
On Monday I added new "panicmail" code to the FreeBSD ports tree. This code, if installed, will gather basic information about a FreeBSD kernel panic — not much more than the backtrace — and submit it to a central repository (currently known as my inbox). The submitted data is encrypted to ensure that nobody can snoop on any sensitive information (a backtrace which goes through ZFS would reveal that you use ZFS, for example) and in case you're not sure that you trust me with the contents of your kernel panics, the default behaviour is to email them to root along with instructions for how to submit them once you've verified they don't contain anything confidential.
Once I have a significant number of panic submissions, I'll start processing them to provide FreeBSD developers with aggregate data — information which is not tied to any particular machines, but can still be useful by telling developers where to look for the most common kernel panics.
I'm expecting this to be a very useful tool for the FreeBSD project; I know this type of analysis of automatic crash reports has been invaluable for other operating systems, and two FreeBSD-based projects — my work on FreeBSD/EC2, and FreeNAS — have been using automated panic reporting already, with good results. However, this depends on FreeBSD users installing and enabling the code... and given that the FreeBSD kernel doesn't panic very often, getting a useful number of panic reports requires that as many FreeBSD users as possible contribute.
To install and enable panic reporting on your FreeBSD system:
- Install the sysutils/panicmail port.
- Add the lines
  panicmail_enable="YES"
  dumpdev="AUTO"
  to your /etc/rc.conf file.
- Make sure that email sent to root goes somewhere (if you're receiving nightly "daily run output" and "security run output" emails, you're good).
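On a recent FreeBSD system the steps above boil down to a few commands run as root; this is a sketch which assumes pkg/ports are already set up, and that the port's rc.conf knob follows the usual _enable naming convention:

```shell
pkg install panicmail          # or build sysutils/panicmail from ports
sysrc panicmail_enable="YES"   # turn on panic report generation
sysrc dumpdev="AUTO"           # save kernel crash dumps so there is something to report
```

sysrc(8) simply appends or updates the named variables in /etc/rc.conf, so editing the file by hand works just as well.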
Remember, the larger the number of systems configured to submit panics, the more useful this will be!
Don't trust me: I might be a spook

Shortly after the Snowden papers started to be published, I was invited to write an op-ed about PRISM and its implications for privacy and online security. I initially agreed, but after spending a few hours putting some thoughts together I changed my mind: I really had nothing useful to say. Yes, the NSA is spying on us, listening to our phone calls, and reading our email — but we already knew that, and a few PowerPoint slides of confirmation really don't change anything. When the first revelations about BULLRUN — the fact that the NSA can read a lot of encrypted data on the internet — appeared, I was similarly unimpressed: If you can find a weakness in an implementation of a cryptographic system, you can often bypass the cryptography, and the US government, via defense contractors, has hundreds of open job postings for exploit writers with Top Secret clearances. If the NSA can break 2048-bit RSA, it would be a Big Deal; if they can break OpenSSL, not so much.
But the latest revelations scare me. It's one thing to find and exploit vulnerabilities in software; there's a lot of software out there which was written by developers with very little understanding of cryptography or software security, and it shows. If you care about security, we reasoned, stick to software written by people who know what they're doing — indeed, when I talk to users of Tarsnap, my online backup service, one of the most common things I hear is "you're good at security, so we know your code will keep our data safe". That reasoning is now clearly flawed: We now have evidence that the NSA is deliberately sabotaging online security — influencing (and weakening) cryptographic standards, bribing companies to insert "back doors" into their software, and even sending developers to "accidentally" insert bugs into products. It's not enough to trust that I know what I'm doing: You have to trust that I'm not secretly working for the NSA.
I'm not working for the NSA, of course, and I haven't sabotaged any of the software I've written — and while that's exactly what someone working for the NSA would say, there are a few reasons to believe me. For a start, I'm not a US citizen, so it would be difficult for me to get a US security clearance, and since my first instinct if approached by the NSA would be to blog about it, I'm not exactly the sort of person they would be inclined to trust. More significantly, I have published cryptographic research: first, in 2005, the first (public) side channel attack exploiting Intel HyperThreading; and second, in 2009, the scrypt key derivation function, which is designed specifically to protect passwords (and the accounts and data they are used to guard) against attack from agencies like the NSA. The NSA does not publish cryptographic research (or much at all, in fact — there's a reason people joke that their name is really an abbreviation for "Never Say Anything") so my having published such research argues against the possibility that I'm covertly working for the NSA. Finally, my reputation and identity are very heavily tied up in security, both as Security Officer for the FreeBSD project and as the author of Tarsnap. If I sabotaged Tarsnap it would indelibly damage my reputation, and it's hard to imagine what inducement anyone could offer which would make me do such a thing.
But none of this is conclusive. Despite all the above, it is still possible that I am working for the NSA, and you should not trust that I am not trying to steal your data. Fortunately, the first principle behind Tarsnap's design is that you should not need to trust me: Data is encrypted on individual client machines, and you have the source code to verify that this is being done securely (and without the keys being in any way leaked to the NSA). If you are a developer who understands C, download the Tarsnap source code and read it — and don't feel that a lack of expertise in security should stop you either: My experience as FreeBSD Security Officer was that most vulnerabilities were found by developers looking at code and noticing that something "seemed wrong", rather than by people with security expertise specifically looking for security vulnerabilities. (If protecting the free world from the NSA is insufficient motivation, I also pay for bugs people find in Tarsnap, as well as scrypt, kivaloo, and spiped, right down to the level of typographical errors in comments).
Naturally, what applies to me also applies to everybody else. For most products, in fact, it applies many times over: It only takes one person to introduce a vulnerability into software, and most organizations do not have a sufficient code review process to reliably catch such bugs (if they did, we would have vastly superior code!) even assuming that there is no institutional corruption. Microsoft may have decided to cooperate with the NSA while Google resisted; but all the NSA needs is one or two cooperative Google employees in the right place.
The only solution is to read source code and look for anything suspicious. Linus's Law states that "given enough eyeballs, all bugs are shallow": If enough people read source code, we will find the bugs — including any which the NSA was hoping to exploit in order to spy on us. The Department of Homeland Security wants to have an army of citizens on the lookout for potential terrorists; it's time to turn that around. We need an army of software developers on the lookout for potential NSA back doors — to borrow a phrase, if you see something, say something.
And if you can't see anything because you can't get the source code... well, who knows what they might be hiding?
The Factoring Cryptopocalypse

There has been some noise recently about a presentation at Black Hat 2013 entitled "Preparing for the Cryptopocalypse". Based on some recent research by Antoine Joux et al., the speakers argued that we should be prepared for the day when RSA is announced to be broken. Personally, I'm not so worried.
The key detail to understand about the work in the Joux papers is that it is limited to solving discrete logarithm problems over fields of small characteristic. As interesting as the work is, mathematically, it should not be a great surprise to cryptographers: We've known for over a decade that small-characteristic fields are "scary", and in 2005 when the NSA announced their "Suite B" cryptography using Elliptic curves, nobody was surprised to see that they selected ECC over prime fields instead. In that sense, this work is akin to seeing a buffer overflow discovered in Sendmail in 2003: Interesting to the research community, but not really a great surprise to anyone who has been paying attention.
When should we worry? If there's any hint of this work being extended to apply to prime fields. The discrete logarithm problem over prime fields is very closely related to the problem of integer factorization — there's a long history of improvements in one translating directly to improvements in the other. (In fact, I'd say DLP over prime fields is more closely related to factoring than it is to DLP over small-characteristic fields.) In the mean time, I see no reason to panic. If you want to switch over to using elliptic curves, go ahead; but my earlier remarks still apply there: Because ECC is more complex than RSA, it's easier to make implementation errors and/or introduce side channels. I would add one extra caveat however: If you do decide to use ECC, do it over a prime field. While I've never been fond of ECC over small-characteristic (aka. binary) fields, this latest attack provides all the more reason to be cautious about them: It's the inherent "structure" which makes them scary, and the work of Joux et al. just reinforces that such structure can be exploited.
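To make the "closely related" claim concrete, the textbook asymptotics (nothing new here, just the standard results) are usually written with the L-notation

\[
L_N[\alpha, c] = \exp\!\left( (c + o(1)) \, (\ln N)^{\alpha} \, (\ln \ln N)^{1-\alpha} \right).
\]

The number field sieve factors $N$ in heuristic time $L_N[1/3,\,(64/9)^{1/3}]$, and its adaptation to discrete logarithms modulo a prime $p$ runs in the same $L_p[1/3,\,(64/9)^{1/3}]$ — the two problems even share the same constant. By contrast, the Joux et al. line of work pushed small-characteristic DLP down to quasi-polynomial time, far below anything the $L[1/3]$ family of algorithms can reach, which is why that result says little about factoring or prime-field DLP.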
For my own purposes, I'm going to keep on using 2048-bit RSA.
Cryptography is a science, not engineering

Thomas Ptacek tweeted yesterday that "If you're not learning crypto by coding attacks, you might not actually be learning crypto." Judging by the number of Twitter "favourites" and "retweets" of this comment, it seems to have struck a chord; but with all respect to Thomas, I absolutely disagree. Not only is it possible to learn cryptography without writing a line of code, but coding attacks is entirely useless for learning about modern cryptography; the best route to learning modern cryptography is a study of mathematical proofs.
If we were still in the 1990s, I would agree with Thomas. 1990s cryptography was full of holes, and the best you could hope for was to know how your tools were broken so you could try to work around their deficiencies. This was a time when DES and RC4 were widely used, despite having well-known flaws. This was a time when people avoided using CTR mode to convert block ciphers into stream ciphers, due to concern that a weak block cipher could break if fed input blocks which shared many (zero) bytes in common. This was a time when people cared about the "error propagation" properties of block ciphers — that is, how much of the output would be mangled if a small number of bits in the ciphertext are flipped. This was a time when people routinely advised compressing data before encrypting it, because that "compacted" the entropy in the message, and thus made it "more difficult for an attacker to identify when he found the right key". It should come as no surprise that SSL, designed during this era, has had a long list of design flaws.
Cryptography in the 2010s is different. Now we start with basic components which are believed to be highly secure — e.g., block ciphers which are believed to be indistinguishable from random permutations — and which have been mathematically proven to be secure against certain types of attacks — e.g., AES is known to be immune to differential cryptanalysis. From those components, we then build higher-order systems using mechanisms which have been proven to not introduce vulnerabilities. For example, if you generate an ordered sequence of packets by encrypting data using an indistinguishable-from-random-permutation block cipher (e.g., AES) in CTR mode using a packet sequence number as the CTR nonce, and then append a weakly-unforgeable MAC (e.g., HMAC-SHA256) of the encrypted data and the packet sequence number, the packets both preserve privacy and do not permit any undetected tampering (including replays and reordering of packets). Life will become even better once Keccak (aka. SHA-3) becomes more widely reviewed and trusted, as its "sponge" construction can be used to construct — with provable security — a very wide range of important cryptographic components.
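As a toy illustration of that construction, here is the same encrypt-then-MAC pattern sketched with the OpenSSL command line. The keys, nonce, and sequence number are made-up values, and a real implementation would use a proper library and a constant-time tag comparison; the point is only the shape: AES-CTR keyed by the packet number, then HMAC-SHA256 over (sequence number || ciphertext).

```shell
#!/bin/sh
# Encrypt-then-MAC, one packet's worth, using the openssl CLI.
wd=$(mktemp -d)
ek=00112233445566778899aabbccddeeff	# AES-128 encryption key (toy value, hex)
ak=secret-mac-key			# HMAC-SHA256 key (toy value)
seq=1					# packet sequence number

printf 'hello, world\n' > "$wd/plain"

# CTR mode with the packet sequence number as the nonce/IV: each packet
# gets a fresh keystream without any per-packet key agreement.
iv=$(printf '%032x' "$seq")
openssl enc -aes-128-ctr -K "$ek" -iv "$iv" -in "$wd/plain" -out "$wd/ct"

# MAC over (sequence number || ciphertext): covers the data *and* its
# position in the stream, so tampering, replays, and reordering are caught.
{ printf '%016x' "$seq"; cat "$wd/ct"; } |
	openssl dgst -sha256 -hmac "$ak" -r | cut -d' ' -f1 > "$wd/tag"

cat "$wd/tag"		# the tag travels with the packet
```

The receiver recomputes the tag from its own copy of the sequence number before decrypting anything; a packet whose tag doesn't match is dropped unread.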
Cryptography in the 1990s was like trying to build a bridge: You spend a lot of time worrying about making sure that your bridge will still stand even if some of the welds aren't done perfectly, some of the bolts rust, periodic loading results in metal fatigue, et cetera. Theory may say that a particular design will work, but you know that practice never quite matches the theory, so you build in margins of safety, making your structure more costly and more complex as a result. Pure engineering.
Modern cryptography is different; rather than building a bridge, it is like planning a gravity-assisted interplanetary trajectory. Sure, it's complex and you have to get all the details right — but once you start moving, the only way you will fail to reach your destination is if the laws of physics (or mathematics) change. Modern cryptography has developed sufficiently that the theory does match the practice — so rather than learning by watching bridges fall down, it's sufficient to learn the theory, and follow one simple rule: Only do what mathematics says you can do. Pure science.
I'm sure that for what Thomas does, having experience implementing attacks against cryptographic code is very useful. After all, he makes his living finding flaws in application security — and most of the cryptography he encounters is likely to be 1990s-style cryptography. But that is an era which is best left in the past; so for developers, I recommend a more modern approach to cryptography — which means studying the theory and designing systems which you can prove are secure.