FreeBSD/EC2 history
A couple of years ago Jeff Barr published a blog post with a timeline of EC2 instances. I thought at the time that I should write up a similar timeline of the FreeBSD/EC2 platform, but I didn't get around to it. Last week, as I prepared to ask for sponsorship for my work, I decided that it was time to sit down and collect together the long history of how the platform has evolved and improved over the years.
Normally I don't edit blog posts after publishing them (with the exception of occasional typographical corrections), but I do plan on keeping this post up to date with future developments.
- August 25, 2006: Amazon EC2 launches. It supports a single version of Ubuntu Linux; FreeBSD is not available.
- December 13, 2010: I manage to get FreeBSD running on EC2 t1.micro instances.
- March 22, 2011: I manage to get FreeBSD running on EC2 "cluster compute" instances.
- July 8, 2011: I get FreeBSD 8.2 running on all 64-bit EC2 instance types, by marking it as "Windows" in order to get access to Xen/HVM virtualization. (Unfortunately this meant that users had to pay the higher "Windows" hourly pricing.)
- January 16, 2012: I get FreeBSD 9.0 running on 32-bit EC2 instances via the same "defenestration" trick. (Again, paying the "Windows" prices.)
- August 16, 2012: I move the FreeBSD rc.d scripts which handle "EC2" functionality (e.g., logging SSH host keys to the console) into the FreeBSD ports tree.
- October 7, 2012: I rework the build process for FreeBSD 9.1-RC1 and later to use "world" bits extracted from the release ISOs; only the kernel is custom-built. Also, the default SSH user changes from "root" to "ec2-user".
- October 31, 2012: Amazon launches the "M3" family of instances, which support Xen/HVM without FreeBSD needing to pay the "Windows" tax.
- November 21, 2012: I get FreeBSD added to the AWS Marketplace.
- October 2, 2013: I finish merging kernel patches into the FreeBSD base system, and rework the AMI build (again) so that FreeBSD 10.0-ALPHA4 and later use bits extracted from the release ISOs for the entire system (world + kernel). FreeBSD Update can now be used for updating everything (because now FreeBSD/EC2 uses a GENERIC kernel).
- October 27, 2013: I add code to EC2 images so that FreeBSD 10.0-BETA2 and later AMIs will run FreeBSD Update when they first boot in order to download and install any critical updates.
- December 1, 2013: I add code to EC2 images so that FreeBSD 10.0-BETA4 and later AMIs bootstrap the pkg tool and install packages at boot time (by default, the "awscli" package).
- December 9, 2013: I add configinit to FreeBSD 10.0-RC1 and later to allow systems to be easily configured via EC2 user-data.
- July 1, 2014: Amazon launches the "T2" family of instances; now the most modern family for every type of EC2 instance (regular, high-memory, high-CPU, high-I/O, burstable) supports HVM and there should no longer be any need for FreeBSD users to pay the "Windows tax".
- November 24, 2014: I add code to FreeBSD 10.2 and later to automatically resize their root filesystems when they first boot; this means that a larger root disk can be specified at instance launch time and everything will work as expected.
- April 1, 2015: I integrate the FreeBSD/EC2 build process into the FreeBSD release building process; FreeBSD 10.2-BETA1 and later AMIs are built by the FreeBSD release engineering team.
- January 12, 2016: I enable Intel 82599-based "first generation EC2 Enhanced Networking" in FreeBSD 11.0 and later.
- June 9, 2016: I enable the new EC2 VGA console functionality in FreeBSD 11.0 and later. (The old serial console also continues to work.)
- June 24, 2016: Intel 82599-based Enhanced Networking works reliably in FreeBSD 11.0 and later thanks to discovering and working around a Xen bug.
- June 29, 2016: I improve throughput on Xen blkfront devices (/dev/xbd*) by enabling indirect segment I/Os in FreeBSD 10.4 and later. (I wrote this functionality in July 2015, but left it disabled by default at first because a bug in EC2 caused it to hurt performance on some instances.)
- July 7, 2016: I fix a bug in FreeBSD's virtual memory initialization in order to allow it to support booting with 128 CPUs; in other words, FreeBSD 11.0 and later support the EC2 x1.32xlarge instance type.
- January 26, 2017: I change the default configuration in FreeBSD 11.1 and later to support EC2's IPv6 networking setup out of the box (once you flip all of the necessary switches to enable IPv6 in EC2 itself).
- May 20, 2017: In collaboration with Rick Macklem, I make FreeBSD 11.1 and later compatible with the Amazon "Elastic File System" (aka. NFSv4-as-a-service) via the newly added "oneopenown" mount option (and lots of bug fixes).
- May 25, 2017: I enable support for the Amazon "Elastic Network Adapter" in FreeBSD 11.1 and later. (The vast majority of the work — porting the driver code — was done by Semihalf with sponsorship from Amazon.)
- December 5, 2017: I change the default configuration in FreeBSD 11.2 and later to make use of the Amazon Time Sync Service (aka. NTP-as-a-service).
The current status
The upcoming FreeBSD release (11.2) supports: IPv6, Enhanced Networking (both generations), Amazon Elastic File System, Amazon Time Sync Service, both consoles (Serial + VGA), and every EC2 instance type (although I'm not sure if FreeBSD has drivers to make use of the FPGA or GPU hardware on those instances).
When a FreeBSD/EC2 instance first launches, it uses configinit to perform any desired configuration based on user-data scripts, and then (unless configinit is used to change this) resizes its root filesystem to fit the provided root disk, downloads and installs critical updates, sets up the ec2-user user for SSH access, and prints SSH host keys to the consoles.
If there's something else you think FreeBSD should support or a change you'd like to see to the default configuration, please let me know.
Some thoughts on Spectre and Meltdown
By now I imagine that all of my regular readers, and a large proportion of the rest of the world, have heard of the security issues dubbed "Spectre" and "Meltdown". While there have been some excellent technical explanations of these issues from several sources — I particularly recommend the Project Zero blog post — I have yet to see anyone really put these into a broader perspective; nor have I seen anyone make a serious attempt to explain these at a level suited for a wide audience. While I have not been involved with handling these issues directly, I think it's time for me to step up and provide both a wider context and a more broadly understandable explanation.
The story of these attacks starts in late 2004. I had submitted my doctoral thesis and had a few months before flying back to Oxford for my defense, so I turned to some light reading: Intel's latest "Optimization Manual", full of tips on how to write faster code. (Eking out every last nanosecond of performance has long been an interest of mine.) Here I found an interesting piece of advice: On Intel CPUs with "Hyper-Threading", a common design choice (aligning the top of thread stacks on page boundaries) should be avoided because it would result in some resources being overused and others being underused, with a resulting drop in performance. This started me thinking: If two programs can hurt each others' performance by accident, one should be able to measure whether its performance is being hurt by the other; if it can measure whether its performance is being hurt by people not following Intel's optimization guidelines, it should be able to measure whether its performance is being hurt by other patterns of resource usage; and if it can measure that, it should be able to make deductions about what the other program is doing.
It took me a few days to convince myself that information could be stolen in this manner, but within a few weeks I was able to steal an RSA private key from OpenSSL. Then started the lengthy process of quietly notifying Intel and all the major operating system vendors; and on Friday the 13th of May 2005 I presented my paper describing this new attack at BSDCan 2005 — the first attack of this type exploiting how a running program causes changes to the microarchitectural state of a CPU. Three months later, the team of Osvik, Shamir, and Tromer published their work, which showed how the same problem could be exploited to steal AES keys. (Note that there were side channel attacks discovered over the preceding years which relied on microarchitectural details; but in those cases information was being revealed by the time taken by the cryptographic operation in question. My work was the first to demonstrate that information could leak from a process into the microarchitectural state and then be extracted from there by another process.)
Over the following years there have been many attacks which exploit different aspects of CPU design — exploiting L1 data cache collisions, exploiting L1 code cache collisions, exploiting L2 cache collisions, exploiting the TLB, exploiting branch prediction, etc. — but they have all followed the same basic mechanism: A program does something which interacts with the internal state of a CPU, and either we can measure that internal state (the more common case) or we can set up that internal state before the program runs in a way which makes the program faster or slower. These new attacks use the same basic mechanism, but exploit an entirely new angle. But before I go into details, let me go back to basics for a moment.
Understanding the attacks
These attacks exploit something called a "side channel". What's a side channel? It's when information is revealed as an inadvertent side effect of what you're doing. For example, in the movie 2001, Bowman and Poole enter a pod to ensure that the HAL 9000 computer cannot hear their conversation — but fail to block the optical channel which allows HAL to read their lips. Side channels are related to a concept called "covert channels": Where side channels are about stealing information which was not intended to be conveyed, covert channels are about conveying information which someone is trying to prevent you from sending. The famous case of a Prisoner of War blinking the word "TORTURE" in Morse code is an example of using a covert channel to convey information.
Another example of a side channel — and I'll be elaborating on this example later, so please bear with me if it seems odd — is as follows: I want to know when my girlfriend's passport expires, but she won't show me her passport (she complains that it has a horrible photo) and refuses to tell me the expiry date. I tell her that I'm going to take her to Europe on vacation in August and watch what happens: If she runs out to renew her passport, I know that it will expire before August; while if she doesn't get her passport renewed, I know that it will remain valid beyond that date. Her desire to ensure that her passport would be valid inadvertently revealed to me some information: Whether its expiry date was before or after August.
Over the past 12 years, people have gotten reasonably good at writing programs which avoid leaking information via side channels; but as the saying goes, if you make something idiot-proof, the world will come up with a better idiot; in this case, the better idiot is newer and faster CPUs. The Spectre and Meltdown attacks make use of something called "speculative execution". This is a mechanism whereby, if a CPU isn't sure what you want it to do next, it will speculatively perform some action. The idea here is that if it guessed right, it will save time later — and if it guessed wrong, it can throw away the work it did and go back to doing what you asked for. As long as it sometimes guesses right, this saves time compared to waiting until it's absolutely certain about what it should be doing next. Unfortunately, as several researchers recently discovered, it can accidentally leak some information during this speculative execution.
Going back to my analogy: I tell my girlfriend that I'm going to take her on vacation in June, but I don't tell her where yet; however, she knows that it will either be somewhere within Canada (for which she doesn't need a passport, since we live in Vancouver) or somewhere in Europe. She knows that it takes time to get a passport renewed, so she checks her passport and (if it was about to expire) gets it renewed just in case I later reveal that I'm going to take her to Europe. If I tell her later that I'm only taking her to Ottawa — well, she didn't need to renew her passport after all, but in the mean time her behaviour has already revealed to me whether her passport was about to expire. This is what Google refers to as "variant 1" of the Spectre vulnerability: Even though she didn't need her passport, she made sure it was still valid just in case she was going to need it.
"Variant 2" of the Spectre vulnerability also relies on speculative execution but in a more subtle way. Here, instead of the CPU knowing that there are two possible execution paths and choosing one (or potentially both!) to speculatively execute, the CPU has no idea what code it will need to execute next. However, it has been keeping track and knows what it did the last few times it was in the same position, and it makes a guess — after all, there's no harm in guessing since if it guesses wrong it can just throw away the unneeded work. Continuing our analogy, a "Spectre version 2" attack on my girlfriend would be as follows: I spend a week talking about how Oxford is a wonderful place to visit and I really enjoyed the years I spent there, and then I tell her that I want to take her on vacation. She very reasonably assumes that — since I've been talking about Oxford so much — I must be planning on taking her to England, and runs off to check her passport and potentially renew it... but in fact I tricked her and I'm only planning on taking her to Ottawa.
This "version 2" attack is far more powerful than "version 1" because it can be used to exploit side channels present in many different locations; but it is also much harder to exploit and depends intimately on details of CPU design, since the attacker needs to make the CPU guess the correct (wrong) location to anticipate that it will be visiting next.
Now we get to the third attack, dubbed "Meltdown". This one is a bit weird, so I'm going to start with the analogy here: I tell my girlfriend that I want to take her to the Korean peninsula. She knows that her passport is valid for long enough; but she immediately runs off to check that her North Korean visa hasn't expired. Why does she have a North Korean visa, you ask? Good question. She doesn't — but she runs off to check its expiry date anyway! Because she doesn't have a North Korean visa, she (somehow) checks the expiry date on someone else's North Korean visa, and then (if it is about to expire) runs out to renew it — and so by telling her that I want to take her to Korea for a vacation I find out something she couldn't have told me even if she wanted to. If this sounds like we're falling down a Dodgsonian rabbit hole... well, we are. The most common reaction I've heard from security people about this is "Intel CPUs are doing what???", and it's not by coincidence that one of the names suggested for an early Linux patch was Forcefully Unmap Complete Kernel With Interrupt Trampolines (FUCKWIT). (For the technically-inclined: Intel CPUs continue speculative execution through faults, so the fact that a page of memory cannot be accessed does not prevent it from, well, being accessed.)
How users can protect themselves
So that's what these vulnerabilities are all about; but what can regular users do to protect themselves? To start with, apply the damn patches. For the next few months there are going to be patches to operating systems; patches to individual applications; patches to phones; patches to routers; patches to smart televisions... if you see a notification saying "there are updates which need to be installed", install the updates. (However, this doesn't mean that you should be stupid: If you get an email saying "click here to update your system", it's probably malware.) These attacks are complicated, and need to be fixed in many ways in many different places, so each individual piece of software may have many patches as the authors work their way through from fixing the most easily exploited vulnerabilities to the more obscure theoretical weaknesses.
What else can you do? Understand the implications of these vulnerabilities. Intel caught some undeserved flak for stating that they believe "these exploits do not have the potential to corrupt, modify or delete data"; in fact, they're quite correct in a direct sense, and this distinction is very relevant. A side channel attack inherently reveals information, but it does not by itself allow someone to take control of a system. (In some cases side channels may make it easier to take advantage of other bugs, however.) As such, it's important to consider what information could be revealed: Even if you're not working on top secret plans for responding to a ballistic missile attack, you've probably accessed password-protected websites (Facebook, Twitter, Gmail, perhaps your online banking...) and possibly entered your credit card details somewhere today. Those passwords and credit card numbers are what you should worry about.
Now, in order for you to be attacked, some code needs to run on your computer. The most likely vector for such an attack is through a website — and the more shady the website the more likely you'll be attacked. (Why? Because if the owners of a website are already doing something which is illegal — say, selling fake prescription drugs — they're far more likely to agree if someone offers to pay them to add some "harmless" extra code to their site.) You're not likely to get attacked by visiting your bank's website; but if you make a practice of visiting the less reputable parts of the World Wide Web, it's probably best to not log in to your bank's website at the same time. Remember, this attack won't allow someone to take over your computer — all they can do is get access to information which is in your computer's memory at the time they carry out the attack.
For greater paranoia, avoid accessing suspicious websites after you handle any sensitive information (including accessing password-protected websites or entering your credit card details). It's possible for this information to linger in your computer's memory even after it isn't needed — it will stay there until it's overwritten, usually because the memory is needed for something else — so if you want to be safe you should reboot your computer in between.
For maximum paranoia: Don't connect to the internet from systems you care about. In the industry we refer to "airgapped" systems; this is a reference back to the days when connecting to a network required wires, so if there was a literal gap with just air between two systems, there was no way they could communicate. These days, with ubiquitous wifi (and in many devices, access to mobile phone networks) the terminology is in need of updating; but if you place devices into "airplane" mode it's unlikely that they'll be at any risk. Mind you, they won't be nearly as useful — there's almost always a tradeoff between security and usability, but if you're handling something really sensitive, you may want to consider this option. (For my Tarsnap online backup service I compile and cryptographically sign the packages on a system which has never been connected to the Internet. Before I turned it on for the first time, I opened up the case and pulled out the wifi card; and I copy files on and off the system on a USB stick. Tarsnap's slogan, by the way, is "Online backups for the truly paranoid".)
How developers can protect everyone
The patches being developed and distributed by operating systems — including microcode updates from Intel — will help a lot, but there are still steps individual developers can take to reduce the risk of their code being exploited.
First, practice good "cryptographic hygiene": Information which isn't in memory can't be stolen this way. If you have a set of cryptographic keys, load only the keys you need for the operations you will be performing. If you take a password, use it as quickly as possible and then immediately wipe it from memory. This isn't always possible, especially if you're using a high level language which doesn't give you access to low level details of pointers and memory allocation; but there's at least a chance that it will help.
Second, offload sensitive operations — especially cryptographic operations — to other processes. The security community has become more aware of privilege separation over the past two decades; but we need to go further than this, to separation of information — even if two processes need exactly the same operating system permissions, it can be valuable to keep them separate in order to avoid information from one process leaking via a side channel attack against the other.
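A minimal illustration of that separation, using nothing beyond standard POSIX calls: the child process holds a secret and answers only a narrow yes/no query over a socketpair, so the secret never enters the parent's address space and cannot be leaked out of it. (The `check_guess` protocol and the hard-coded secret are invented for this example; a real design would hold keys, not passwords, and sign or decrypt on request.)

```c
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if the guess matches the secret held by the child process,
 * 0 if not, -1 on error.  The parent never holds the secret itself. */
int check_guess(const char *guess, size_t len) {
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {                   /* child: the side holding the secret */
        close(sv[0]);
        static const char secret[] = "hunter2";
        char req[64];
        ssize_t n = read(sv[1], req, sizeof(req));
        char ok = (n == (ssize_t)sizeof(secret) - 1 &&
            memcmp(req, secret, (size_t)n) == 0);
        write(sv[1], &ok, 1);         /* reveal one bit, not the secret */
        _exit(0);
    }

    close(sv[1]);                     /* parent: sends the guess, gets a bit */
    write(sv[0], guess, len);
    char ok = 0;
    read(sv[0], &ok, 1);
    close(sv[0]);
    waitpid(pid, NULL, 0);
    return ok;
}
```

Even if an attacker can run side channel code against the parent, the only thing resident in the parent's memory is the answer bit.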
One common design paradigm I've seen recently is to "TLS all the things", with a wide range of applications gaining understanding of the TLS protocol layer. This is something I've objected to in the past as it results in unnecessary exposure of applications to vulnerabilities in the TLS stacks they use; side channel attacks provide another reason, namely the unnecessary exposure of the TLS stack to side channels in the application. If you want to add TLS to your application, don't add it to the application itself; rather, use a separate process to wrap and unwrap connections with TLS, and have your application take unencrypted connections over a local (unix) socket or a loopback TCP/IP connection.
Separating code into multiple processes isn't always practical, however, for reasons of both performance and practical matters of code design. I've been considering (since long before these issues became public) another form of mitigation: Userland page unmapping. In many cases programs have data structures which are "private" to a small number of source files; for example, a random number generator will have internal state which is only accessed from within a single file (with appropriate functions for inputting entropy and outputting random numbers), and a hash table library would have a data structure which is allocated, modified, accessed, and finally freed only by that library via appropriate accessor functions. If these memory allocations can be corralled into a subset of the system address space, and the pages in question only mapped upon entering those specific routines, it could dramatically reduce the risk of information being revealed as a result of vulnerabilities which — like these side channel attacks — are limited to leaking information but cannot be (directly) used to execute arbitrary code.
Finally, developers need to get better at providing patches: Not just to get patches out promptly, but also to get them into users' hands and to convince users to install them. That last part requires building up trust; as I wrote last year, one of the worst problems facing the industry is the mixing of security and non-security updates. If users are worried that they'll lose features (or gain "features" they don't want), they won't install the updates you recommend; it's essential to give users the option of getting security patches without worrying about whether anything else they rely upon will change.
What's next?
So far we've seen three attacks demonstrated: Two variants of Spectre and one form of Meltdown. Get ready to see more over the coming months and years. Off the top of my head, there are four vulnerability classes I expect to see demonstrated before long:
- Attacks on p-code interpreters. Google's "Variant 1" demonstrated an attack where a conditional branch was mispredicted resulting in a bounds check being bypassed; but the same problem could easily occur with mispredicted branches in a switch statement resulting in the wrong operation being performed on a valid address. On p-code machines which have an opcode for "jump to this address, which contains machine code" (not entirely unlikely in the case of bytecode machines which automatically transpile "hot spots" into host machine code), this could very easily be exploited as a "speculatively execute attacker-provided code" mechanism.
- Structure deserializing. This sort of code handles attacker-provided inputs which often include the lengths or numbers of fields in a structure, along with bounds checks to ensure the validity of the serialized structure. This is prime territory for a CPU to speculatively reach past the end of the input provided if it mispredicts the layout of the structure.
- Decompressors, especially in HTTP(S) stacks. Data decompression inherently involves a large number of steps of "look up X in a table to get the length of a symbol, then adjust pointers and perform more memory accesses" — exactly the sort of behaviour which can leak information via cache side channels if a branch mispredict results in X being speculatively looked up in the wrong table. Add attacker-controlled inputs to HTTP stacks and the fact that services speaking HTTP are often required to perform request authentication and/or include TLS stacks, and you have all the conditions needed for sensitive information to be leaked.
- Remote attacks. As far as I'm aware, all of the microarchitectural side channels demonstrated over the past 14 years have made use of "attack code" running on the system in question to observe the state of the caches or other microarchitectural details in order to extract the desired data. This makes attacks far easier, but should not be considered to be a prerequisite! Remote timing attacks are feasible, and I am confident that we will see a demonstration of "innocent" code being used for the task of extracting the microarchitectural state information before long. (Indeed, I think it is very likely that certain people are already making use of such remote microarchitectural side channel attacks.)
Final thoughts on vulnerability disclosure
The way these issues were handled was a mess; frankly, I expected better of Google, I expected better of Intel, and I expected better of the Linux community. When I found that Hyper-Threading was easily exploitable, I spent five months notifying the security community and preparing everyone for my announcement of the vulnerability; but when the embargo ended at midnight UTC and FreeBSD published its advisory a few minutes later, the broader world was taken entirely by surprise. Nobody knew what was coming aside from the people who needed to know; and the people who needed to know had months of warning.
Contrast that with what happened this time around. Google discovered a problem and reported it to Intel, AMD, and ARM on June 1st. Did they then go around contacting all of the operating systems which would need to work on fixes for this? Not even close. FreeBSD was notified the week before Christmas, over six months after the vulnerabilities were discovered. Now, FreeBSD can occasionally respond very quickly to security vulnerabilities, even when they arise at inconvenient times — on November 30th 2009 a vulnerability was reported at 22:12 UTC, and on December 1st I provided a patch at 01:20 UTC, barely over 3 hours later — but that was an extremely simple bug which needed only a few lines of code to fix; the Spectre and Meltdown issues are orders of magnitude more complex.
To make things worse, the Linux community was notified and couldn't keep their mouths shut. Standard practice for multi-vendor advisories like this is that an embargo date is set, and nobody does anything publicly prior to that date. People don't publish advisories; they don't commit patches into their public source code repositories; and they definitely don't engage in arguments on public mailing lists about whether the patches are needed for different CPUs. As a result, despite an embargo date being set for January 9th, by January 4th anyone who cared knew about the issues and there was code being passed around on Twitter for exploiting them.
This is not the first time I've seen people get sloppy with embargoes recently, but it's by far the worst case. As an industry we pride ourselves on the concept of responsible disclosure — ensuring that people are notified in time to prepare fixes before an issue is disclosed publicly — but in this case there was far too much disclosure and nowhere near enough responsibility. We can do better, and I sincerely hope that next time we do.
FreeBSD/EC2 on C5 instances
Last week, Amazon released the "C5" family of EC2 instances, continuing their trend of improving performance by both providing better hardware and reducing the overhead associated with virtualization. Due to the significant changes in this new instance family, Amazon gave me advance notice of their impending arrival several months ago, and starting in August I had access to (early versions of) these instances so that I could test FreeBSD on them. Unfortunately the final launch date took me slightly by surprise — I was expecting it to be later in the month — so there are still a few kinks which need to be worked out for FreeBSD to run smoothly on C5 instances. I strongly recommend that you read the rest of this blog post before you use FreeBSD on EC2 C5 instances. (Or possibly skip to the end if you're not interested in learning about any of the underlying details.)
Ever since the first EC2 instances launched — the ones which were retrospectively named "m1.small" — Amazon has relied on the Xen hypervisor. No longer: C5 instances use KVM. This sounds like it would be a problem, but in fact that change didn't bother FreeBSD at all: Now that everything uses hardware-based paging virtualization, the core of the FreeBSD kernel barely noticed the change. (This would have been a much larger problem if FreeBSD/EC2 images were using Xen paravirtualized paging, but EC2 has provided hardware virtualization in all of their new instance types since October 2012.) As usual, it's the drivers which have caused problems for FreeBSD.
Under Xen, EC2 was able to provide FreeBSD with Xen "paravirtualized" devices: A privileged virtual machine within each physical EC2 host had access to the physical disks and network, and FreeBSD would interact with it via the Xen "netfront/netback" and "blkfront/blkback" drivers. There were a lot of tricks used to eke out every last scrap of performance, but this had an inevitable performance cost: Every network packet or disk I/O would need to be handled not just by the FreeBSD kernel but also by the Linux kernel running in the "Dom0" domain. Starting a few years ago, Amazon offered "Enhanced Networking" where an EC2 instance could talk directly to network adapter hardware — first with Intel 10GbE network interfaces, but later with Amazon's custom-designed "Elastic Network Adapter" hardware; FreeBSD gained support for the ENA network interface in FreeBSD 11.1, thanks to Amazon taking the step of proactively looking for (and paying) someone to port their Linux driver. Until very recently, there was no similar "pass-through" for disks; from the original m1.small in August 2006 until the I3 family arrived in February 2017, disks always showed up as Xen block devices. With the I3 "high I/O" family, ephemeral disks were exposed as directly accessible NVMe devices for the first time — but EBS volumes were still exposed as Xen block devices.
I had the first hint that Amazon was going to be doing something interesting when I was asked if FreeBSD would boot if its root disk was NVMe instead of being a Xen block device. As I recall it, my answer was as follows:
"Yeah, it should work just fine; FreeBSD supports NVMe disks, so it will taste the disk, read the GPT labels, and boot from the one marked as rootfs. I never hard-coded the device name of the boot disk anywhere."

Well, apparently Amazon has some engineers who are both very brave and extremely talented: EBS volumes show up on EC2 instances as "NVMe" hardware.
But wait, how is this going to work? AMI boot disks are EBS volumes. You can't be copying the disk image to a local NVMe disk before booting; that would take too long, and changes would be orphaned if the node failed. YOU'RE BUILDING A HARDWARE FRONT-END TO EBS? You guys are insane! Even rolling out a software update to EBS must be a nightmare at your scale, and now you want to add the headache of dealing with hardware on top of that?
Of course, EBS volumes aren't NVMe devices — and herein lies the problem. You can attach and detach EBS volumes from a running EC2 instance with a single API call (or a few mouse clicks if you prefer a GUI), and I doubt anyone has ever tested hotplug and hotunplug of physical NVMe disks on a FreeBSD system. Moreover, I'm absolutely certain that nobody has ever tested hotplugging and hotunplugging physical NVMe disks from a Legacy PCI bus which is hanging off an Intel 440FX chipset — which is what a C5 instance looks like! Unsurprisingly, with untested code paths came new and interesting bugs.
The first problem I ran into is that when I attached or detached EBS volumes, nothing happened. FreeBSD isn't expecting new devices to appear and disappear on the PCI bus! It turns out that EC2 is sending an ACPI notification about hotplug events, and when I compiled a FreeBSD kernel with ACPI debugging enabled, it was clear what was happening:
kernel: evmisc-0267 EvQueueNotifyRequest : No notify handler for Notify, ignoring (S1F_, 1) node 0xfffff80007832400

FreeBSD wasn't listening for the ACPI hotplug notifications, so they were simply getting dropped on the floor. Fortunately the FreeBSD project includes smarter people than me, and John Baldwin pointed out that we have a tool for this: devctl rescan pci0 prompts FreeBSD to rescan the PCI bus and detect any changes in the hardware. Attaching an EBS volume and running this command makes the new disk promptly appear, exactly as expected.
Unfortunately the detach case doesn't work quite so well. When I removed an EBS volume and ran devctl rescan pci0, rather than FreeBSD removing the disk, I got a kernel panic. It turned out that the FreeBSD NVMe driver was marking its device nodes as "eternal" (which allows for some locking optimizations) and you're not allowed to remove such device nodes. OK, get rid of that flag and recompile and try again and... another kernel panic. Turns out that some (completely untested) teardown code was freeing a structure and then calling a function pointer stored within it; without kernel debugging enabled this might have worked, but as it was, it turned out that calling 0xdeadc0dedeadc0de is not a very good idea. OK, fix the order of the teardown code, recompile, try again... and FreeBSD didn't panic when I instructed it to rescan the PCI bus and detect that the NVMe disk went away. But it didn't remove all of its device nodes either, and as soon as anything touched the orphan device node, I got another kernel panic. Apparently nobody ever got around to finishing the NVMe device removal code.
So the situation is as follows:
- FreeBSD versions prior to FreeBSD 11.1 will not run on C5, because they lack support for the ENA networking hardware — on Xen-based EC2 instances, earlier FreeBSD versions can get virtualized Xen networking, but of course that's not available on C5.
- FreeBSD 11.1 and HEAD (as of mid-November 2017) will boot and run just fine on C5 as long as you never attach or detach EBS volumes.
- If you attach or detach an EBS volume and then reboot, you'll see the devices you expect.
- If you attach an EBS volume and run devctl rescan pci0, you'll see the new volume.
- If you detach an EBS volume and run devctl rescan pci0, you will either get an immediate kernel panic or be left with a device node which causes a kernel panic as soon as it is touched.
- In FreeBSD 11.2 and later, everything should Just Work.
That last bit, which depends on fixing the NVMe driver, is currently being worked on by Warner Losh (not to be confused with Warren Lash, who is a completely unrelated character in Michael Lucas' git commit murder). It also depends on someone figuring out how to catch the ACPI events in question, but that's more of a question of finding the best way rather than finding a way: In the worst case, I could ignore the ACPI events completely and ship 11.2 AMIs with a daemon which runs a bus rescan every few seconds.
Thanks to Matthew Wilson, David Duncan, John Baldwin, and Warner Losh for their help with figuring things out here.
FreeBSD/EC2: Community vs. Marketplace AMIs

FreeBSD has been available in Amazon EC2 since FreeBSD 9.0 (January 2012), and from FreeBSD 10.2 (August 2015) AMIs have been built by the FreeBSD release engineering team as part of the regular release process. Release announcements go out with a block of text about where to find FreeBSD in EC2, for example (taken from the FreeBSD 11.1 release announcement):
FreeBSD 11.1-RELEASE amd64 is also available on these cloud hosting platforms:

* Amazon(R) EC2(TM): AMIs are available in the following regions:

    ap-south-1 region: ami-8a760ee5
    eu-west-2 region: ami-f2425396
    eu-west-1 region: ami-5302ec2a
    ap-northeast-2 region: ami-f575ab9b
    ap-northeast-1 region: ami-0a50b66c
    sa-east-1 region: ami-9ad8acf6
    ca-central-1 region: ami-622e9106
    ap-southeast-1 region: ami-6d75e50e
    ap-southeast-2 region: ami-bda2bede
    eu-central-1 region: ami-7588251a
    us-east-1 region: ami-70504266
    us-east-2 region: ami-0d725268
    us-west-1 region: ami-8b0128eb
    us-west-2 region: ami-dda7bea4

  AMIs will also be available in the Amazon(R) Marketplace once they have completed third-party specific validation at: https://aws.amazon.com/marketplace/pp/B01LWSWRED/

This leads to a question I am frequently asked: Which way should FreeBSD users launch their instances? The answer, as usual, is "it depends". Here are some of the advantages of each option, to help you decide which to use.
"Community" AMIsThese are the AMIs with numbers listed in the release announcement. If you want to launch FreeBSD 11.1-RELEASE in us-west-2, find the line which says 'us-west-2' in the FreeBSD 11.1-RELEASE announcement, and there's the number of the AMI you want to launch. (In this case, ami-dda7bea4.) You can then put that into your aws ec2 run-instances command line; or if you're using the EC2 Management Console you can click on 'Launch Instance', 'Community AMIs', then enter the AMI number into the box labelled 'Search Community AMIs'.
Benefits of using the Community AMIs:
- You get the AMI number from the signed release announcement; there's no risk of clicking on "FreeBSD" and getting someone else's "customized" image.
- The release announcement tells you the AMI number for the region you want to use. You can copy and paste this into your scripts.
- This mechanism has been around for over a decade and everything "just works".
Disadvantages of using the Community AMIs:
- FreeBSD only publishes AMIs in the regions which exist when the FreeBSD release goes out. If you want to launch FreeBSD 11.1 into the upcoming EC2 region in France, you're out of luck. (I occasionally copy AMIs after the fact, especially if someone asks for them; but since the new AMIs can't be retroactively added to the release announcement, there's no way for you to know if a community AMI which claims to be FreeBSD 11.1-RELEASE really is. Don't just assume — there are people publishing AMIs containing malware!)
- Copying AMI numbers around works great for scripts, but can be annoying when it comes to interactive launching of instances.
"Marketplace" AMIsThe AWS Marketplace is designed as a sales channel for products based on AWS, replacing a previous mechanism called "paid AMIs". This aspect of it is entirely irrelevant to FreeBSD — FreeBSD, as the name suggests, is free — but its design as a sales channel gives it some other advantages:
- Rather than needing to look up individual AMI numbers, you can simply point your web browser at the "product" page for FreeBSD 10 or FreeBSD 11. (In each case, you'll get the latest release from the stable branch by default; but you can select older releases if you want.) From there it takes just a few clicks to launch FreeBSD.
- When AWS launches in a new region, the AWS Marketplace will make images available there; you'll be able to launch FreeBSD instances without worrying about whether the AMI you found is the "real" FreeBSD.
- Amazon provides me with statistics about the "sales" of FreeBSD AMIs. I suppose that for some users this type of tracking could count as a disadvantage — but I find it useful to know that yesterday there were 1000 AWS accounts which collectively had a total of over 10,000 FreeBSD instances running.
Using the Marketplace images has some disadvantages, however — again arising out of its nature as a product catalogue built on top of the lower level AWS infrastructure:
- The AWS Marketplace often lags behind EC2 in adding support for new regions and instance types.
- Because adding images to the AWS Marketplace is a manual process, we only do this for FreeBSD releases — not for the weekly snapshots or the beta and release candidate images in the months leading up to the release.
- Due to issues relating to how EC2 tracks "product codes", if you detach the root disk of an instance launched from the AWS Marketplace, you won't be able to attach it to a running EC2 instance — instead, you have to stop the target instance first. This is admittedly a somewhat unusual use case; but it's an essential one for me: During FreeBSD development, I often end up with unbootable instances (usually by installing a broken kernel), and being able to move the disk into a working instance and fix things makes my life much easier.
In general, I'd say that for "regular users" the AWS Marketplace is probably the better option, while for developers the "Community" images — with all the snapshots and pre-release images, and the ability to swap disks around more easily — is likely to be more convenient. Of course, there's no need to lock yourself into just one or the other; be aware of both and use whichever suits your needs the most at any particular time.
Oil changes, safety recalls, and software patches

Every few months I get an email from my local mechanic reminding me that it's time to get my car's oil changed. I generally ignore these emails; it costs time and money to get this done (I'm sure I could do it myself, but the time it would cost is worth more than the money it would save) and I drive little enough — about 2000 km/year — that I'm not too worried about the consequences of going for a bit longer than nominally advised between oil changes. I do get oil changes done... but typically once every 8-12 months, rather than the recommended 4-6 months. From what I've seen, I don't think I'm alone in taking a somewhat lackadaisical approach to routine oil changes.
On the other hand, there's another type of notification which elicits more prompt attention: Safety recalls. There are two good reasons for this: First, whether for vehicles, food, or other products, the risk of ignoring a safety recall is not merely that the product will break, but rather that the product will be actively unsafe; and second, when there's a safety recall you don't have to pay for the replacement or fix — the cost is covered by the manufacturer.
I started thinking about this distinction — and more specifically the difference in user behaviour — in the aftermath of the "WannaCry" malware. While WannaCry attracted widespread attention for its "ransomware" nature, the more concerning aspect of this incident is how it propagated: By exploiting a vulnerability in SMB for which Microsoft issued patches two months earlier. As someone who works in computer security, I find this horrifying — and I was particularly concerned when I heard that the NHS was postponing surgeries because they couldn't access patient records. Think about it: If the NHS couldn't access patient records due to WannaCry, it suggests WannaCry infiltrated systems used to access patient records — meaning that someone else exploiting the same vulnerabilities could have accessed those records. The SMB subsystem in Windows was not merely broken; until patches were applied, it was actively unsafe.
I imagine that most people in my industry would agree that security patches should be treated in the same vein as safety recalls — unless you're certain that you're not affected, take care of them as a matter of urgency — but it seems that far more users instead treat security patches more like oil changes: something to be taken care of when convenient... or not at all, if not convenient. It's easy to say that such users are wrong; but as an industry it's time that we think about why they are wrong rather than merely blaming them for their problems.
There are a few factors which I think are major contributors to this problem. First, the number of updates: When critical patches occur frequently enough to become routine, alarm fatigue sets in and people cease to give the attention updates deserve, even if on a conscious level they still recognize the importance of applying updates. Easy problem to identify, hard problem to address: We need to start writing code with fewer security vulnerabilities.
Second, there is a long and sad history of patches breaking things. In a few cases this is because something only worked by accident — an example famous in the FreeBSD community is the SA-05:03.amd64 vulnerability, which accidentally made it possible to launch the X server while running as an unprivileged user — but more often it is simply the result of a mistake. While I appreciate that there is often an urgency to releasing patches, and limited personnel (especially for open source software), releasing broken patches is something which it is absolutely vital to avoid — because it doesn't only break systems, but also contributes to a lack of trust in software updates. During my time as FreeBSD Security Officer, regardless of who on the security team was taking responsibility for preparing a patch and writing the advisory, I refused to sign and release advisories until I was convinced that our patch both fixed the problem and didn't accidentally break anything else; in some cases this meant that our advisories went out a few hours later, but in far more cases it ensured that we released one advisory rather than a first advisory followed by a second "whoops, we broke something" follow-up a few days later. My target was always that our track record should be enough that FreeBSD users would be comfortable blindly downloading and installing updates on their production systems, without spending time looking at the code or deploying to test systems first — because some day there will be a security update which they don't have time to look over carefully before installing.
The problems of the large volume of patches and their reputation for breaking things are made worse by the fact that many systems use the same mechanism for distributing both security fixes and other changes — bug fixes and new features. This has become a common pattern largely in the name of user friendliness — why force users to learn two systems when we can do everything through a single update mechanism? — but I worry that it is ultimately counterproductive, in that presenting updates through the same channel tends to conflate them in the minds of users, with the result that critical security updates end up receiving only the lesser attention appropriate to a new feature update. Even if the underlying technology used for fetching and installing updates is the same, it may be that exposing different types of updates through different interfaces would result in better user behaviour. My bank sends me special offers in the mail but phones if my credit card usage trips fraud alarms; this is the sort of distinction in intrusiveness we should see for different types of software updates.
Finally, I think there is a problem with the mental model most people have of computer security. Movies portray attackers as geniuses who can break into any system in minutes; journalists routinely warn people that "nobody is safe"; and insurance companies offer insurance against "cyberattacks" in much the same way as they offer insurance against tornados. Faced with this wall of misinformation, it's not surprising that people get confused between 400 pound hackers sitting on beds and actual advanced persistent threats. Yes, if the NSA wants to break into your computer, they can probably do it — but most attackers are not the NSA, just like most burglars are not Ethan Hunt. You lock your front door, not because you think it will protect you from the most determined thieves, but because it's an easy step which dramatically reduces your risk from opportunistic attack; but users don't see applying security updates as the equivalent of locking their front door when they leave home.
Computer security is a mess; there's no denying that. Vendors publishing code with thousands of critical vulnerabilities and government agencies which stockpile these vulnerabilities rather than helping to fix them certainly do nothing to help. But WannaCry could have been completely prevented if users had taken the time to install the fixes provided by Microsoft — if they had seen the updates as being something critical rather than an annoyance to put off until the next convenient weekend.
As a community, it's time for computer security professionals to think about the complete lifecycle of software vulnerabilities. It's not enough for us to find vulnerabilities, figure out how to fix them, and make the updates available; we need to start thinking about the final step of how to ensure that end users actually install the updates we provide. Unless we manage to do that, there will be a lot more crying in the years to come.