FreeBSD on the Graviton 3

Amazon announced the Graviton 3 processor and C7g instance family in November 2021, but it took six months before they were ready for general availability; in the meantime, however, as the maintainer of the FreeBSD/EC2 platform I was able to get early access to these instances.
As far as FreeBSD is concerned, Graviton 3 is mostly just a faster version of the Graviton 2: Most things "just work", and the things which don't work on Graviton 2 — hotplug devices and cleanly shutting down an instance via the EC2 API — also don't work on Graviton 3. (Want to help get these fixed? Sponsor my work on FreeBSD/EC2 so I have a few more paid hours to work on this.) The one notable architectural difference with the Graviton 3 is the addition of Pointer Authentication, which makes use of "unused" bits in pointers to guard against some vulnerabilities (e.g. buffer overflows which overwrite pointers). Andrew Turner recently added support for arm64 pointer authentication to FreeBSD.
But since Graviton 3 is largely a "faster Graviton 2", the obvious question is "how much faster" — so I launched a couple instances (c6g.8xlarge and c7g.8xlarge) with 500 GB root disks and started comparing.
The first performance test I always run on FreeBSD is a quick microbenchmark
of hashing performance: The md5 command (also known as
sha1, sha256, sha512, and many other things) has
a "time trial" mode which hashes 100000 blocks of 10000 bytes each. I ran
a few of these hashes:
| hash   | Graviton 2 | Graviton 3 | speedup |
|--------|------------|------------|---------|
| md5    | 2.26 s     | 2.16 s     | 1.05x   |
| sha1   | 2.67 s     | 1.94 s     | 1.38x   |
| sha256 | 0.82 s     | 0.81 s     | 1.01x   |
| sha512 | 2.87 s     | 1.03 s     | 2.79x   |
The first two of these hashes (md5 and sha1) are implemented in FreeBSD as pure C code; here we see Graviton 3 pulling slightly ahead. The sha256 and sha512 hashes make use of the arm64 cryptographic extensions (which have special instructions for those operations) so it's no surprise that sha256 has identical performance on both CPUs; for sha512 however it seems that Graviton 3 has far more optimized implementations of the arm64 extensions, since it beats Graviton 2 by almost a factor of 3.
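Since the time trial hashes 100,000 blocks of 10,000 bytes (10^9 bytes in total), the timings above translate directly into throughput and speedup figures; a quick sketch of that arithmetic in Python, using the numbers from the table:

```python
# The md5 -t time trial hashes 100000 blocks of 10000 bytes = 1e9 bytes.
TOTAL_BYTES = 100_000 * 10_000

# Wall-clock timings in seconds, taken from the table above:
# (Graviton 2, Graviton 3)
timings = {
    "md5":    (2.26, 2.16),
    "sha1":   (2.67, 1.94),
    "sha256": (0.82, 0.81),
    "sha512": (2.87, 1.03),
}

for alg, (g2, g3) in timings.items():
    throughput = TOTAL_BYTES / g3 / 1e9   # GB/s on Graviton 3
    speedup = g2 / g3                     # Graviton 3 vs Graviton 2
    print(f"{alg}: {throughput:.2f} GB/s on Graviton 3, {speedup:.2f}x speedup")
```

Running this confirms the figures quoted later: sha256 runs at roughly 1.2 GB/s, and sha512 speeds up by a factor of 2.79.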
Moving on, the next thing I did was to get a copy of the FreeBSD src and
ports trees. Three commands: First, install git using the
pkg utility; second, git clone the FreeBSD src tree;
and third, use portsnap to get the latest ports tree (this last
one is largely a benchmark of fork(2) performance since
portsnap is a shell script):
| command                | Graviton 2 real | Graviton 2 CPU | Graviton 3 real | Graviton 3 CPU | real speedup | CPU speedup |
|------------------------|-----------------|----------------|-----------------|----------------|--------------|-------------|
| pkg install git        | 19.13 s         | 4.76 s         | 18.14 s         | 3.40 s         | 1.05x        | 1.40x       |
| git clone src          | 137.76 s        | 315.79 s       | 120.99 s        | 240.09 s       | 1.14x        | 1.32x       |
| portsnap fetch extract | 159.56 s        | 175.22 s       | 124.41 s        | 133.02 s       | 1.28x        | 1.32x       |
These commands are all fetching data from FreeBSD mirrors and extracting files to disk, so we should expect that changing the CPU alone would yield limited improvements; and indeed that's exactly what we see in the "real" (wall-clock) time. The pkg command drops only from 19.13 to 18.14 seconds — a 1.05x speedup — because most of the time pkg is running the CPU is idling anyway. The speedup in CPU time, in contrast, is a factor of 1.40x. Similarly, the git clone and portsnap commands spend some of their time waiting for network or disk; but their CPU time usage drops by a factor of 1.32x.
Well, now that I had a FreeBSD source tree cloned, I had to run the
most classic FreeBSD benchmark: Rebuilding the FreeBSD base system (world
and kernel). I checked out the 13.1-RELEASE source tree (normally I would
test building HEAD, but for benchmarking purposes I wanted to make sure
that other people would be able to run exactly the same compile later) and
timed make buildworld buildkernel -j32 (the -j32 tells
the FreeBSD build to make use of all 32 cores on these systems):
| build                | Graviton 2 real | Graviton 2 CPU | Graviton 3 real | Graviton 3 CPU | real speedup | CPU speedup |
|----------------------|-----------------|----------------|-----------------|----------------|--------------|-------------|
| FreeBSD world+kernel | 849.09 s        | 21892.79 s     | 597.14 s        | 14112.62 s     | 1.42x        | 1.55x       |
Here we see the Graviton 3 really starting to shine: While there's some disk I/O to slow things down, the entire compile fits into the disk cache (the src tree is under 1 GB and the obj tree is around 5 GB, while the instances I was testing on have 64 GB of RAM), so almost all of the time spent is on actual compiling. (Or waiting for compiles to finish! While we run with -j32, the FreeBSD build is not perfectly parallelized, and on average only 26 cores are being used at once.) The FreeBSD base system build completes on the Graviton 3 in 9 minutes and 57 seconds, compared to 14 minutes and 9 seconds on the Graviton 2 — a 1.42x speedup (and 1.55x reduction in CPU time).
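The "only 26 cores on average" figure falls out of dividing total CPU time by wall-clock time; a quick sanity check of that arithmetic in Python, using the build timings from the table above:

```python
# Build timings from the table above: (wall-clock seconds, CPU seconds).
builds = {
    "Graviton 2": (849.09, 21892.79),
    "Graviton 3": (597.14, 14112.62),
}

for name, (real, cpu) in builds.items():
    # Average number of busy cores = total CPU time / elapsed time;
    # with -j32 a perfectly parallel build would keep all 32 busy.
    avg_cores = cpu / real
    print(f"{name}: {avg_cores:.1f} cores busy on average (of 32)")
```

This works out to roughly 26 busy cores on the Graviton 2 and 24 on the Graviton 3.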
Ok, that's the base FreeBSD system; what about third-party packages? I
built the apache24, Xorg, and libreoffice
packages (including all of their dependencies, starting from a clean
system each time). In the interest of benchmarking the package builds
rather than the mirrors holding source code, I ran a make fetch-recursive
for each of the packages (downloading source code for the package and
all of its dependencies) first and only timed the builds.
| package     | Graviton 2 real | Graviton 2 CPU | Graviton 3 real | Graviton 3 CPU | real speedup | CPU speedup |
|-------------|-----------------|----------------|-----------------|----------------|--------------|-------------|
| apache24    | 502.77 s        | 1103.50 s      | 369.34 s        | 778.52 s       | 1.36x        | 1.42x       |
| Xorg        | 3270.62 s       | 41005.32 s     | 2492.98 s       | 28649.06 s     | 1.31x        | 1.43x       |
| libreoffice | 10084.95 s      | 106502.80 s    | 7306.28 s       | 74385.83 s     | 1.38x        | 1.43x       |
Here again we see a large reduction in CPU time — by a factor of 1.42 or 1.43 — from the Graviton 3, although as usual the "real" time shows somewhat less improvement; even with the source code already downloaded, a nontrivial amount of time is spent extracting tarballs.
All told, the Graviton 3 is a very nice improvement over the Graviton 2: With the exception of sha256 — which, at 1.2 GB/s, is likely more than fast enough already — we consistently see a CPU speedup of between 30% and 55%. I look forward to moving some of my workloads across to Graviton 3 based instances!
FreeBSD/EC2: What I've been up to

I realized recently that there's very little awareness of the work which goes into keeping FreeBSD working on Amazon EC2 — for that matter, I often have trouble remembering what I've been fixing. As an experiment I'm going to start trying to record my work, both for public consumption and to help myself; I might end up posting monthly, but to start with I'm going to report on what I've done in January through March of 2022.
- I committed code I started working on 4.5 years earlier which speeds up the x86 boot process (including EC2) by roughly 2 seconds.
- Working with a few other FreeBSD developers, I helped to fix qemu breakage which was preventing EC2 arm64 images from building.
- I reported benchmarking results to Amazon which helped them fix a performance issue in their EFI boot code.
- I kicked the Lightsail team about updating their FreeBSD images.
- I continued kicking the Lightsail team about updating their FreeBSD images.
- I handed out AWS credit codes (from my "AWS Hero" quota) to FreeBSD developers.
- I liaised with an Amazon developer working on fixing hotplug on arm64. (Work not ready for commit yet.)
- I committed a patch (not done by me, although I helped to review it) for obtaining entropy from EFI in the boot loader and passing it to the kernel; this ensures that arm64 EC2 instances have enough entropy for key generation when they first boot.
- I helped to debug more breakage affecting the release engineering AMI builds.
- I updated my EC2 boot scripts to fix the formatting of the SSH host keys, which had been broken by changes in the logger utility.
- Lightsail finally updated to FreeBSD 12.3. I encouraged them to add a FreeBSD 13 offering as well.
- I investigated a bug report concerning encrypted EBS volumes; it seems to be an AWS bug and I convinced Amazonians to investigate.
- I closed a bug report concerning clock stability on T3 family instances; it resulted from an AWS bug which I have been told has now been fixed.
- I fixed a glitch in the release engineering build process which was resulting in 13.1 BETA AMIs not being registered in the Systems Manager Parameter Store.
- I wrote a patch to fix the console on EC2 arm64 instances; currently pending review.
This work is supported by my FreeBSD/EC2 Patreon.
FreeBSD/EC2 AMI Systems Manager Public Parameters

In June, I posted an EC2 Wishlist with three entries: "AWS Systems Manager Public Parameters", "BootMode=polyglot", and "Attaching multiple IAM Roles to an EC2 instance". I am happy to say that my first wish has been granted!
The necessary flags were recently set within AWS, and a few days ago I added code to FreeBSD's build system to register 14.0-CURRENT AMI Ids as Public Parameters. (I'll be merging this code to 13-STABLE and 12-STABLE in the coming weeks.) I've also "backfilled" the parameters for releases from 12.0 onwards.
This means that you can now run

$ aws --region us-east-1 ssm get-parameter \
    --name /aws/service/freebsd/arm64/base/ufs/13.0/RELEASE \
    | jq -r '.Parameter.Value'
ami-050cc11ac34def94b

(using the jq tool to extract the Value field from the JSON blob returned by the AWS CLI) to look up the arm64 AMI for 13.0-RELEASE, and also
$ aws ec2 run-instances \
    --image-id resolve:ssm:/aws/service/freebsd/arm64/base/ufs/13.0/RELEASE \
    ... more command line options here ...

to look up the AMI and launch an instance — no more grepping the release announcement emails to find the right AMI Id for your region! Assuming everything works as expected, this will also be very useful for anyone who wants to run the latest STABLE or CURRENT images, since every time a new weekly snapshot is published the Public Parameter will be updated.
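For anyone who would rather not depend on jq, the Value field can be extracted from the CLI's JSON output in a few lines of Python; the response below is a hand-written example matching the shape that ssm get-parameter returns:

```python
import json

# Example JSON of the shape returned by `aws ssm get-parameter`;
# the Value is the 13.0-RELEASE arm64 AMI Id quoted above.
response = '''
{
  "Parameter": {
    "Name": "/aws/service/freebsd/arm64/base/ufs/13.0/RELEASE",
    "Type": "String",
    "Value": "ami-050cc11ac34def94b"
  }
}
'''

# Equivalent of `jq -r '.Parameter.Value'`:
ami_id = json.loads(response)["Parameter"]["Value"]
print(ami_id)  # ami-050cc11ac34def94b
```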
Many thanks to David and Arthur at AWS for their assistance in liaising with the Systems Manager team — I wouldn't have been able to do this without them!
This work was supported by my FreeBSD/EC2 Patreon; if you find it useful, please consider contributing so that I have more "funded hours" to spend on FreeBSD/EC2 work.
EC2 boot time benchmarking

Last week I quietly released ec2-boot-bench, a tool for benchmarking EC2 instance boot times. This tool is BSD licensed, and should compile and run on any POSIX system with OpenSSL or LibreSSL installed. Usage is simple — give it AWS keys and tell it what to benchmark:
usage: ec2-boot-bench --keys <keyfile> --region <name> --ami <AMI Id>
                      --itype <instance type> [--subnet <subnet Id>] [--user-data <file>]

and it outputs four values — how long the RunInstances API call took, how long it took EC2 to get the instance from "pending" state to "running" state, how long it took once the instance was "running" before port TCP/22 was "closed" (aka. sending a SYN packet got a RST back), and how long it took from when TCP/22 was "closed" to when it was "open" (aka. sending a SYN got a SYN/ACK back):
RunInstances API call took: 1.543152 s
Moving from pending to running took: 4.904754 s
Moving from running to port closed took: 17.175601 s
Moving from port closed to port open took: 5.643463 s
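When collecting many runs it can be handy to parse that output programmatically; a minimal sketch, assuming the exact four-line format shown above:

```python
import re

# Sample ec2-boot-bench output, copied from the example above.
sample = """\
RunInstances API call took: 1.543152 s
Moving from pending to running took: 4.904754 s
Moving from running to port closed took: 17.175601 s
Moving from port closed to port open took: 5.643463 s
"""

# Pull out the four durations (in seconds) and total them.
durations = [float(m) for m in re.findall(r"took: ([0-9.]+) s", sample)]
total = sum(durations)
print(f"{len(durations)} phases, {total:.2f} s from API call to open port")
```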
Once I finished writing ec2-boot-bench, the natural next step was to run some tests — in particular, to see how FreeBSD compared to other operating systems used in EC2. I used the c5.xlarge instance type and tested FreeBSD releases since 11.1-RELEASE (the first FreeBSD release which can run on the c5.xlarge instance type) along with a range of Linux AMIs mostly taken from the "quick launch" menu in the AWS console. In order to perform an apples-to-apples comparison, I passed a user-data file to the FreeBSD instances which turned off some "firstboot" behaviour — by default, FreeBSD release AMIs will update themselves and reboot to ensure they have all necessary security fixes before they are used, while Linuxes just leave security updates for users to install later:
>>/etc/rc.conf
firstboot_freebsd_update_enable="NO"
firstboot_pkgs_enable="NO"
For each of the AMIs I tested, I ran ec2-boot-bench 10 times, discarded the first result, and took the median values from the remaining 9 runs. The first two values — the time taken for a RunInstances API call to successfully return, and the time taken after RunInstances returns before a DescribeInstances call says that the instance is "running" — are consistent across all the AMIs I tested, at roughly 1.5 and 6.9 seconds respectively; so the numbers we need to look at for comparing AMIs are just the last two values reported by ec2-boot-bench, namely the time before the TCP/IP stack is running and has an IP address, and the time between that point and when sshd is running.
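The aggregation described above (discard the first run as a warm-up, then take the median of the remaining nine) can be sketched as follows; the timing values here are made up for illustration:

```python
from statistics import median

# Ten "running to port closed" timings from one hypothetical AMI;
# the first run is discarded as a warm-up, per the methodology above.
runs = [14.2, 9.6, 9.5, 9.7, 9.4, 9.6, 9.8, 9.5, 9.3, 9.6]

remaining = runs[1:]        # drop the first result
result = median(remaining)  # median of the remaining 9 runs
print(f"{result:.2f} s")
```

Using the median of nine runs keeps a single anomalously slow boot from skewing the reported number.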
The results of my testing are as follows:
| AMI Id (us-east-1)    | AMI Name                               | running to port closed (s) | closed to open (s) | total (s) |
|-----------------------|----------------------------------------|----------------------------|--------------------|-----------|
| ami-0f9ebbb6ab174bc24 | Clear Linux 34640                      | 1.23                       | 0.00               | 1.23      |
| ami-0c2b8ca1dad447f8a | Amazon Linux 2                         | 9.55                       | 1.54               | 11.09     |
| ami-09e67e426f25ce0d7 | Ubuntu Server 20.04 LTS                | 7.39                       | 4.65               | 12.04     |
| ami-0747bdcabd34c712a | Ubuntu Server 18.04 LTS                | 10.64                      | 4.30               | 14.94     |
| ami-03a454637e4aa453d | Red Hat Enterprise Linux 8 (20210825)  | 13.16                      | 2.11               | 15.27     |
| ami-0ee02acd56a52998e | Ubuntu Server 16.04 LTS                | 12.76                      | 5.42               | 18.18     |
| ami-0a16c2295ef80ff63 | SUSE Linux Enterprise Server 12 SP5    | 16.32                      | 6.96               | 23.28     |
| ami-0fde50fcbcd46f2f7 | SUSE Linux Enterprise Server 15 SP2    | 18.13                      | 6.76               | 24.89     |
| ami-0b0af3577fe5e3532 | Red Hat Enterprise Linux 8             | 13.43                      | 52.31              | 65.74     |
In the race to accept incoming SSH connections, the clear winner — no pun intended — is Intel's Clear Linux, which boots to a running sshd in a blistering 1.23 seconds after the instance enters the "running" state. After Clear Linux is a rough three-way tie between Amazon Linux, Debian, and Ubuntu — and it's good to see that Ubuntu's boot performance has improved over the years, dropping from 18 seconds in 16.04 LTS to 15 seconds in 18.04 LTS and then to 12 seconds with 20.04 LTS. After the Amazon Linux / Debian / Ubuntu cluster comes SUSE Linux and FreeBSD; here, interestingly, SUSE 12 is faster than SUSE 15, while FreeBSD 12.2 and 13.0 (the most recent two releases) are noticeably faster than older FreeBSD.
Finally in dead last place comes Red Hat — which brings up its network stack quickly but takes a very long time before it is running sshd. It's possible that Red Hat is doing something similar to the behaviour I disabled in FreeBSD, in downloading and installing security updates before exposing sshd to the network — I don't know enough to comment here. (If someone reading this can confirm that possibility and has a way to disable that behaviour via user-data, I'll be happy to re-run the test and revise this post.)
UPDATE: Turns out that Red Hat's terrible performance was due to a bug which was fixed in the 2021-08-25 update. I tested the new version and it now lands in the middle of the pack of Linuxes rather than lagging far behind.
Needless to say, FreeBSD has some work to do to catch up here; but measurement is the first step, and indeed I already have work in progress to further profile and improve FreeBSD's boot performance, which I'll write about in a future post.
If you find this useful, please consider supporting my work either via my FreeBSD/EC2 Patreon or by sending me contributions directly. While my work on the FreeBSD/EC2 platform originated from the needs of my Tarsnap online backup service, it has become a much larger project over the years and I would be far more comfortable spending time on this if it weren't taking away so directly from my "paid work".
My EC2 wishlist

I've been using Amazon EC2 since 2006, and I've been maintaining the FreeBSD/EC2 platform for over a decade. Over those years I've asked Amazon for many features; some of them, like HVM support (EC2 originally only supported Xen/PV) and bidirectional serial console support (EC2 originally had an "output-only" serial console) eventually arrived, but I'm still waiting for others — some of which should be very easy for AWS to provide and would yield very large benefits.
While I've made engineers inside Amazon aware of all of these at various times, I think it's time to post my wishlist here — both so that a wider audience inside Amazon can hear more about these, and so that the FreeBSD community (especially the people who are financially supporting my work) can see what I'm aiming towards.
AWS Systems Manager Public Parameters

FreeBSD release announcements currently include a long list of AMI IDs — two for each EC2 region — and I would publish more AMIs if it weren't for the impracticality of putting all the AMI IDs into the announcements. One might say "there's got to be a better solution" — and indeed there is: AWS Systems Manager Public Parameters. Amazon publishes AMI IDs for Amazon Linux and Windows via the AWS Systems Manager Parameter Store, and Ubuntu AMI IDs are also published via the same mechanism (I assume by Canonical). I wrote code over a year ago to allow FreeBSD to publish AMI IDs the same way, but we can't use it until Amazon authorizes the FreeBSD release engineering account to publish these parameters — and we're still waiting.
In addition to allowing us to publish multiple AMIs (e.g. ZFS and cloud-init), if we had this then we could publish updated AMIs after every security update — using the Parameter Store to allow users to look up the latest updated version — which would dramatically speed up the process of launching new FreeBSD/EC2 instances.
Wishlist item #1: Please give the FreeBSD release engineering account access to store AWS Systems Manager Public Parameters.
BootMode=polyglot

A few months ago, Amazon started supporting UEFI booting on newer x86 instances. (ARM instances already used UEFI.) This is great news for FreeBSD, since we can boot much faster on UEFI than via the "legacy" BIOS boot mode — I/O is much faster since UEFI doesn't need to bounce disk reads through a small buffer in the bottom 1 MB of address space, and console output is much faster since we can use the UEFI console rather than a shockingly slow emulated VGA text mode. In fact, the total loader + kernel time (starting when the boot loader starts running, and stopping when the init process is spawned) drops from 10.9 seconds down to 3.9 seconds!
There's just one problem with this: AMIs are marked as either "legacy-bios" or "uefi", and while legacy-bios AMIs can boot on all of the x86 instance types, the UEFI-flagged AMIs can only boot on the instance types which support UEFI. FreeBSD's AMIs are built from disk images which support both boot methods — but when we make the EC2 RegisterImage API call, we have to specify one or the other. While we would love to make FreeBSD AMIs boot faster, we don't want to drop support for customers who are using older instance types.
Wishlist item #2: Please add a new "BootMode=polyglot" option, which marks AMIs as supporting both legacy-bios and uefi boot modes, with UEFI being used on instances where it is available and legacy-bios being used otherwise.
Attaching multiple IAM Roles to an EC2 instance

IAM Roles for EC2 are a very powerful — but very dangerous — feature, making credentials available to any process on the instance which can open a TCP connection to 169.254.169.254:80. Last year, I released imds-filterd, which allows access to the EC2 Instance Metadata Service (and thereby IAM Roles) to be locked down; as a result, you can now attach an IAM Role to an EC2 instance without the risk that a user-nobody privilege escalation allows an attacker to access the credentials.
There's only one problem: You can only attach a single IAM Role. This means that — even with imds-filterd restricting what each process can access in the metadata service — there's no way to give different credentials to different processes. This becomes a problem if you want to use the AWS Systems Manager Agent, since it requires credentials exposed as an IAM Role; there's no way to use the SSM Agent and another process which also uses IAM Role credentials without them both having access to each other's privileges. This even became a problem for Amazon a few years ago when they wanted to provide "extra" credentials to EC2 instances which could be used to manage SSH host keys: Because these credentials couldn't be attached as an IAM Role, they were exposed via the Instance Metadata Service as meta-data/identity-credentials/ec2/security-credentials/ec2-instance which Amazon's documentation helpfully marks as "[Internal use only]".
As it turns out, the EC2 API already supports attaching an array of IAM Roles to an instance, and the Instance Metadata Service already supports publishing credentials with different names — but the EC2 API throws an error if the array of IAM Roles has more than one name listed in it. Get rid of that restriction, and it will become much easier to properly effect privilege separation... and also easier for Amazon to provide credentials to code it has running on customer instances.
Wishlist item #3: Allow multiple IAM Roles to be attached to a single EC2 instance.
If you work at Amazon and can make one or all of these wishes come true, please get in touch (cperciva@FreeBSD.org). I really don't think any of these should be very difficult to provide on Amazon's side, and they would provide a huge benefit to FreeBSD. Alternatively, if you work at Amazon and you're screaming at your laptop "it's not that simple Colin!", please get in touch anyway (yes, I've signed the necessary NDAs).
And if you don't work at Amazon but you work at a large AWS customer: Please draw this list to the attention of your Amazonian contacts. Eventually we'll find someone who can make these happen!