A year of funded FreeBSD

I've been maintaining FreeBSD on the Amazon EC2 platform ever since I first got it booting in 2010, but in November 2023 I added to my responsibilities the role of FreeBSD release engineering lead — just in time to announce the availability of FreeBSD 14.0, although Glen Barber did all the release engineering work for that release. While I receive a small amount of funding from Antithesis and from my FreeBSD/EC2 Patreon, it rapidly became clear that my release engineering duties were competing with — in fact, out-competing — FreeBSD/EC2 for my available FreeBSD volunteer hours: In addition to my long list of "features to implement" stagnating, I had increasingly been saying "huh that's weird... oh well, no time to investigate that now". In short, by early 2024 I was becoming increasingly concerned that I was not in a position to be a good "owner" of the FreeBSD/EC2 platform.

For several years leading up to this point I had been talking to Amazonians on and off about the possibility of Amazon sponsoring my FreeBSD/EC2 work; rather predictably, most of those conversations ended up with my contacts at Amazon rhyming with "Amazon should definitely sponsor the work you're doing... but I don't have any money available in my budget for this". Finally in April 2024 I found someone with a budget, and after some discussions around timeline, scope, and process, it was determined that Amazon would support me for a year via GitHub Sponsors. I'm not entirely sure if the year in question was June through May or July through June (money had to move within Amazon, from Amazon to GitHub, from GitHub to Stripe, and finally from Stripe into my bank account, so when I received money doesn't necessarily reflect when Amazon intended to give me money), but either way the sponsorship either has come to an end or is coming to an end soon, so I figured now was a good time to write about what I've done.

Amazon was nominally sponsoring me for 40 hours/month of work on FreeBSD release engineering and FreeBSD/EC2 development — I made it clear to them that sponsoring one and not the other wasn't really feasible, especially given dependencies between the two — and asked me to track how much time I was spending on things. In the end, I spent roughly 50 hours/month on this work, averaging 20 hours/month spent on EC2-specific issues, 20 hours/month making FreeBSD releases happen, and 10 hours/month on other release engineering related work — although the exact breakdown varied dramatically from month to month.

Following FreeBSD's quarterly release schedule (which I announced in July 2024, but put together and presented at the FreeBSD developer summit at BSDCan in May 2024), I managed four FreeBSD releases during the past year: FreeBSD 13.4, in September 2024; FreeBSD 14.2, in December 2024; FreeBSD 13.5, in March 2025; and FreeBSD 14.3, currently scheduled for release on June 10th. The work involved in managing each of these releases — nagging developers to get their code into the tree in time, approving (or disapproving!) merge requests, coordinating with other teams, building and testing images (usually three Betas, one Release Candidate, and the final Release), writing announcement text, and fixing any release-building breakage which arose along the way — mostly happened in the month prior to the release (I refer to the second month of each calendar quarter as "Beta Month") and ranged from a low of 33.5 hours (for FreeBSD 13.5) to a high of 79 hours (for FreeBSD 14.2). As one might imagine, the later in a stable branch you get, the fewer the number of things there are breaking and the lower the amount of work required for doing a release; while I wasn't tracking hours when I managed FreeBSD 14.1, I suspect it took close to 100 hours of release engineering time, and FreeBSD 15.0 is very likely to be well over that.

On the FreeBSD/EC2 side of things, there were two major features which Amazon encouraged me to prioritize: The "power driver" for AWS Graviton instances (aka "how the EC2 API tells the OS to shut down"; without this, FreeBSD ignores the shutdown signal and a few minutes later EC2 times out and yanks the virtual power cable), and device hotplug on AWS Graviton instances. The first of these was straightforward: On Graviton systems, the "power button" is a GPIO pin, the details of which are specified via an ACPI _AEI object. I added code to find those in ACPI and pass the appropriate configuration through to the driver for the PL061 GPIO controller; when the GPIO pin is asserted, the controller generates an interrupt which causes the ACPI "power button" event to be triggered, which in turn now shuts down the system. There was one minor hiccup: The ACPI tables provided by EC2 specify that the GPIO pin in question should be configured as a "Pull Up" pin, but the PL061 controller in fact doesn't have any pullup/pulldown resistors; this didn't cause problems on Linux because Linux silently ignores GPIO configuration failures, but on FreeBSD we disabled the device after failing to configure it. I believe this EC2 bug will be fixed in future Graviton systems; but in the meantime I ship FreeBSD/EC2 AMIs with a new "quirk": ACPI_Q_AEI_NOPULL, aka "Ignore the PullUp flag on GPIO pin specifications in _AEI objects".

Getting hotplug working — or more specifically, getting hot unplug working, since that's where most of the problems arose — took considerably more work, largely because there were several different problems, each presenting on a subset of EC2 instance types:

In addition to these functionality fixes, I made one "quality of life" improvement to FreeBSD's hotplug handling: PCIe (used in the latest generation of EC2 instances) mandates that after the "attention" button is pressed to request a device eject, there is a 5 second delay (in case a human standing in front of a machine says "oh no that's the wrong disk"), and if the button is pressed a second time, the eject request is cancelled. This delay is entirely pointless in EC2, where there is no human physically pressing a button (and no mechanism for pressing a virtual button a second time), so I added a boot loader tunable to adjust that timeout and set it to zero in EC2. Finally, I put together a test script: I can now launch an EC2 instance and repeatedly plug and unplug an EBS volume via the EC2 API, and confirm that FreeBSD successfully attaches and detaches it 300 times in a row. With luck, this will allow me to ensure that hotplug is fully operational on future EC2 instance types; at least, assuming I get access to them before they launch.
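
The actual test script isn't shown here, but the loop it describes can be sketched roughly as follows. This is only an illustration in Python: the function name run_hotplug_cycles and its three callables are hypothetical. In a real run, attach and detach would presumably wrap the EC2 AttachVolume and DetachVolume API calls and wait for them to finish, and device_present would ask the guest (for example over SSH) whether it currently sees the disk.

```python
from typing import Callable


def run_hotplug_cycles(
    attach: Callable[[], None],
    detach: Callable[[], None],
    device_present: Callable[[], bool],
    cycles: int = 300,
) -> int:
    """Plug and unplug a volume `cycles` times; return how many cycles were clean.

    All three callables are hypothetical hooks supplied by the caller.
    The loop stops at the first cycle in which the guest's view of the
    device disagrees with what was just requested.
    """
    for completed in range(cycles):
        attach()
        if not device_present():
            return completed  # the guest never saw the disk arrive
        detach()
        if device_present():
            return completed  # the guest still sees the disk after unplug
    return cycles


# In-memory stand-in for a volume, so the loop can be exercised without EC2.
volume = {"attached": False}
clean = run_hotplug_cycles(
    attach=lambda: volume.update(attached=True),
    detach=lambda: volume.update(attached=False),
    device_present=lambda: volume["attached"],
)
print(clean)  # prints 300
```

Returning the count of clean cycles, rather than a bare pass/fail, makes it easy to see whether a failure shows up on the first unplug or only after many repetitions.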

While those two were Amazon's top priorities for FreeBSD/EC2 work, they were by no means the only things I worked on; in fact they only took up about half of the time I spent on EC2-specific issues. I did a lot of work in 2021 and 2022 to speed up the FreeBSD boot process, but among the "that's weird but I don't have time to investigate right now" issues I had noticed in late 2023 and early 2024 was that FreeBSD/EC2 instances sometimes took a surprisingly long time to boot. I hadn't measured how long they took, mind you; but as part of the FreeBSD weekly snapshot process I ran test boots of a few EC2 instance types, and I had needed to increase the sleep time between launching instances and trying to SSH into them.

Well, the first thing to do with any sort of performance issues is to collect data; so I benchmarked boot time on weekly EC2 AMI builds dating back to 2018 — spinning up over ten thousand EC2 instances in the process — and started generating FreeBSD boot performance plots. Collecting new data and updating those plots is now part of my weekly snapshot testing process; but even without drawing plots, I could immediately see some issues. I got to work:

One thing which had long been on my "features to implement" list for FreeBSD/EC2 but I hadn't found time for earlier was adding more AMI "flavours": A year ago, we had base (the FreeBSD base system, with minimal additional code installed from the ports tree to make it "act like an EC2 AMI") and cloud-init (as the name suggests, FreeBSD with Cloud-init installed). I added two more flavours of FreeBSD AMI to the roster: small AMIs, which are like base except without debug symbols, the LLDB debugger, 32-bit libraries, FreeBSD tests, or the Amazon SSM Agent or AWS CLI — which collectively reduces the disk space usage from ~5 GB to ~1 GB while not removing anything which most people will use — and builder AMIs, which are FreeBSD AMI Builder AMIs, providing an easy path for users to create customized FreeBSD AMIs.

Of course, with 4 flavours of FreeBSD AMIs — and two filesystems (UFS and ZFS), two architectures (amd64 and arm64), and three versions of FreeBSD (13-STABLE, 14-STABLE, and 15-CURRENT) — all of the weekly snapshot builds were starting to add up; so in May I finally got around to cleaning up old images (and their associated EBS snapshots). While I don't pay for these images — the FreeBSD release engineering AWS account is sponsored by Amazon — it was still costing someone money; so when I realized I could get rid of 336 TB of EBS snapshots, I figured it was worth spending a few hours writing shell scripts.
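
To put a number on "starting to add up", here is a small Python sketch that enumerates the combinations listed above. It assumes that every flavour is built for every filesystem, architecture, and branch, which the text does not actually state, so treat the result as an upper bound rather than the real weekly count.

```python
from itertools import product

# Values taken from the text above; the full cross product is an assumption.
FLAVOURS = ["base", "cloud-init", "small", "builder"]
FILESYSTEMS = ["UFS", "ZFS"]
ARCHITECTURES = ["amd64", "arm64"]
BRANCHES = ["13-STABLE", "14-STABLE", "15-CURRENT"]


def weekly_images():
    """Return every (branch, architecture, filesystem, flavour) combination."""
    return list(product(BRANCHES, ARCHITECTURES, FILESYSTEMS, FLAVOURS))


images = weekly_images()
print(len(images))  # prints 48 (4 flavours * 2 filesystems * 2 architectures * 3 branches)
```

Under that assumption, each weekly snapshot run adds up to 48 images, each backed by an EBS snapshot, in every region the images are published to.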

While most of my time was spent on managing release cycles and maintaining the FreeBSD/EC2 platform, I did also spend some time on broader release engineering issues — in fact, part of the design of the "quarterly" release schedule is that it leaves a few weeks between finishing one release and starting the next to allow for release engineering work which can't effectively be done in the middle of a release cycle. The first issue I tackled here was parallelizing release building: With a large number of EC2 AMIs being built, a large proportion of the release build time was being spent not building but rather installing FreeBSD into VM images. I reworked the release code to parallelize this, but found that it caused sporadic build failures — which were very hard to isolate, since they only showed up with a complete release build (which took close to 24 hours) and not with any subset of the build. After many hours of work I finally tracked the problem down to a single missing Makefile line: We weren't specifying that a directory should be created before files were installed into it. With that fix, I was able to reduce the release build from ~22 hours down to ~13 hours, and also "unlock" the ability to add more EC2 AMI flavours (which I couldn't do earlier since it would have increased the build time too much).

Another general release engineering issue I started tackling was the problem of build reproducibility — aided by the fact that I had EC2 to draw upon. As part of my weekly testing of snapshot images, I now spin up EC2 instances and have them build their own AMIs — and then use diffoscope to compare the disk images they built against the ones they were launched from. This has already found several issues — including some which appeared partway through the year and were identified quickly thanks to the regular testing — of which I've fixed a few and some others I've passed on to other developers to tackle.
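
A reproducibility check of this kind might look something like the Python sketch below. The only part taken from the text is the use of diffoscope on the two disk images; the SHA-256 pre-check, the function names, and the file paths are assumptions added for illustration, the idea being that a cheap hash comparison can skip the expensive diffoscope run when the images are already identical.

```python
import hashlib
import subprocess
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so that large disk images need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def images_match(original: Path, rebuilt: Path) -> bool:
    """Return True if the two images are byte-for-byte identical."""
    return sha256_of(original) == sha256_of(rebuilt)


def explain_difference(original: Path, rebuilt: Path) -> int:
    """Run diffoscope on the pair (it must be installed) and return its exit status."""
    return subprocess.run(["diffoscope", str(original), str(rebuilt)]).returncode
```

A caller would run images_match first and only call explain_difference when it returns False, since diffoscope on multi-gigabyte disk images can take a long time.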

Of course, in addition to the big projects there's also a plethora of smaller issues to tackle. Build breakage (weekly snapshot builds are good at finding this!); reviewing patches to the ENA driver; helping Dave Cottlehuber add support for building OCI Containers and uploading them to repositories; teaching my bsdec2-image-upload tool to gracefully handle internal AWS errors; reporting an AWS security issue I stumbled across... some days everything falls under the umbrella of "other stuff which needs to get done", but a lot of it is just as important as the larger projects.

So what's next? Well, I'm still the FreeBSD release engineering lead and the maintainer of the FreeBSD/EC2 platform — just with rather less time to devote to this work. FreeBSD releases will continue to happen — 15.0 should land in December, followed in 2026 by 14.4, 15.1, 14.5, and 15.2 — but I probably won't have time to jump in and fix things as much, so late-landing features are more likely to get removed from a release rather than fixed in time for the release; we were only able to ship OCI Containers starting in FreeBSD 14.2 because I had funded hours to make sure all the pieces landed intact, and that sort of involvement won't be possible. On the EC2 side, now that I have regression testing of boot performance set up, I'll probably catch any issues which need to be fixed there; but the rest of my "features to implement" list — automatically growing filesystems when EBS volumes expand, better automatic configuration with multiple network interfaces (and network interface hot plug), rolling "pre-patched" AMIs (right now FreeBSD instances update themselves when they first boot), putting together a website to help users generate EC2 user-data files (e.g., for installing packages and launching daemons), returning to my work on FreeBSD/Firecracker and making it a supported FreeBSD platform, etc. — is likely to stagnate unless I find more time.

I've been incredibly lucky to get this sponsorship from Amazon; it's far more than most open source developers ever get. I wish it wasn't ending; but I'm proud of the work I've done and I'll always be grateful to Amazon for giving me this opportunity.

Posted at 2025-06-06 19:30