A year of funded FreeBSD

I've been maintaining FreeBSD on the Amazon EC2 platform ever since I first got it booting in 2010, but in November 2023 I added to my responsibilities the role of FreeBSD release engineering lead — just in time to announce the availability of FreeBSD 14.0, although Glen Barber did all the release engineering work for that release. While I receive a small amount of funding from Antithesis and from my FreeBSD/EC2 Patreon, it rapidly became clear that my release engineering duties were competing with — in fact, out-competing — FreeBSD/EC2 for my available FreeBSD volunteer hours: In addition to my long list of "features to implement" stagnating, I had increasingly been saying "huh that's weird... oh well, no time to investigate that now". In short, by early 2024 I was becoming increasingly concerned that I was not in a position to be a good "owner" of the FreeBSD/EC2 platform.

For several years leading up to this point I had been talking to Amazonians on and off about the possibility of Amazon sponsoring my FreeBSD/EC2 work; rather predictably, most of those conversations ended with my contacts at Amazon saying something along the lines of "Amazon should definitely sponsor the work you're doing... but I don't have any money available in my budget for this". Finally, in April 2024, I found someone with a budget, and after some discussions around timeline, scope, and process, it was determined that Amazon would support me for a year via GitHub Sponsors. I'm not entirely sure if the year in question was June through May or July through June — money had to move within Amazon, from Amazon to GitHub, from GitHub to Stripe, and finally from Stripe into my bank account, so when I received money doesn't necessarily reflect when Amazon intended to give me money — but either way the sponsorship either has come to an end or is coming to an end soon, so I figured now was a good time to write about what I've done.

Amazon was nominally sponsoring me for 40 hours/month of work on FreeBSD release engineering and FreeBSD/EC2 development — I made it clear to them that sponsoring one and not the other wasn't really feasible, especially given the dependencies between the two — and asked me to track how much time I was spending on things. In the end, I spent roughly 50 hours/month on this work, averaging 20 hours/month on EC2-specific issues, 20 hours/month making FreeBSD releases happen, and 10 hours/month on other release engineering work — although the exact breakdown varied dramatically from month to month.

Following FreeBSD's quarterly release schedule (which I announced in July 2024, but put together and presented at the FreeBSD developer summit at BSDCan in May 2024), I managed four FreeBSD releases during the past year: FreeBSD 13.4, in September 2024; FreeBSD 14.2, in December 2024; FreeBSD 13.5, in March 2025; and FreeBSD 14.3, currently scheduled for release on June 10th. The work involved in managing each of these releases — nagging developers to get their code into the tree in time, approving (or disapproving!) merge requests, coordinating with other teams, building and testing images (usually three Betas, one Release Candidate, and the final Release), writing announcement text, and fixing any release-building breakage which arose along the way — mostly happened in the month prior to the release (I refer to the second month of each calendar quarter as "Beta Month") and ranged from a low of 33.5 hours (for FreeBSD 13.5) to a high of 79 hours (for FreeBSD 14.2). As one might imagine, the later you get in a stable branch, the fewer things there are breaking and the less work a release requires; while I wasn't tracking hours when I managed FreeBSD 14.1, I suspect it took close to 100 hours of release engineering time, and FreeBSD 15.0 is very likely to be well over that.

On the FreeBSD/EC2 side of things, there were two major features which Amazon encouraged me to prioritize: The "power driver" for AWS Graviton instances (aka "how the EC2 API tells the OS to shut down" — without this, FreeBSD ignores the shutdown signal and a few minutes later EC2 times out and yanks the virtual power cable), and device hotplug on AWS Graviton instances. The first of these was straightforward: On Graviton systems, the "power button" is a GPIO pin, the details of which are specified via an ACPI _AEI object. I added code to find those in ACPI and pass the appropriate configuration through to the driver for the PL061 GPIO controller; when the GPIO pin is asserted, the controller generates an interrupt which causes the ACPI "power button" event to be triggered, which in turn now shuts down the system. There was one minor hiccup: The ACPI tables provided by EC2 specify that the GPIO pin in question should be configured as a "Pull Up" pin, but the PL061 controller in fact doesn't have any pullup/pulldown resistors; this didn't cause problems on Linux because Linux silently ignores GPIO configuration failures, but on FreeBSD we disabled the device after failing to configure it. I believe this EC2 bug will be fixed in future Graviton systems; but in the meantime I ship FreeBSD/EC2 AMIs with a new "quirk": ACPI_Q_AEI_NOPULL, aka "Ignore the PullUp flag on GPIO pin specifications in _AEI objects".

Getting hotplug working — or more specifically, getting hot unplug working, since that's where most of the problems arose — took considerably more work, largely because there were several different problems, each presenting on a subset of EC2 instance types:

In addition to these functionality fixes, I made one "quality of life" improvement to FreeBSD's hotplug handling: PCIe (used in the latest generation of EC2 instances) mandates that after the "attention" button is pressed to request a device eject, there is a 5-second delay — in case a human standing in front of a machine says "oh no that's the wrong disk" — and if the button is pressed a second time, the eject request is cancelled. This delay is entirely pointless in EC2, where there is no human physically pressing a button (and no mechanism for pressing a virtual button a second time) — so I added a boot loader tunable to adjust that timeout and set it to zero in EC2. Finally, I put together a test script: I can now launch an EC2 instance and repeatedly plug and unplug an EBS volume via the EC2 API, and confirm that FreeBSD successfully attaches and detaches it 300 times in a row. With luck, this will allow me to ensure that hotplug is fully operational on future EC2 instance types — at least, assuming I get access to them before they launch.
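
Conceptually, the core of such a test is just an attach/detach loop driven through the EC2 API; the following is a hedged sketch using boto3 (not the actual script — the instance and volume IDs, device name, and polling logic are placeholder assumptions):

    # Illustrative sketch only, not the real test script: repeatedly attach
    # and detach an EBS volume via the EC2 API.  The IDs and device name are
    # placeholders; a real test would also verify over SSH that the device
    # node appears and disappears inside FreeBSD.
    import time
    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")
    INSTANCE = "i-0123456789abcdef0"   # hypothetical FreeBSD test instance
    VOLUME = "vol-0123456789abcdef0"   # hypothetical EBS volume

    def wait_for(state):
        # Poll until the volume reports the desired attachment state.
        while True:
            vol = ec2.describe_volumes(VolumeIds=[VOLUME])["Volumes"][0]
            att = vol["Attachments"]
            if state == "detached" and not att:
                return
            if att and att[0]["State"] == state:
                return
            time.sleep(2)

    for i in range(300):
        ec2.attach_volume(VolumeId=VOLUME, InstanceId=INSTANCE, Device="/dev/sdf")
        wait_for("attached")
        ec2.detach_volume(VolumeId=VOLUME)
        wait_for("detached")
        print(f"hotplug cycle {i + 1}: OK")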

While those two were Amazon's top priorities for FreeBSD/EC2 work, they were by no means the only things I worked on; in fact they only took up about half of the time I spent on EC2-specific issues. I did a lot of work in 2021 and 2022 to speed up the FreeBSD boot process, but among the "that's weird but I don't have time to investigate right now" issues I had noticed in late 2023 and early 2024 was that FreeBSD/EC2 instances sometimes took a surprisingly long time to boot. I hadn't measured how long they took, mind you; but as part of the FreeBSD weekly snapshot process I ran test boots of a few EC2 instance types, and I had needed to increase the sleep time between launching instances and trying to SSH into them.

Well, the first thing to do with any sort of performance issue is to collect data; so I benchmarked boot time on weekly EC2 AMI builds dating back to 2018 — spinning up over ten thousand EC2 instances in the process — and started generating FreeBSD boot performance plots. Collecting new data and updating those plots is now part of my weekly snapshot testing process; but even without drawing plots, I could immediately see some issues. I got to work:

One thing which had long been on my "features to implement" list for FreeBSD/EC2 — but which I hadn't found time for earlier — was adding more AMI "flavours": A year ago, we had base (the FreeBSD base system, with minimal additional code installed from the ports tree to make it "act like an EC2 AMI") and cloud-init (as the name suggests, FreeBSD with Cloud-init installed). I added two more flavours of FreeBSD AMI to the roster: small AMIs, which are like base except without debug symbols, the LLDB debugger, 32-bit libraries, the FreeBSD test suite, the Amazon SSM Agent, or the AWS CLI — which collectively reduces the disk space usage from ~5 GB to ~1 GB while not removing anything which most people will use — and builder AMIs, which are FreeBSD AMI Builder AMIs, providing an easy path for users to create customized FreeBSD AMIs.

Of course, with four flavours of FreeBSD AMIs — and two filesystems (UFS and ZFS), two architectures (amd64 and arm64), and three versions of FreeBSD (13-STABLE, 14-STABLE, and 15-CURRENT) — all of the weekly snapshot builds were starting to add up; so in May I finally got around to cleaning up old images (and their associated EBS snapshots). While I don't pay for these images — the FreeBSD release engineering AWS account is sponsored by Amazon — it was still costing someone money; so when I realized I could get rid of 336 TB of EBS snapshots, I figured it was worth spending a few hours writing shell scripts.
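
Purely to illustrate the shape of that cleanup (a hedged sketch in Python rather than the actual shell scripts — the 90-day cutoff and the "delete everything older" policy are placeholder assumptions, not the real retention rules):

    # Illustrative sketch of cleaning up old AMIs and their backing EBS
    # snapshots; the cutoff is a placeholder, not the actual policy used
    # for FreeBSD snapshot builds.
    import datetime
    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")
    cutoff = datetime.datetime.now(datetime.timezone.utc) - datetime.timedelta(days=90)

    for image in ec2.describe_images(Owners=["self"])["Images"]:
        created = datetime.datetime.fromisoformat(
            image["CreationDate"].replace("Z", "+00:00"))
        if created >= cutoff:
            continue
        # Record the EBS snapshots backing this AMI before deregistering it,
        # since deregistering an AMI does not delete them automatically.
        snapshots = [bdm["Ebs"]["SnapshotId"]
                     for bdm in image.get("BlockDeviceMappings", [])
                     if "SnapshotId" in bdm.get("Ebs", {})]
        ec2.deregister_image(ImageId=image["ImageId"])
        for snapshot_id in snapshots:
            ec2.delete_snapshot(SnapshotId=snapshot_id)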

While most of my time was spent on managing release cycles and maintaining the FreeBSD/EC2 platform, I did also spend some time on broader release engineering issues — in fact, part of the design of the "quarterly" release schedule is that it leaves a few weeks between finishing one release and starting the next to allow for release engineering work which can't effectively be done in the middle of a release cycle. The first issue I tackled here was parallelizing release building: With a large number of EC2 AMIs being built, a large proportion of the release build time was being spent not building but rather installing FreeBSD into VM images. I reworked the release code to parallelize this, but found that it caused sporadic build failures — which were very hard to isolate, since they only showed up with a complete release build (which took close to 24 hours) and not with any subset of the build. After many hours of work I finally tracked the problem down to a single missing Makefile line: We weren't specifying that a directory should be created before files were installed into it. With that fix, I was able to reduce the release build from ~22 hours down to ~13 hours, and also "unlock" the ability to add more EC2 AMI flavours (which I couldn't do earlier since it would have increased the build time too much).

Another general release engineering issue I started tackling was the problem of build reproducibility — aided by the fact that I had EC2 to draw upon. As part of my weekly testing of snapshot images, I now spin up EC2 instances and have them build their own AMIs — and then use diffoscope to compare the disk images they built against the ones they were launched from. This has already found several issues — including some which appeared partway through the year and were identified quickly thanks to the regular testing — of which I've fixed a few and passed the others on to other developers to tackle.
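
The comparison step itself is simple; roughly along these lines (a hedged sketch — the image paths are placeholders, and in practice this runs as part of the weekly snapshot testing rather than as a standalone script):

    # Illustrative sketch: compare the disk image an instance just built
    # against the image it was launched from.  Paths are placeholders;
    # diffoscope exits with a non-zero status when its inputs differ
    # (or if it hits an error).
    import subprocess
    import sys

    ORIGINAL = "freebsd-launched.img"   # the image the builder booted from
    REBUILT = "freebsd-rebuilt.img"     # the image the builder produced

    result = subprocess.run(
        ["diffoscope", "--text", "reproducibility-report.txt", ORIGINAL, REBUILT])
    if result.returncode == 0:
        print("images are identical: build is reproducible")
    else:
        print("differences found; see reproducibility-report.txt", file=sys.stderr)
        sys.exit(1)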

Of course, in addition to the big projects there's also a plethora of smaller issues to tackle. Build breakage (weekly snapshot builds are good at finding this!); reviewing patches to the ENA driver; helping Dave Cottlehuber add support for building OCI Containers and uploading them to repositories; teaching my bsdec2-image-upload tool to gracefully handle internal AWS errors; reporting an AWS security issue I stumbled across... some days everything falls under the umbrella of "other stuff which needs to get done", but a lot of it is just as important as the larger projects.

So what's next? Well, I'm still the FreeBSD release engineering lead and the maintainer of the FreeBSD/EC2 platform — just with rather less time to devote to this work. FreeBSD releases will continue to happen — 15.0 should land in December, followed in 2026 by 14.4, 15.1, 14.5, and 15.2 — but I probably won't have time to jump in and fix things as much, so late-landing features are more likely to get removed from a release rather than fixed in time for the release; we were only able to ship OCI Containers starting in FreeBSD 14.2 because I had funded hours to make sure all the pieces landed intact, and that sort of involvement won't be possible. On the EC2 side, now that I have regression testing of boot performance set up, I'll probably catch any issues which need to be fixed there; but the rest of my "features to implement" list — automatically growing filesystems when EBS volumes expand, better automatic configuration with multiple network interfaces (and network interface hot plug), rolling "pre-patched" AMIs (right now FreeBSD instances update themselves when they first boot), putting together a website to help users generate EC2 user-data files (e.g., for installing packages and launching daemons), returning to my work on FreeBSD/Firecracker and making it a supported FreeBSD platform, etc. — is likely to stagnate unless I find more time.

I've been incredibly lucky to get this sponsorship from Amazon; it's far more than most open source developers ever get. I wish it wasn't ending; but I'm proud of the work I've done and I'll always be grateful to Amazon for giving me this opportunity.

Posted at 2025-06-06 19:30

Chunking attacks on Tarsnap (and others)

Ten years ago I wrote that it would require someone smarter than me to extract information from the way that Tarsnap splits data into chunks. Well, I never claimed to be the smartest person in the world! Working with Boris Alexeev and Yan X Zhang, I've just uploaded a paper to the Cryptology ePrint Archive describing a chosen-plaintext attack which would allow someone with access to the Tarsnap server (aka me, Amazon, or the NSA) or potentially someone with sufficient ability to monitor network traffic (e.g. someone watching your wifi transmissions) to extract Tarsnap's chunking parameters. We also present both known and chosen plaintext attacks against BorgBackup, and known plaintext attacks against Restic.

And, of course, because Tarsnap is intended to be Online backups for the truly paranoid, I've released a new version of Tarsnap today (version 1.0.41) which contains mitigations for these attacks, bringing us back to "I can't see any computationally feasible attack"; but I'm also exploring possibilities for making the chunking provably secure.

I'm sure many people reading this right now are asking the same question: Are my secrets safe? To this I have to say "almost certainly yes". The attack we have for leaking Tarsnap's chunking parameters is a chosen-plaintext attack — you would have to archive data provided to you by the attacker — and the chosen plaintext has a particular signature (large blocks of "small alphabet" data) which would show up on the Tarsnap server (I can't see your data, but I can see block sizes, and this sort of plaintext is highly compressible). Furthermore, even after obtaining Tarsnap's chunking parameters, leaking secret data would be very challenging, requiring an interactive attack which mixes chosen plaintext with your secrets.

Leaking known data (e.g. answering the question "is this machine archiving a copy of the FreeBSD 13.5-RELEASE amd64 dvd1.iso file") is possible given knowledge of the chunking parameters; but this doesn't particularly enhance an attacker's capabilities since an attacker who can perform a chosen plaintext attack (necessary in order to extract Tarsnap's chunking parameters) can already determine if you have a file stored, by prompting you to store it again and using deduplication as an oracle.
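
To see why deduplication already acts as an oracle here, consider a toy model (purely illustrative — this uses fixed-size chunks and ignores encryption entirely, whereas Tarsnap's real chunking is content-defined and keyed, which is exactly what the paper is about):

    # Toy model of "deduplication as an oracle": the backup store is a set of
    # chunk hashes, and only previously unseen chunks get uploaded.  If the
    # victim is prompted to archive a file they already store, the upload is
    # (almost) empty -- so an attacker who can observe upload sizes learns
    # that the file was already present.
    import hashlib
    import os

    CHUNK = 65536

    def chunks(data):
        return {hashlib.sha256(data[i:i + CHUNK]).digest()
                for i in range(0, len(data), CHUNK)}

    known_file = os.urandom(10 * CHUNK)   # stand-in for, e.g., a release ISO
    victim_store = chunks(known_file)     # the victim already archived it

    def upload_size(data, store):
        # Only chunks the store has never seen would be transmitted.
        return len(chunks(data) - store) * CHUNK

    print("re-archiving the known file uploads",
          upload_size(known_file, victim_store), "bytes")
    print("archiving fresh data uploads",
          upload_size(os.urandom(10 * CHUNK), victim_store), "bytes")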

In short: Don't worry, but update to the latest version anyway.

Thanks to Boris Alexeev, Yan X Zhang, Kien Tuong Truong, Simon-Philipp Merz, Matteo Scarlata, Felix Gunther and Kenneth G. Paterson for their assistance. It takes a village.

Posted at 2025-03-21 19:00

My re:Invent asks

As an AWS Hero I get free admission to the AWS re:Invent conference; while it's rare that I'm interested in many talks — in previous years I've attended "Advanced" talks which didn't say anything which wasn't already in the published documentation — I do find that it provides a very good opportunity to talk to Amazonians.

While I'm sure many of the things I ask for get filed under "Colin is weird", I know sometimes Amazon does pay attention — at least, once I find the right person to talk to. Since I have quite a list this year, and I know some Amazonians (and maybe even non-Amazonians) may be interested, I figured I might as well post them here.

  1. More Amazonian OSS developers at re:Invent. I'm looking forward to meeting some Valkey developers on Wednesday, but I was disappointed that none of the Firecracker developers are in attendance. Amazon has a policy of not having engineers attend re:Invent unless they're giving talks (and I'd love to see this policy changed in general) but it's absolutely essential for Open Source developers to go to conferences; that's how we meet potential contributors. If your open source team doesn't go to conferences, they're not really doing open source, no matter what license you put on the code.
  2. Lower cross-AZ bandwidth pricing. I don't even particularly care about the cost; but being worried about avoiding cross-AZ bandwidth is making people design bad systems. One of the guidelines in Amazon's "well-architected framework" is to deploy the workload to multiple locations, and Amazon specifically calls out using a single Availability Zone as a problem — but concerns about cross-AZ bandwidth (even if it turns out that the concerns are unwarranted!) are preventing people from following this guideline.
  3. On-the-rack EBS storage. I don't know how Amazon datacenters are set up, but the latency of disk I/O to "SSD" EBS volumes strongly suggests that they are a significant distance away from the EC2 instances which are accessing them. At the other end of the latency scale, some EC2 instance types have SSDs directly attached to the instance hardware, with dramatically better I/O performance — but that storage has low durability (if the instance dies, the data is gone) and no elasticity (each instance type comes with a fixed amount of disk attached).
    Having EBS storage available on the same racks as EC2 nodes would provide an intermediate point, offering lower latency than the speed of light permits for across-the-datacenter I/Os while retaining some flexibility in the size of volumes. Users would have to accept that "provision me a volume on the same rack as this instance" might return "sorry, all the disks on that rack are full"; but at least at instance launch time requests could be satisfied by searching for a rack with sufficient rack-local disk.
  4. CHERI-capable instances. This has been a long-standing wishlist item for me and I know I'm not going to get it any time soon; but I know Amazon (and other clouds) have Morello boards for research purposes. CHERI has huge advantages for security and whichever cloud pursues this first will be miles ahead of the competition.
  5. Marketplace support for "pending" or "scheduled" releases. When I add new FreeBSD releases to the AWS Marketplace, they first go through an approval process and then get copied out to all the EC2 regions; once that is done, the Marketplace updates the "product" listing with the new version and sends out emails to all the current users telling them about the new version. This often means that Amazon is sending emails announcing new FreeBSD releases a couple days before I send out the official FreeBSD release announcement.
    I don't want to wait and add new versions to the Marketplace later, because the timeline is unpredictable — usually a couple hours but sometimes a day or more — so I'd like to be able to tell the Marketplace about the upcoming FreeBSD release and have them get everything ready but not update the website or send out email until I'm ready to send out our announcement (we usually allow a few days for mirrors and clouds to sync).
In addition to these, I also had a couple requests which I can't write about due to the nature of what I was asking for. I did however have one more request I can write about — not for AWS, but for a re:Invent sponsor. I ran into some people from Zoom and mentioned that every time I join a call from my (FreeBSD) laptop, the Zoom website tries to get me to open a Zoom client, and only offers "open the meeting in your web browser" after failing repeatedly. Since my web browser's user-agent string includes "FreeBSD", they should be able to detect that there is no Zoom client and go straight to opening the meeting in my web browser. To their credit, they immediately understood what I was complaining about, and even offered to look into whether they could port their Linux client to FreeBSD.

I don't know if or when I'm likely to get any of these, but I like to think that I convinced people that what I was asking for was at least somewhat sensible. Maybe between them and other Amazonians who will no doubt read this, I'll get at least a few of the things on my wishlist.

For the sake of transparency: In addition to giving me (and other AWS Heroes) free admission and travel to re:Invent, Amazon is sponsoring my FreeBSD work. About half of what they're paying for is EC2-specific stuff; the other half is FreeBSD release engineering. Without their support, a number of important features would not have landed in FreeBSD 14.2-RELEASE; thank you Amazon.

Posted at 2024-12-04 02:30

Generalist AI doesn't scale

There has been a lot of talk about AI recently, and one particular point has received significant attention in the tech industry: The cost of training models. According to some insiders — and the market capitalization of NVIDIA — the computing power needed for AI training threatens to upend the entire semiconductor industry. This should not be a surprise: Generalist AI doesn't scale.

Reduced to its essentials, the task of training a size-N model is one of hill-climbing in N-dimensional space. You take O(N) inputs, run them through your model, and after each of them you nudge the model slightly uphill towards the desired responses. You need O(N) inputs because with any less than that the model will overfit — essentially memorizing the specific set of inputs rather than generalizing from them — and for each of these inputs you need to perform O(N) computation since you have N parameters in the model to tune. End result: O(N^2) computation.

Now, there are plenty of other problems in AI — one of the largest is generating enough training data (easy enough for Chess or Go where you can have the AI play games against itself, but for general knowledge you eventually run out of textbooks) — and you can push against scaling laws for a while simply by throwing more money at them; but in the end you can't defeat scaling. You'll end up boiling the oceans.

So what's the solution? Don't do Generalist AI. Instead, we need to switch to using a pool of expert AIs. Instead of a single size-N model, split the model into k parts, each trained on N/k inputs. One sub-model learns all about medicine; another learns all about modern art. You still have N inputs, but since each of them is only used to optimize a set of N/k parameters, your training cost is now O(N^2 / k).
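
As a back-of-the-envelope check on that claim (toy numbers picked purely for illustration, not a statement about any real model):

    # Toy arithmetic for the scaling argument: a single size-N model trained
    # on O(N) inputs costs on the order of N^2 operations, while k specialists
    # of size N/k, each trained on N/k inputs, cost k * (N/k)^2 = N^2 / k.
    N = 10**12   # total parameters (placeholder)
    k = 100      # number of specialist sub-models (placeholder)

    generalist = N * N                  # O(N^2)
    specialists = k * (N // k) ** 2     # O(N^2 / k)

    print(f"generalist  : ~{generalist:.1e} parameter-updates")
    print(f"specialists : ~{specialists:.1e} parameter-updates")
    print(f"reduction   : {generalist / specialists:.0f}x")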

And yes, you lose something by doing this — you probably won't get hallucinations of modern artwork depicting polypeptides. But, as with humans, most queries can be answered by the appropriate specialist; and it's better to have a collection of experts than a generalist which is too expensive to train effectively. (One could even have a "dispatcher" sub-model which knows enough to identify which of the specialists to refer a query to.) And by reducing the training cost, you become able to build a collection of models which is larger — and smarter — than a generalist model could ever be.

Specialization isn't just for insects. It's for AIs too.

Posted at 2024-04-06 15:30

Please test: FreeBSD 13.3-RC1

I just announced the availability of FreeBSD 13.3-RC1. This is the first release candidate of FreeBSD 13.3, and if no further issues are reported will be the only release candidate; I would like to start 13.3-RELEASE builds on Friday, with (allowing time for mirrors to update) the release announcement going out on the following Tuesday (March 5th).

This means there's a few days for people to do some last-minute testing and report any problems they find. If you have time to help out with testing, there are two things in particular which I'd like to see get attention:

  1. Wifi, especially the iwlwifi driver. Bjoern Zeeb merged a significant number of changes to the wifi and linuxkpi (which is used by iwlwifi) code between BETA3 and RC1. While these changes were tested extensively, it's still a big chunk of code — more than I would normally have wanted merged so late, but it fixed serious stability issues with iwlwifi so I thought it was worth including anyway. But I'll feel much better about the release if I know people have been testing this code.
  2. The installer. Most people who test FreeBSD BETAs do it by upgrading existing systems — fair enough, you test what you have. But this means that the installer doesn't get nearly as much testing as running FreeBSD systems get. So if you have a spare system lying around, please download an installer image and make sure that you can install FreeBSD 13.3-RC1! In particular, keep an eye out for any "missing" hardware or error messages about drivers being unable to reserve resources; we had a late fix to the way that ACPI devices reserve resources.

This is the first FreeBSD release I've managed since assuming the role of FreeBSD Release Engineering Lead, and the first time I've been "flying solo" (I managed FreeBSD 13.2, but Glen was looking over my shoulder for most of the release process); so it's entirely possible that I've gotten something horribly wrong. If you see anything which looks strange, please don't hesitate to get in touch — either directly (cperciva@FreeBSD.org) or by emailing the release engineering team (re@FreeBSD.org).

It's a privilege to manage the FreeBSD release process, but it also takes a significant amount of time. If you'd like to help me find time to work on FreeBSD, please consider contributing to my Patreon.

Posted at 2024-02-26 22:45
