EC2 boot time benchmarking

Last week I quietly released ec2-boot-bench, a tool for benchmarking EC2 instance boot times. This tool is BSD licensed, and should compile and run on any POSIX system with OpenSSL or LibreSSL installed. Usage is simple — give it AWS keys and tell it what to benchmark:
usage: ec2-boot-bench --keys <keyfile> --region <name> --ami <AMI Id>
    --itype <instance type> [--subnet <subnet Id>] [--user-data <file>]
and it outputs four values: how long the RunInstances API call took, how long it took EC2 to get the instance from "pending" state to "running" state, how long it took once the instance was "running" before port TCP/22 was "closed" (that is, sending a SYN packet got an RST back), and how long it took from when TCP/22 was "closed" to when it was "open" (that is, sending a SYN got a SYN/ACK back):
RunInstances API call took: 1.543152 s
Moving from pending to running took: 4.904754 s
Moving from running to port closed took: 17.175601 s
Moving from port closed to port open took: 5.643463 s
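The "closed" and "open" states come down to how the instance answers a SYN on TCP/22. This post doesn't describe ec2-boot-bench's own probing code, so what follows is only a minimal Python sketch under that caveat: it uses an ordinary connect() rather than hand-crafted SYN packets, treating a refused connection as an RST and a completed handshake as a SYN/ACK. The helper names `probe_tcp` and `seconds_until`, the 1-second probe timeout, and the 0.1-second polling interval are hypothetical choices for illustration.

```python
import socket
import time


def probe_tcp(host: str, port: int = 22, timeout: float = 1.0) -> str:
    """Classify a single TCP connection attempt (hypothetical helper).

    "open":   the handshake completed, i.e. the SYN got a SYN/ACK back.
    "closed": the connection was refused, i.e. the SYN got an RST back.
    "silent": no answer before `timeout`, or another network error such as
              "host unreachable" while the instance is still coming up.
    """
    try:
        # create_connection() performs a normal blocking connect() with a timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "closed"
    except OSError:
        # socket.timeout and unreachable-host errors are both OSError subclasses.
        return "silent"


def seconds_until(host: str, port: int, wanted: set,
                  limit: float = 300.0, interval: float = 0.1) -> float:
    """Poll probe_tcp() until it reports a state in `wanted`.

    Returns the elapsed seconds (monotonic clock), or raises TimeoutError
    once `limit` seconds have passed without a matching state.
    """
    start = time.monotonic()
    while time.monotonic() - start < limit:
        if probe_tcp(host, port) in wanted:
            return time.monotonic() - start
        time.sleep(interval)
    raise TimeoutError(f"{host}:{port} never reached any of {sorted(wanted)}")
```

Timing the last two phases would then amount to calling `seconds_until(ip, 22, {"closed", "open"})` once the instance is "running", followed by `seconds_until(ip, 22, {"open"})`. Accepting "open" in the first call covers a system whose sshd is already listening when the network stack first answers; the second phase then measures as roughly zero. Each silent probe can block for up to the full timeout, so the timing resolution of this sketch is limited by that value.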

Once I finished writing ec2-boot-bench, the natural next step was to run some tests, in particular to see how FreeBSD compared to other operating systems used in EC2. I used the c5.xlarge instance type and tested FreeBSD releases since 11.1-RELEASE (the first FreeBSD release which can run on the c5.xlarge instance type) along with a range of Linux AMIs mostly taken from the "quick launch" menu in the AWS console. In order to perform an apples-to-apples comparison, I passed a user-data file to the FreeBSD instances which turned off some "firstboot" behaviour: by default, FreeBSD release AMIs will update themselves and reboot to ensure they have all necessary security fixes before they are used, while Linuxes just leave security updates for users to install later.

For each of the AMIs I tested, I ran ec2-boot-bench 10 times, discarded the first result, and took the median values from the remaining 9 runs. The first two values — the time taken for a RunInstances API call to successfully return, and the time taken after RunInstances returns before a DescribeInstances call says that the instance is "running" — are consistent across all the AMIs I tested, at roughly 1.5 and 6.9 seconds respectively; so the numbers we need to look at for comparing AMIs are just the last two values reported by ec2-boot-bench, namely the time before the TCP/IP stack is running and has an IP address, and the time between that point and when sshd is running.
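The aggregation step is easy to state precisely. Here is a minimal Python sketch, assuming each run has already been reduced to a dictionary mapping a phase name to seconds (the phase names and the `summarize_runs` helper are hypothetical, not part of ec2-boot-bench):

```python
from statistics import median


def summarize_runs(runs: list) -> dict:
    """Discard the first run, then take the per-phase median of the rest.

    `runs` is a list of dicts in the order they were collected, each mapping
    a phase name (e.g. "running_to_closed") to a duration in seconds.
    """
    if len(runs) < 2:
        raise ValueError("need at least two runs: one to discard, one to keep")
    kept = runs[1:]  # drop the first result, as described above
    phases = kept[0].keys()
    return {phase: median(run[phase] for run in kept) for phase in phases}
```

With ten runs, nine remain after the first is dropped; an odd count means each median is simply the fifth-smallest value rather than an average of the two middle ones.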

The results of my testing are as follows:
AMI Id (us-east-1)     AMI Name                               Running to port closed (s)  Closed to open (s)  Total (s)
ami-0f9ebbb6ab174bc24  Clear Linux 34640                      1.23                        0.00                1.23
ami-07d02ee1eeb0c996c  Debian 10                              6.26                        4.09                10.35
ami-0c2b8ca1dad447f8a  Amazon Linux 2                         9.55                        1.54                11.09
ami-09e67e426f25ce0d7  Ubuntu Server 20.04 LTS                7.39                        4.65                12.04
ami-0747bdcabd34c712a  Ubuntu Server 18.04 LTS                10.64                       4.30                14.94
ami-03a454637e4aa453d  Red Hat Enterprise Linux 8 (20210825)  13.16                       2.11                15.27
ami-0ee02acd56a52998e  Ubuntu Server 16.04 LTS                12.76                       5.42                18.18
ami-0a16c2295ef80ff63  SUSE Linux Enterprise Server 12 SP5    16.32                       6.96                23.28
ami-00be86d9bba30a7b3  FreeBSD 12.2-RELEASE                   17.09                       6.22                23.31
ami-00e91cb82b335d15f  FreeBSD 13.0-RELEASE                   19.00                       5.13                24.13
ami-0fde50fcbcd46f2f7  SUSE Linux Enterprise Server 15 SP2    18.13                       6.76                24.89
ami-03b0f822e17669866  FreeBSD 12.0-RELEASE                   19.82                       5.83                25.65
ami-0de268ac2498ba33d  FreeBSD 12.1-RELEASE                   19.93                       6.09                26.02
ami-0b96e8856151afb3a  FreeBSD 11.3-RELEASE                   22.61                       5.05                27.66
ami-70504266           FreeBSD 11.1-RELEASE                   25.72                       4.39                30.11
ami-e83e6c97           FreeBSD 11.2-RELEASE                   25.45                       5.36                30.81
ami-01599ad2c214322ae  FreeBSD 11.4-RELEASE                   55.19                       4.02                59.21
ami-0b0af3577fe5e3532  Red Hat Enterprise Linux 8             13.43                       52.31               65.74
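Each row's total is simply the sum of the two measured phases, and the table is sorted by that total. The short Python sketch below copies four rows from the table to make that arithmetic explicit, and also reports what fraction of each total falls between "port closed" and "port open" (the tuple layout and the `ranked` helper are illustrative only):

```python
# (AMI name, running to port closed, port closed to open), all in seconds,
# copied from four rows of the results table.
ROWS = [
    ("Red Hat Enterprise Linux 8", 13.43, 52.31),
    ("Clear Linux 34640", 1.23, 0.00),
    ("Amazon Linux 2", 9.55, 1.54),
    ("Debian 10", 6.26, 4.09),
]


def ranked(rows):
    """Return (name, total, fraction of total after port closed), fastest first."""
    out = []
    for name, to_closed, to_open in rows:
        total = round(to_closed + to_open, 2)  # the table's "total" column
        share = to_open / total if total else 0.0
        out.append((name, total, share))
    return sorted(out, key=lambda row: row[1])


for name, total, share in ranked(ROWS):
    print(f"{name:28} {total:6.2f} s  ({share:.0%} of it after port closed)")
```

For the original Red Hat Enterprise Linux 8 AMI, roughly four fifths of the 65.74 seconds elapse after the port is already closed, which is the pattern discussed for Red Hat further down.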

In the race to accept incoming SSH connections, the clear winner (no pun intended) is Intel's Clear Linux, which boots to a running sshd in a blistering 1.23 seconds after the instance enters the "running" state. After Clear Linux comes a roughly three-way tie between Amazon Linux, Debian, and Ubuntu, and it's good to see that Ubuntu's boot performance has improved over the years, dropping from 18 seconds in 16.04 LTS to 15 seconds in 18.04 LTS and then to 12 seconds with 20.04 LTS. After the Amazon Linux / Debian / Ubuntu cluster come SUSE Linux and FreeBSD; here, interestingly, SUSE 12 is faster than SUSE 15, while FreeBSD 12.2 and 13.0 (the most recent two releases) are noticeably faster than older FreeBSD releases.

Finally, in dead last place, comes Red Hat, which brings up its network stack quickly but takes a very long time before it is running sshd. It's possible that Red Hat is doing something similar to the behaviour I disabled in FreeBSD, downloading and installing security updates before exposing sshd to the network; I don't know enough to comment here. (If someone reading this can confirm that possibility and has a way to disable that behaviour via user-data, I'll be happy to re-run the test and revise this post.)

UPDATE: Turns out that Red Hat's terrible performance was due to a bug which was fixed in the 2021-08-25 update. I tested the new version and it now lands in the middle of the pack of Linuxes rather than lagging far behind.

Needless to say, FreeBSD has some work to do to catch up here; but measurement is the first step, and indeed I already have work in progress to further profile and improve FreeBSD's boot performance, which I'll write about in a future post.

If you find this useful, please consider supporting my work either via my FreeBSD/EC2 Patreon or by sending me contributions directly. While my work on the FreeBSD/EC2 platform originated from the needs of my Tarsnap online backup service, it has become a much larger project over the years and I would be far more comfortable spending time on this if it weren't taking away so directly from my "paid work".

Posted at 2021-08-12 04:15