FreeBSD on the Graviton 3

Amazon announced the Graviton 3 processor and C7g instance family in November 2021, but it took six months before they were ready for general availability. In the meantime, as the maintainer of the FreeBSD/EC2 platform, I was able to get early access to these instances.

As far as FreeBSD is concerned, Graviton 3 is mostly just a faster version of the Graviton 2: Most things "just work", and the things which don't work on Graviton 2 — hotplug devices and cleanly shutting down an instance via the EC2 API — also don't work on Graviton 3. (Want to help get these fixed? Sponsor my work on FreeBSD/EC2 so I have a few more paid hours to work on this.) The one notable architectural difference with the Graviton 3 is the addition of Pointer Authentication, which makes use of "unused" bits in pointers to guard against some vulnerabilities (e.g. buffer overflows which overwrite pointers). Andrew Turner recently added support for arm64 pointer authentication to FreeBSD.
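To make the idea of "unused" pointer bits concrete, here is a toy model in Python. It is only a sketch and not how the hardware or FreeBSD implements pointer authentication: the 48-bit address width, the 16-bit tag, the use of HMAC-SHA256 in place of the CPU's dedicated cipher, and the key name are all assumptions chosen for illustration.

```python
import hashlib
import hmac

ADDR_BITS = 48                          # assumed address width (toy value)
ADDR_MASK = (1 << ADDR_BITS) - 1
KEY = b"hypothetical-per-process-key"   # stand-in; real keys live in CPU registers


def _tag(addr: int, context: int) -> int:
    """Derive a 16-bit tag from the address, a context value, and the key."""
    msg = addr.to_bytes(8, "little") + context.to_bytes(8, "little")
    digest = hmac.new(KEY, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:2], "little")


def sign_pointer(addr: int, context: int) -> int:
    """Store the tag in the otherwise unused upper 16 bits of a 64-bit pointer."""
    addr &= ADDR_MASK
    return (_tag(addr, context) << ADDR_BITS) | addr


def authenticate_pointer(signed: int, context: int) -> int:
    """Return the plain address if the stored tag matches, else raise ValueError."""
    addr = signed & ADDR_MASK
    if (signed >> ADDR_BITS) != _tag(addr, context):
        raise ValueError("pointer authentication failed")
    return addr
```

An attacker who overwrites the low bits of a signed pointer without knowing the key would have to guess the matching tag, so in this model most corrupted pointers are rejected when they are authenticated.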

But since Graviton 3 is largely a "faster Graviton 2", the obvious question is "how much faster?" To find out, I launched a couple of instances (c6g.8xlarge and c7g.8xlarge) with 500 GB root disks and started comparing.

The first performance test I always run on FreeBSD is a quick microbenchmark of hashing performance: The md5 command (also known as sha1, sha256, sha512, and many other things) has a "time trial" mode which hashes 100000 blocks of 10000 bytes each. I ran a few of these hashes:

hash      Graviton 2    Graviton 3    speedup
md5       2.26 s        2.16 s        1.05x
sha1      2.67 s        1.94 s        1.38x
sha256    0.82 s        0.81 s        1.01x
sha512    2.87 s        1.03 s        2.79x

The first two of these hashes (md5 and sha1) are implemented in FreeBSD as pure C code; here we see Graviton 3 pulling ahead, slightly for md5 and by more than a third for sha1. The sha256 and sha512 hashes make use of the arm64 cryptographic extensions (which have special instructions for those operations), so it's no surprise that sha256 has nearly identical performance on both CPUs. For sha512, however, it seems that Graviton 3 has a far more optimized implementation of the arm64 extensions, since it beats Graviton 2 by almost a factor of 3.
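For readers without a FreeBSD system at hand, the shape of the time trial can be approximated in Python. This is a rough sketch, not a reimplementation of the md5 command: it times Python's hashlib rather than FreeBSD's own hash code, the block contents are an arbitrary fill pattern, and the function names are my own.

```python
import hashlib
import time

BLOCKS = 100_000      # block count quoted in the post
BLOCK_SIZE = 10_000   # bytes per block quoted in the post


def time_trial(name: str, blocks: int = BLOCKS, block_size: int = BLOCK_SIZE):
    """Hash `blocks` blocks of `block_size` bytes; return (hex digest, seconds)."""
    block = bytes(i & 0xFF for i in range(block_size))  # arbitrary fill pattern
    h = hashlib.new(name)
    start = time.perf_counter()
    for _ in range(blocks):
        h.update(block)
    elapsed = time.perf_counter() - start
    return h.hexdigest(), elapsed


def throughput_gb_per_s(seconds: float, blocks: int = BLOCKS,
                        block_size: int = BLOCK_SIZE) -> float:
    """Convert a time-trial duration into decimal gigabytes per second."""
    return blocks * block_size / seconds / 1e9
```

Calling time_trial("sha256") hashes the same 1 GB of data as the time trial described above, and throughput_gb_per_s(0.82) converts the sha256 timing from the table into roughly 1.22 GB/s.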

Moving on, the next thing I did was to get a copy of the FreeBSD src and ports trees. Three commands: First, install git using the pkg utility; second, git clone the FreeBSD src tree; and third, use portsnap to get the latest ports tree (this last one is largely a benchmark of fork(2) performance since portsnap is a shell script):

                          Graviton 2                Graviton 3                speedup
                          real        CPU           real        CPU           real    CPU
pkg install git           19.13 s     4.76 s        18.14 s     3.40 s        1.05x   1.40x
git clone src             137.76 s    315.79 s      120.99 s    240.09 s      1.14x   1.32x
portsnap fetch extract    159.56 s    175.22 s      124.41 s    133.02 s      1.28x   1.32x

These commands all fetch data from FreeBSD mirrors and extract files to disk, so we should expect that changing the CPU alone would yield limited improvements, and indeed that's exactly what we see in the "real" (wall-clock) time. The pkg command only drops from 19.13 to 18.14 seconds, a mere 1.05x speedup, because the CPU is idle for most of the time pkg is running. The CPU time, in contrast, drops by a factor of 1.40. Similarly, the git clone and portsnap commands spend some of their time waiting for network or disk, but their CPU time drops by a factor of 1.32.
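The distinction between "real" and CPU time can be reproduced with a small Python wrapper. This is a sketch under assumptions, not the tooling used for the numbers above: it treats CPU time as user plus system time of the child process, and SLEEP_DEMO is a hypothetical stand-in for a command that mostly waits, much as a download does.

```python
import resource
import subprocess
import sys
import time

# Hypothetical demo command: a child process that mostly sleeps.
SLEEP_DEMO = [sys.executable, "-c", "import time; time.sleep(0.3)"]


def time_command(argv):
    """Run argv to completion; return (real_seconds, cpu_seconds)."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    subprocess.run(argv, check=True)
    real = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    # CPU time here is user + system time accumulated by waited-for children.
    cpu = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
    return real, cpu
```

Running time_command(SLEEP_DEMO) reports a real time of at least 0.3 seconds but only a small CPU time, which is the same pattern as the pkg row: a faster CPU can only shrink the part of the run that is actually spent computing.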

Well, now that I had a FreeBSD source tree cloned, I had to run the most classic FreeBSD benchmark: Rebuilding the FreeBSD base system (world and kernel). I checked out the 13.1-RELEASE source tree (normally I would test building HEAD, but for benchmarking purposes I wanted to make sure that other people would be able to run exactly the same compile later) and timed make buildworld buildkernel -j32 (the -j32 tells the FreeBSD build to make use of all 32 cores on these systems):

                        Graviton 2                Graviton 3                speedup
                        real        CPU           real        CPU           real    CPU
FreeBSD world+kernel    849.09 s    21892.79 s    597.14 s    14112.62 s    1.42x   1.55x

Here we see the Graviton 3 really starting to shine: While there's some disk I/O to slow things down, the entire compile fits into the disk cache (the src tree is under 1 GB and the obj tree is around 5 GB, while the instances I was testing on have 64 GB of RAM), so almost all of the time spent is on actual compiling. (Or waiting for compiles to finish! While we run with -j32, the FreeBSD build is not perfectly parallelized, and on average only 26 cores are being used at once.) The FreeBSD base system build completes on the Graviton 3 in 9 minutes and 57 seconds, compared to 14 minutes and 9 seconds on the Graviton 2: a 1.42x speedup (and a 1.55x reduction in CPU time, going by the timings in the table).
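One way to estimate how many cores a build keeps busy is to divide its total CPU time by its wall-clock time. The sketch below applies that to the timings in the table; whether the figure quoted above was derived this way is my assumption, and the function name is my own.

```python
def average_cores_busy(cpu_seconds: float, real_seconds: float) -> float:
    """Estimate average parallelism as total CPU time over wall-clock time."""
    return cpu_seconds / real_seconds


# (real, CPU) timings in seconds for the world+kernel build, from the table.
GRAVITON2 = (849.09, 21892.79)
GRAVITON3 = (597.14, 14112.62)

for label, (real, cpu) in (("Graviton 2", GRAVITON2), ("Graviton 3", GRAVITON3)):
    print(f"{label}: {average_cores_busy(cpu, real):.1f} cores busy on average")
```

This gives about 25.8 cores for the Graviton 2 run, in line with the figure of 26 mentioned above, and about 23.6 for the Graviton 3 run.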

Ok, that's the base FreeBSD system; what about third-party packages? I built the apache24, Xorg, and libreoffice packages (including all of their dependencies, starting from a clean system each time). In the interest of benchmarking the package builds rather than the mirrors holding source code, I ran a make fetch-recursive for each of the packages (downloading source code for the package and all of its dependencies) first and only timed the builds.

               Graviton 2                   Graviton 3                   speedup
               real          CPU            real          CPU            real    CPU
apache24       502.77 s      1103.50 s      369.34 s      778.52 s       1.36x   1.42x
Xorg           3270.62 s     41005.32 s     2492.98 s     28649.06 s     1.31x   1.43x
libreoffice    10084.95 s    106502.80 s    7306.28 s     74385.83 s     1.38x   1.43x

Here again we see a large reduction in CPU time — by a factor of 1.42 or 1.43 — from the Graviton 3, although as usual the "real" time shows somewhat less improvement; even with the source code already downloaded, a nontrivial amount of time is spent extracting tarballs.

All told, the Graviton 3 is a very nice improvement over the Graviton 2. The hashing microbenchmarks vary widely, from 1.01x for sha256 (which, at 1.2 GB/s, is likely more than fast enough already) to 2.79x for sha512; everywhere else we consistently see a CPU speedup of between 30% and 55%. I look forward to moving some of my workloads across to Graviton 3 based instances!

Posted at 2022-05-23 21:35