Speeding up Portsnap via geolocation

Ever since I wrote Portsnap in 2004, I've been hearing the same question from FreeBSD users outside of North America: "Can we have some nearby mirrors? The US mirrors are too slow!" I've helped many companies set up Portsnap mirrors within their networks, but most of the public Portsnap mirrors have always been in the US, for three simple reasons: The existing mirrors were more than enough to handle the load (even when it peaks following a "megacommit" which touches many ports); administering multiple similar-but-not-quite-identically-configured mirrors is a headache (and when it comes to system administration, I optimize for lack of headaches); and despite my attempt to improve mirror selection by providing "us.portsnap.freebsd.org", "eu.portsnap.freebsd.org", and other similarly "local" names with customized mirror lists, most people still used the default "portsnap.freebsd.org" server pool. Thanks to Amazon EC2 and Amazon Route 53, this has now improved.

After some preliminary testing, the first step in this process was to take the existing portsnap.freebsd.org DNS entries, enter them into Route 53 — using the web console, since there was no point scripting a one-time setup like this — and add a DNS delegation for the subdomain into the freebsd.org zone. This probably had no user-visible impact: Although the Route 53 servers are spread around the world and thus often faster to access than the FreeBSD.org DNS, most systems would need to fetch FreeBSD.org DNS first anyway.

Having confirmed that everything was working properly with the pre-existing DNS data being served out of Route 53, I launched four new Portsnap mirrors, in the EU West (Ireland), South America East (Sao Paulo), Asia-Pacific North-East (Tokyo), and Asia-Pacific South-East (Singapore) regions. For these, I used my FreeBSD 9.0-RELEASE EC2 images — not an officially supported operating system on EC2, but it works just fine — with lighttpd as the web server, for ease of setup more than anything else, since serving up small static files (which is all the Portsnap HTTP server needs to do) is not at all taxing. These went into Route 53 as ec2-REGION.portsnap.freebsd.org. (I didn't add any mirrors in the three US EC2 regions, since Portsnap already had two very well connected US mirrors, but I might do this in the future.)

Currently all four EC2 mirrors are running on 64-bit "small" instances; based on the amount of traffic currently being served I may upgrade the EU West mirror to "medium" and I might downgrade the Singapore and Sao Paulo mirrors to "micro", but I'm going to wait until I see how the mirrors respond to a "sweeping commit" load spike before I make any such adjustments. Fortunately with EC2 it's possible to shut down an instance and then boot it up on a different size of virtual machine — a much faster process than if I had to start from a fresh install and run through the entire setup process!

The final step was to set up Route 53 to use latency-based routing to point the Portsnap client code at the nearest mirror. Here I got a bit annoyed: I had A records pointing at each mirror, and I wanted to create lists of SRV records with different mirror priority orderings. For users in Japan, for example, the EC2 mirror in Tokyo is the best one to use, but if it isn't responding to requests, the Singapore mirror should be tried next, then the US mirrors, and so on. However, I found that Route 53 didn't support latency-based SRV records, only latency-based A, AAAA, CNAME, and TXT records. After I sent out an irritated tweet about this, I was contacted by a Route 53 engineer who provided me with some background details, and I'm hoping that they'll enable latency-based SRV records at some point in the future.
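To make the intended behaviour concrete, here is a minimal Python sketch of priority-ordered fallback over a list of SRV records. It is not Portsnap's actual client code: the mirror hostnames are hypothetical placeholders, the `is_up` callback stands in for a real HTTP request, and SRV weights are ignored for simplicity.

```python
from typing import Callable, List, NamedTuple, Optional


class SrvRecord(NamedTuple):
    priority: int  # lower number = tried first
    weight: int    # ignored in this sketch
    port: int
    target: str


def ordered_targets(records: List[SrvRecord]) -> List[str]:
    """Return targets sorted by ascending SRV priority."""
    return [r.target for r in sorted(records, key=lambda r: r.priority)]


def first_responding(records: List[SrvRecord],
                     is_up: Callable[[str], bool]) -> Optional[str]:
    """Walk the priority order and return the first target that responds."""
    for target in ordered_targets(records):
        if is_up(target):
            return target
    return None


# Hypothetical SRV list that a resolver in Japan would ideally receive.
japan_view = [
    SrvRecord(3, 10, 80, "us-mirror.example.org"),
    SrvRecord(1, 10, 80, "ec2-tokyo.example.org"),
    SrvRecord(2, 10, 80, "ec2-singapore.example.org"),
]

# Simulate the Tokyo mirror being down: the client falls back to Singapore.
chosen = first_responding(japan_view, lambda t: t != "ec2-tokyo.example.org")
```

Serving a differently ordered list like this to each part of the world is what latency-based SRV records would have allowed.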

I did manage to work around this limitation, however: I published a single SRV list specifying "geodns-1", "geodns-2", and "geodns-3" as the priority 1, priority 2, and priority 3 endpoints, then added latency-based A records which mapped these to the appropriate endpoints. This is far from ideal: entering each IP address multiple times is error-prone, mirrors have to be configured to respond to multiple Host: names rather than just their proper names, and Portsnap users only see "Fetching from geodns-1.portsnap.freebsd.org" rather than being able to see which mirror they're actually hitting. But it's enough to feed people the bits they need for now.
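The workaround can be modelled in a few lines of Python. This is only an illustration of the indirection, not the real zone data: the region keys, the mirror labels, and the Ireland ordering are assumptions made up for the example (only the Tokyo, Singapore, US ordering for Japan comes from the text above).

```python
# One SRV list shared by every client: (priority, name).
SRV_LIST = [(1, "geodns-1"), (2, "geodns-2"), (3, "geodns-3")]

# Hypothetical latency-based A records. The outer key is the region that
# latency-based routing judges closest to the client; each geodns-N name
# then resolves to a different mirror depending on that region.
LATENCY_A = {
    "japan": {"geodns-1": "tokyo", "geodns-2": "singapore", "geodns-3": "us"},
    "ireland": {"geodns-1": "ireland", "geodns-2": "us", "geodns-3": "singapore"},
}


def mirror_order(nearest_region):
    """Mirrors a client ends up trying, in SRV priority order."""
    answers = LATENCY_A[nearest_region]
    return [answers[name] for _priority, name in sorted(SRV_LIST)]


def times_entered(mirror):
    """How many separate A records have to carry this mirror's address."""
    return sum(
        1
        for answers in LATENCY_A.values()
        for target in answers.values()
        if target == mirror
    )
```

Every client sees the same three names, and the per-region ordering lives entirely in the A records. `times_entered` makes the error-prone part visible: any mirror used as a fallback by several regions has its address typed in once per region.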

One possibility I explored was to use Amazon CloudFront for distributing the Portsnap bits; since it's all static files served over HTTP, it seemed that it would be a good fit. Reality proved otherwise, for two reasons: First, Portsnap is optimized to minimize bandwidth consumption in part by fetching lots of small patches. Like Amazon S3, CloudFront has a per-GET cost which is usually minimal, but for Portsnap's abnormally small average object size the per-request fees would have dwarfed the bandwidth fees — making running my own HTTP server inside an EC2 instance the cheaper option. Second, Portsnap relies on HTTP/1.1 pipelining to minimize the effects of network round trip time, and in my tests CloudFront sent HTTP/1.0 responses (I even tried forcing request pipelining on despite the response indicating a lack of pipelining support, but CloudFront clearly didn't like this idea — it responded to that with TCP RST packets). With CloudFront's 30 locations compared to EC2's 7 regions (8 if you count GovCloud), it could have been ideal for minimizing latencies; but even with only EC2's endpoint selection, Portsnap is still much better off than when users around the world were hitting US mirrors.
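The pricing argument is easy to check with a little arithmetic. The sketch below uses placeholder prices chosen only to make the numbers round; they are assumptions, not Amazon's actual rates, and the 200-byte object size is likewise illustrative rather than a measured Portsnap figure.

```python
# Placeholder prices (assumptions for illustration only).
FEE_PER_REQUEST = 1e-6  # dollars per GET
FEE_PER_GB = 0.10       # dollars per gigabyte transferred


def request_to_bandwidth_ratio(object_bytes, fee_per_request=FEE_PER_REQUEST,
                               fee_per_gb=FEE_PER_GB):
    """How many times larger the per-request fee is than the bandwidth fee."""
    bandwidth_fee = object_bytes / 1e9 * fee_per_gb
    return fee_per_request / bandwidth_fee


def break_even_bytes(fee_per_request=FEE_PER_REQUEST, fee_per_gb=FEE_PER_GB):
    """Object size at which the request fee equals the bandwidth fee."""
    return fee_per_request / fee_per_gb * 1e9


small_patch_ratio = request_to_bandwidth_ratio(200)  # a tiny 200-byte object
threshold = break_even_bytes()
```

With these placeholder prices the two fees are equal at roughly 10 kB per object, and a 200-byte object pays about fifty times more in request fees than in bandwidth. The real ratio depends on the actual price list, but the shape of the result is the same: the smaller the average object, the more the per-GET charge dominates.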

If you're outside of North America and stopped using Portsnap because you found that it was too slow, try it again! You may be pleasantly surprised.

Posted at 2012-05-20 05:10