Why Tarsnap won't use DynamoDB
When I heard last Wednesday that Amazon was launching DynamoDB I was immediately excited. The "hard" server-side work for my Tarsnap online backup service consists mostly of two really big key-value maps, and I've spent most of the past two years trying to make those faster and more scalable. Having a key-value datastore service would make my life much simpler, I thought. Unfortunately, upon reading into the details, I've decided that DynamoDB — at least in its present form — isn't something I want to use for Tarsnap.
As I've blogged about before, the Tarsnap server code synthesizes a log-structured filesystem on top of Amazon S3. As a result of this design, for each block of data stored on Tarsnap, the server code needs to keep track of two key-value pairs: First, to map a 64-bit machine number and a 256-bit block ID to a 64-bit log entry number and a 32-bit block length; and second, to map the 64-bit log entry number into a 64-bit S3 object ID and an offset within that object (Tarsnap aggregates multiple blocks into each S3 object in order to amortize the S3 PUT cost). The first of these key-value pairs is 53 bytes long and stays fixed until a block is deleted; the second key-value pair is 24 bytes long and needs to be updated every time the log cleaner reads the block from S3 and writes it back in a new position.
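The two maps are easier to follow in a small model. The sketch below uses Python dicts in place of a real key-value store. All class and function names (`BlockKey`, `store`, `clean`, and so on) are hypothetical and do not come from Tarsnap's server code. It shows why log cleaning touches only the second map: the first map points at a log entry number, and that number never changes when a block moves.

```python
from dataclasses import dataclass

# Hypothetical model of the two maps; not Tarsnap's actual server code.


@dataclass(frozen=True)
class BlockKey:
    machine: int      # 64-bit machine number
    block_id: bytes   # 256-bit (32-byte) block ID


@dataclass(frozen=True)
class BlockInfo:
    log_entry: int    # 64-bit log entry number
    length: int       # 32-bit block length


@dataclass(frozen=True)
class Location:
    s3_object: int    # 64-bit S3 object ID
    offset: int       # offset of the block within that S3 object


block_map: dict[BlockKey, BlockInfo] = {}   # first map: fixed until deletion
log_map: dict[int, Location] = {}           # second map: rewritten by cleaner


def store(key, log_entry, length, s3_object, offset):
    block_map[key] = BlockInfo(log_entry, length)
    log_map[log_entry] = Location(s3_object, offset)


def locate(key):
    """Follow both maps to find where a block's bytes live in S3."""
    info = block_map[key]
    loc = log_map[info.log_entry]
    return loc.s3_object, loc.offset, info.length


def delete(key):
    info = block_map.pop(key)
    del log_map[info.log_entry]


def clean(log_entry, new_s3_object, new_offset) -> bool:
    """Move a live block to a new S3 position; only the second map changes.
    Returns False if the block has been deleted (the cleaner's read check)."""
    if log_entry not in log_map:
        return False
    log_map[log_entry] = Location(new_s3_object, new_offset)
    return True
```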
Time for some numbers. The average block size as seen on the Tarsnap server is 33 kB; the Tarsnap client code generates blocks of 64 kB on average, but it deflates blocks by roughly a factor of two on average (obviously some data doesn't compress at all, while other data gets compressed far more). DynamoDB costs $1/GB per month for storage, including a 100 byte overhead per key-value pair; and $0.01/hour for quanta of 10 writes/second or 50 reads/second. I want to perform a complete cleaning run every 14 days in order to avoid spending too much money storing deleted blocks of data; and due to inconsistent loads, I want to have reserved capacity at least double my average throughput.
For each TB of data stored, this gives me 30,000,000 blocks (1 TB divided by 33 kB per block) requiring 60,000,000 key-value pairs; these occupy 2.31 GB, but for DynamoDB pricing purposes, they count as 8.31 GB (the 100-byte overhead on each of the 60,000,000 pairs adds 6 GB), or $8.31 per month. That's about 2.7% of Tarsnap's gross revenues (30 cents per GB per month); significant, but manageable. However, each of those 30,000,000 blocks needs to go through log cleaning every 14 days, a process which requires a read (to check that the block hasn't been marked as deleted) and a write (to update the map to point at the new location in S3). That's an average rate of 25 reads and 25 writes per second (30,000,000 blocks spread over the 1,209,600 seconds in 14 days), so I'd need to reserve 50 reads and 50 writes per second of DynamoDB capacity. The reads cost $0.01 per hour while the writes cost $0.05 per hour, for a total cost of $0.06 per hour — or $44 per month. That's 14.6% of Tarsnap's gross revenues; together with the storage cost, DynamoDB would eat up 17.3% of Tarsnap's revenue — slightly over $0.05 from every $0.30/GB I take in.
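The arithmetic above can be checked with a short script. It takes every price and size from the text. It also assumes 730 hours per month, 10^9 bytes per GB, and 1,000 GB per TB. The script rounds reserved capacity up to whole $0.01/hour units.

```python
import math

# Figures from the text; HOURS_PER_MONTH = 730 and GB = 10**9 bytes are
# assumptions made for this sketch.
BLOCKS_PER_TB = 30_000_000              # ~1 TB / 33 kB average block size
PAIR_BYTES = (53, 24)                   # sizes of the two key-value pairs
OVERHEAD_BYTES = 100                    # DynamoDB per-pair storage overhead
DOLLARS_PER_GB_MONTH = 1.0              # DynamoDB storage price
CLEAN_PERIOD_S = 14 * 86_400            # one full cleaning run every 14 days
HEADROOM = 2                            # reserve double the average rate
READS_PER_UNIT, WRITES_PER_UNIT = 50, 10  # capacity per $0.01/hour unit
HOURS_PER_MONTH = 730


def dynamodb_cost_per_tb() -> dict:
    """Monthly DynamoDB cost of Tarsnap's metadata for 1 TB of stored data."""
    pairs = BLOCKS_PER_TB * len(PAIR_BYTES)
    billed_bytes = BLOCKS_PER_TB * sum(PAIR_BYTES) + pairs * OVERHEAD_BYTES
    billed_gb = billed_bytes / 10**9
    storage = billed_gb * DOLLARS_PER_GB_MONTH

    # Cleaning needs one read and one write per block per run.
    reserved_rate = HEADROOM * BLOCKS_PER_TB / CLEAN_PERIOD_S  # ops/second
    read_units = math.ceil(reserved_rate / READS_PER_UNIT)
    write_units = math.ceil(reserved_rate / WRITES_PER_UNIT)
    # Each unit costs one cent per hour.
    throughput = (read_units + write_units) * HOURS_PER_MONTH / 100

    return {
        "billed_gb": billed_gb,
        "storage": storage,
        "read_units": read_units,
        "write_units": write_units,
        "throughput": throughput,
        "per_gb": (storage + throughput) / 1000,   # 1 TB = 1000 GB
    }


print(dynamodb_cost_per_tb())
```

Under these assumptions the script reproduces the figures in the text. It reports 8.31 GB billed and one read unit plus five write units. That comes to $0.06/hour, or $43.80 per month, for a total of about 5.2 cents per GB per month.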
Of course, if that was the only cost Tarsnap had, I'd be overjoyed; but remember, this is just for managing metadata. I'd still have the cost of S3 storage for the data itself, and I'd still have to pay for EC2 instances. Would an extra 5 cents per GB make Tarsnap unprofitable? No; but it would definitely hurt the bottom line. Instead of using DynamoDB, I'm continuing with the project I started a couple of years ago: My Kivaloo data store.
Do I think Kivaloo is better than DynamoDB? Not for most people. But there are important differences which make Kivaloo more suitable for me. For a start, DynamoDB is fast — very fast. It's built on SSDs and delivers "single-digit millisecond" read and write latencies. But backups don't need low latencies; if you're backing up several GB of data you're hardly going to notice a few milliseconds. Kivaloo targets high throughput instead — which is to say, lower cost for a given throughput. Even more importantly, Tarsnap has a peculiar access pattern: Almost all the writes occur in sequential regions. DynamoDB, because it aims for consistent general-purpose performance, needs to be good at handling the worst-case scenario of random accesses; Kivaloo, in contrast, uses a B+Tree which is ideal for handling bulk ordered inserts and updates.
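A toy model shows why ordered updates suit a B+Tree. This is not Kivaloo's implementation. It assumes keys are packed into fixed-size leaf pages in sorted order. It also assumes that consecutive updates to the same leaf are coalesced into a single page write. Under those assumptions, it counts page writes for an ordered stream of updates and for a shuffled one.

```python
import random


def leaf_writes(keys, keys_per_leaf: int) -> int:
    """Count leaf-page writes for a stream of updates, assuming keys are
    packed into leaves in sorted order and that consecutive updates to the
    same leaf are coalesced into a single page write (a toy model)."""
    writes = 0
    current_leaf = None
    for key in keys:
        leaf = key // keys_per_leaf
        if leaf != current_leaf:
            writes += 1
            current_leaf = leaf
    return writes


N, KEYS_PER_LEAF = 100_000, 100
ordered = list(range(N))
shuffled = ordered[:]
random.Random(42).shuffle(shuffled)

ordered_writes = leaf_writes(ordered, KEYS_PER_LEAF)
shuffled_writes = leaf_writes(shuffled, KEYS_PER_LEAF)
print(ordered_writes, shuffled_writes)
```

Ordered updates cost one page write per 100 keys. Shuffled updates cost close to one page write per key. A store tuned for random access has to provision for the shuffled case.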
In the end, it comes down to the numbers: On an EC2 c1.medium instance (costing $0.17/hour) Kivaloo can perform 130,000 inserts per second and 60,000 updates per second with Tarsnap's access pattern. On DynamoDB, that would cost $60-$130/hour. True, Kivaloo isn't replicated yet, and when I add that functionality it will increase the cost and decrease the throughput; but even then it will have a price/performance advantage of two orders of magnitude over DynamoDB.
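The $60 to $130 per hour range follows from DynamoDB's write pricing as described earlier: $0.01/hour per 10 writes/second. This sketch computes the cost in whole cents to avoid floating-point noise. It then compares the result with the $0.17/hour c1.medium price quoted above.

```python
import math

WRITES_PER_UNIT = 10          # one $0.01/hour unit buys 10 writes/second
C1_MEDIUM_DOLLARS_PER_HOUR = 0.17


def dynamodb_write_dollars_per_hour(writes_per_second: int) -> float:
    units = math.ceil(writes_per_second / WRITES_PER_UNIT)
    return units / 100        # one cent per unit per hour


for label, rate in (("inserts", 130_000), ("updates", 60_000)):
    cost = dynamodb_write_dollars_per_hour(rate)
    ratio = cost / C1_MEDIUM_DOLLARS_PER_HOUR
    print(f"{rate} {label}/s: ${cost:.2f}/hour, about {ratio:.0f}x a c1.medium")
```

The ratios come out at roughly 350x and 765x. That is the two-orders-of-magnitude gap described above, before any cost of adding replication to Kivaloo.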
I think DynamoDB is a great service, and I'd encourage everybody to explore its possible uses. But as far as Tarsnap is concerned, DynamoDB may be the world's best hammer, but what I really need is a screwdriver.
Playing chicken with cat.jpg
In a game of chicken, which is the better strategy: Writing a lengthy and detailed "persistence policy" guaranteeing that you'll persist in your course and will not, under any circumstances, swerve to avoid your opponent; or ostentatiously removing your steering wheel and throwing it out the window? As noted by innumerable game theorists over the past fifty years, the latter strategy is the only one which is useful: Humans can't be — and aren't — trusted to follow their stated intentions.
People drive in excess of the posted speed limits. People enter intersections on yellow lights even when they could have safely stopped. People make illegal copies of music, movies, and software. People click checkboxes labelled "I have read and agree to the terms and conditions". People "bend the rules" on (i.e., violate) non-disclosure agreements. Men (and women, in some states) cheat on their wives. And all around the world, people treat privacy policies — and every other sort of policy — more as setting out lists of things to not get caught doing than as rules which must be followed.
37signals bemoans the fact that "trust is fragile"; as far as I'm concerned, they're missing the point. The answer isn't for 37signals to prove that they can be trusted; the answer is to ensure that their customers don't need to trust them. In Tarsnap I might take this to an extreme — in addition to the aforementioned encryption, I encourage users to read the tarsnap source code rather than trusting that I got everything right (even to the point of offering bug bounties) — but even if 37signals doesn't want to offer cryptographically secure storage, they could at least remove the temptation to look at file names in log files by not writing sensitive information to log files in the first place.
FreeBSD now on all EC2 instance types
Six months ago I announced here that I had managed to get FreeBSD running on 64-bit Amazon EC2 instances by defenestrating Windows AMIs. That took the set of EC2 instance types FreeBSD could run on from three (t1.micro and c[cg]1.4xlarge) up to nine by adding all of the large and extra-large instance types; but FreeBSD still couldn't boot on "high-CPU medium" instances or on the "standard small" instance type — the one which got EC2 started, and which I suspect is still the most popular of all the options. Today I am pleased to announce that FreeBSD 9.0-RELEASE AMIs are now available for all EC2 instance types.
I tried building FreeBSD AMIs for the 32-bit EC2 instance types the same way as I did the 64-bit instances — by defenestrating Windows AMIs — but there was a catch: They wouldn't boot. In fact, when I tried this six months ago, they not only didn't boot, but they didn't even produce any console output to help me figure out what was going wrong. A few days ago I tried again and found that while FreeBSD still wasn't booting, it was now at least producing console output (I'm guessing Amazon changed something, but I couldn't say what) and I was able to see where the boot process was failing: It was at the point when FreeBSD launched its paravirtualized drivers.
Disabling the paravirtualized drivers and running FreeBSD in "pure" HVM (aka. a "GENERIC" kernel) got it booting, but wasn't a useful solution: The EC2 network is only available as a paravirtual device. Talking to other FreeBSD developers, I confirmed that the non-functionality of PV drivers under i386/HVM was a known issue, but nobody had managed to track it down yet. I started digging — a process which involved building FreeBSD kernels on a 64-bit EC2 instance and installing them onto an Elastic Block Store volume which I moved back and forth between that instance and a 32-bit instance; starting and stopping the 32-bit instance to trigger a boot attempt; and reading the output of my debugging printfs from the EC2 console.
And it turned out that the bug was embarrassingly trivial. When the HVM framework for paravirtualized drivers was ported from 64 bits to 32 bits, a definition needed to be added for the __ffs function; we naively assumed that FreeBSD's ffs function would do the trick. Sadly no; while both functions have the same basic functionality (finding the first set bit in an integer) they have one critical difference: ffs counts from one, while __ffs counts from zero. Fix one line, and FreeBSD could boot under HVM with paravirtualized drivers enabled. Run through my Windows-AMI-defenestrating scripts, and I had a FreeBSD AMI which worked on 32-bit instances. From there it was all straightforward. Some minor reorganization of my patches; the final AMI build; and the slow process of copying the AMI I built in the US-East region out to the six other regions — that last step fortunately being made considerably less painful by the scripts I wrote yesterday for loading host keys into .ssh/known_hosts based on fingerprints printed to the EC2 console.
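The off-by-one is easy to reproduce. The Python sketch below models the two conventions. `ffs` follows ffs(3): a 1-based bit index, and 0 when no bits are set. The hypothetical `ffs0` stands in for the 0-based `__ffs`, which is undefined for zero, so the sketch rejects zero. `pending_ports` is a made-up bitmap scan, not the actual FreeBSD or Xen code. It shows how substituting one function for the other shifts every reported index by one.

```python
def ffs(x: int) -> int:
    """ffs(3)-style: 1-based index of the least significant set bit of a
    non-negative integer, or 0 if no bits are set."""
    return (x & -x).bit_length()


def ffs0(x: int) -> int:
    """Stand-in for __ffs: 0-based index of the least significant set bit.
    __ffs(0) is undefined, so this sketch raises instead."""
    if x == 0:
        raise ValueError("__ffs(0) is undefined")
    return ffs(x) - 1


def pending_ports(mask: int, find_first) -> list:
    """Hypothetical scan of a pending-event bitmap, lowest bit first."""
    ports = []
    while mask:
        ports.append(find_first(mask))
        mask &= mask - 1      # clear the lowest set bit
    return ports


print(pending_ports(0b1010, ffs0))  # bits 1 and 3 are pending
print(pending_ports(0b1010, ffs))   # the ffs-for-__ffs substitution
```

With `ffs0` the scan reports bits 1 and 3. With `ffs` it reports 2 and 4, so every event is attributed to the wrong channel.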
What's next for FreeBSD/EC2? Well, the technical issues have been resolved, and FreeBSD is available everywhere; but there are still a few non-technical issues to handle. On FreeBSD's side, I need to merge my patches into the main tree, and we need to build an EC2-compatible kernel (aka. the XENHVM kernel configuration) as part of the release process. On Amazon's side, I'm hoping that at some point they'll eliminate the 'Windows tax' by providing a mechanism for running in HVM mode without being labelled as a "Windows" instance; and I'd love to see the FreeBSD logo showing up in the EC2 Management Console instead of the Windows logo.
But those are all minor problems. The hard work is done, and for now — after five years of trying — I'm going to enjoy having an EC2 small instance run my operating system of choice.
Automatically populating .ssh/known_hosts
One of the more irritating things about working with virtual machines is SSH host keys. Launch a new virtual machine. Get a new host key generated. Try to SSH in. Get a pesky warning message telling you that the authenticity of the host can't be established. Find the host key fingerprint in the virtual machine's console logs. Eyeball the two 32-character hexadecimal strings. Type "yes" and hope that they really were the same and not just mostly the same. Of course, if you don't care about security you could arrange for all your virtual machines to use the same host key, or use the -o StrictHostKeyChecking=no option; but as the FreeBSD Security Officer and the author of a secure online backup service, I don't consider either of those acceptable.
My work on FreeBSD AMIs for EC2 has made me even more sensitive to the irritation of host key checking, since building a set of AMIs for the 7 EC2 regions involves launching and SSHing into no less than 20 virtual machines. A couple of weeks ago I asked Twitter for advice about this; ten people replied, and two people — Daniel Shahaf and Markus Friedl — made the critical observation that I wanted to use two tools: ssh-keyscan, to get a host key in a form suitable for the known_hosts file; and ssh-keygen -lf to take the host key from that form and convert it into a fingerprint I could compare against a known good value.
At that point I got busy with other things (most notably final preparations for the FreeBSD 9.0-RELEASE announcement) but on Sunday evening I sat down and wrote a much-needed shell script:
# ssh-knownhost hostname [fingerprint ...]
The ssh-knownhost script uses ssh-keyscan to download all the host keys for the specified hostname; uses ssh-keygen to compute their fingerprints; compares them to the list of fingerprints provided on the command-line; and adds any new host keys to ~/.ssh/known_hosts. Short, simple, and effective.
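The same steps can be sketched in Python. This is not the author's shell script. It assumes legacy MD5 fingerprints (the 32-hex-digit, colon-separated format mentioned above; newer OpenSSH versions show SHA256 fingerprints by default). It also computes each fingerprint in-process rather than calling `ssh-keygen -lf`. The function names are hypothetical. `select_keys` holds the testable logic. `add_known_hosts` is a thin wrapper that runs `ssh-keyscan` and appends new matching lines.

```python
import base64
import hashlib
import os
import subprocess


def md5_fingerprint(key_b64: str) -> str:
    """Legacy OpenSSH-style fingerprint: MD5 of the decoded key blob,
    shown as 16 colon-separated hex bytes (an assumption; see above)."""
    digest = hashlib.md5(base64.b64decode(key_b64)).hexdigest()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def select_keys(keyscan_output: str, fingerprints) -> list[str]:
    """Return the known_hosts-format lines whose fingerprint was supplied."""
    wanted = {fp.lower() for fp in fingerprints}
    selected = []
    for line in keyscan_output.splitlines():
        fields = line.split()
        # ssh-keyscan prints "host keytype base64-key"; skip anything else.
        if line.startswith("#") or len(fields) < 3:
            continue
        if md5_fingerprint(fields[2]) in wanted:
            selected.append(line)
    return selected


def add_known_hosts(host, fingerprints, path="~/.ssh/known_hosts"):
    """Fetch host keys with ssh-keyscan; append verified, new ones to path."""
    scan = subprocess.run(["ssh-keyscan", host], capture_output=True,
                          text=True, check=True).stdout
    path = os.path.expanduser(path)
    existing = set()
    if os.path.exists(path):
        with open(path) as f:
            existing = {line.rstrip("\n") for line in f}
    new = [line for line in select_keys(scan, fingerprints)
           if line not in existing]
    with open(path, "a") as f:
        for line in new:
            f.write(line + "\n")
    return new
```

Keys whose fingerprints were not supplied on the command line are silently dropped. This gives the script its security property: nothing reaches known_hosts unless its fingerprint has been confirmed out of band.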
Of course, this only works if you know which fingerprints to specify on the command line; for newly launched EC2 instances, they're mixed up in other console output. Enter another script:
# ec2-get-console-output INSTANCE | ec2-knownhost hostname
The ec2-knownhost script uses the fact that EC2 AMIs — standard ones, at least — print their host key fingerprints prefixed with ec2: and between lines -----BEGIN SSH HOST KEY FINGERPRINTS----- and -----END SSH HOST KEY FINGERPRINTS-----. A few lines of shell script is all it takes to extract the host key fingerprints and pass them to ssh-knownhost. Again, short, simple, and effective.
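The extraction step can be sketched in Python as well. This is a hypothetical re-implementation, not the author's script. It treats the `ec2:` prefix as optional on each line, including the marker lines. It also assumes the fingerprints use the colon-separated MD5 hex format, so SHA256-style fingerprints would need a different pattern.

```python
import re

BEGIN = "-----BEGIN SSH HOST KEY FINGERPRINTS-----"
END = "-----END SSH HOST KEY FINGERPRINTS-----"
# 16 colon-separated hex bytes, e.g. 0a:1b:...:f9 (MD5 format assumed).
FP_RE = re.compile(r"\b(?:[0-9a-f]{2}:){15}[0-9a-f]{2}\b", re.IGNORECASE)


def console_fingerprints(console: str) -> list[str]:
    """Collect fingerprints printed between the BEGIN and END markers."""
    inside = False
    found = []
    for line in console.splitlines():
        text = line.strip()
        if text.startswith("ec2:"):
            text = text[len("ec2:"):].strip()
        if text == BEGIN:
            inside = True
        elif text == END:
            inside = False
        elif inside:
            found.extend(m.lower() for m in FP_RE.findall(text))
    return found
```

The returned list, together with the hostname, is what would be handed to the ssh-knownhost step. Fingerprint-like strings elsewhere in the console log are ignored.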
The scripts are available for download, and I'm placing them in the public domain, so please feel free to redistribute, modify, incorporate into other code, et cetera: ssh-knownhost, ec2-knownhost. I've signed their SHA256 hashes using GPG: ssh-knownhost-sigs.asc.
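Checking a download against the signed hashes takes two steps. First verify the GPG signature on ssh-knownhost-sigs.asc, for example with `gpg --verify`, assuming it is a clearsigned message. Then compare each script's SHA256 hash with the value in that file. The sketch below covers only the second step. It does not parse the .asc file, because its exact layout is not described here, so the expected hex digest is assumed to be copied out by hand.

```python
import hashlib
import os
import tempfile


def sha256_file(path: str) -> str:
    """Hex SHA-256 of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def matches(path: str, expected_hex: str) -> bool:
    return sha256_file(path) == expected_hex.strip().lower()


# Demonstration on an empty temporary file, whose SHA-256 is well known.
EMPTY_SHA256 = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    empty_path = tmp.name
result = matches(empty_path, EMPTY_SHA256.upper())
os.remove(empty_path)
print(result)
```

In practice the call would be something like `matches("ssh-knownhost", "<digest from the verified file>")`.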