The kivaloo data store

Just over a year ago, I sat down to a late breakfast with Patrick Collison to discuss his latest startup. At some point over the next couple of hours, we started talking about my online backup service, Tarsnap, and I mentioned that I was keeping my eye on some server-side scalability issues. "I'm OK for the next year at the current growth rate, but then I'll need to get a more sophisticated data store in place to handle block metadata; right now I'm using a very simple, obviously correct, but rather slow data structure."

"I'm impressed with what the rethinks are doing, but it feels like they're doing too much — my data store needs are very minimal," I continued. "Maybe I should just write my own data store; it can't take more than a few months."

I'm very pleased to finally announce the availability of version 1.0.0 of the kivaloo data store as BSD-licensed open source software.

To be fair, kivaloo has been almost-released for a long time. Five months ago I wrote here that the first release of my data store would arrive "soon" and asked people to suggest possible names for it (the winner: Tim Fletcher, who suggested "kivalu", pronounced "key value"; I changed the spelling slightly, but the essential idea was his). Shortly after writing that blog post, however, I was distracted by porting FreeBSD to EC2 and then by a critical bug in Tarsnap, so it's only recently that I've had a chance to do the last few bits of work needed before kivaloo could be released.

So what is kivaloo? It is a durable, consistent, high-performance key-value data store built out of a background-garbage-collected log-structured B+Tree. Perhaps it's easier to describe what kivaloo isn't: it isn't distributed, it isn't replicated, and it doesn't have a query language.

In short, kivaloo is designed to be exactly what I need for Tarsnap. This is open source software in the time-honoured tradition of scratching an itch; I hope other people will find kivaloo useful and possibly even contribute back, but even if nobody else ever uses kivaloo, it will make a big difference to the Tarsnap server performance and scalability.
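
For the curious, here is a rough sketch in C of the core log-structured idea. This is a toy illustration only, not kivaloo's actual code or API; the names, page size, and layout are made up for the example. The point it demonstrates: pages are never overwritten in place; each update appends a new copy of the page to the end of a log, and a table maps page numbers to their latest offsets, so stale copies pile up as garbage until a cleaner reclaims them.

    /* Toy log-structured page store: an illustration of the technique,
     * not kivaloo's actual code.  Pages are never overwritten in place;
     * each write appends a new version to the log, and page_off[] tracks
     * the most recent version of every page. */
    #include <stdio.h>
    #include <string.h>

    #define PAGESZ 4096
    #define NPAGES 1024

    static FILE * logfp;            /* Append-only page log. */
    static long page_off[NPAGES];   /* Offset of latest version; -1 if none. */

    /* Append a new version of page n to the end of the log. */
    static int
    page_write(unsigned int n, const char buf[PAGESZ])
    {

        if (fseek(logfp, 0, SEEK_END))
            return (-1);
        page_off[n] = ftell(logfp);
        return ((fwrite(buf, PAGESZ, 1, logfp) == 1) ? 0 : -1);
    }

    /* Read the most recent version of page n. */
    static int
    page_read(unsigned int n, char buf[PAGESZ])
    {

        if ((page_off[n] < 0) || fseek(logfp, page_off[n], SEEK_SET))
            return (-1);
        return ((fread(buf, PAGESZ, 1, logfp) == 1) ? 0 : -1);
    }

    int
    main(void)
    {
        char buf[PAGESZ];
        size_t i;

        for (i = 0; i < NPAGES; i++)
            page_off[i] = -1;
        if ((logfp = tmpfile()) == NULL)
            return (1);

        /* Two writes to the same page: the first becomes dead space in
         * the log, which a background cleaner would eventually reclaim
         * by copying live pages forward and deleting old log segments. */
        memset(buf, 0, PAGESZ);
        strcpy(buf, "version 1");
        page_write(7, buf);
        strcpy(buf, "version 2");
        page_write(7, buf);

        page_read(7, buf);
        printf("page 7: %s\n", buf);    /* Prints "version 2". */
        fclose(logfp);
        return (0);
    }

In kivaloo itself the pages are B+Tree nodes and the cleaning happens in the background, which is where "background-garbage-collected log-structured B+Tree" comes from; the sketch above elides all of the real work (the B+Tree structure, crash consistency, and the cleaner itself).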

Take a look and let me know what you think.

Posted at 2011-03-28 12:40

FreeBSD/EC2 cluster compute

A few months ago, I announced experimental FreeBSD/EC2 support, and for the past four weeks FreeBSD 8.2-RELEASE AMIs have been available on Amazon EC2; but unfortunately these have been limited to "t1.micro" instances. It's impressive how much can be done with a fraction of a CPU and 600 MB of RAM; but sometimes you really need something a bit more powerful. I'm pleased to announce that, thanks to support from SegPub and vtalk, FreeBSD is now available on cc1.4xlarge instances.

For those of you unfamiliar with the wide range of virtual machines available from EC2, perhaps the best way to put it is this: cc1.4xlarge instances are as big as t1.micro instances are small. They have 8 cores of 2.93 GHz Nehalem, 23 GB of RAM, two 840 GB ephemeral disks (plus as many EBS volumes as you want to create, of course), and 10 Gbps network connectivity. The name "cluster compute" suggests one way of using these instances, but Amazon would have been perfectly justified in calling these "do anything you want and still have power to spare" instances.

One of the things I've heard a lot of EC2 users say they want over the past few months is ZFS support. Linux, of course, doesn't support ZFS (userland kludges and license violations notwithstanding); and with Oracle apparently doing its best to kill OpenSolaris, FreeBSD has rapidly become the de facto standard operating system for ZFS. Unfortunately, ZFS wasn't designed for 32-bit systems with 600 MB of RAM, so attempting to run it on t1.micro instances is a very good way to cause a kernel panic; but it works beautifully on cc1.4xlarge instances.

Because cc1.4xlarge instances run "hardware virtualized" rather than "paravirtualized" Xen, they avoid most of the hard work which was needed to get t1.micro instances working. Indeed, bringing FreeBSD to cc1.4xlarge instances only required one significant bugfix (actually a workaround -- there's a bug in the Xen serial port emulation and I had to modify FreeBSD's UART code to be compatible with it); the rest of the work was packaging and wrangling startup scripts. As a result, I have absolutely no hesitation in saying that FreeBSD is production-ready on cluster compute instances. (8.2-RELEASE on t1.micro is in "use with caution" territory -- so far it seems very stable, but it's too early to be sure.)

What's next for FreeBSD/EC2? Probably improving FreeBSD 9.0-CURRENT stability on t1.micro instances. Right now there are pmap locking (or rather, lack-of-locking) bugs in FreeBSD's paravirtualized Xen code which make 9-CURRENT far less stable than 8.2-RELEASE. With some luck I should be able to get this done before 9-STABLE branches in a few months, so that it can be tested in the lead-up to 9.0-RELEASE.

But this depends in large part on what FreeBSD users find that they need in EC2. Go launch some instances and let me know what you think.

Posted at 2011-03-22 05:00
