Tarsnap public beta

Tarsnap is an implementation of my idea of a perfect online backup service. After many months in private beta testing, tarsnap is now publicly available for BSD, Linux, and other UNIX-like operating systems.

The design of tarsnap was guided by the following four principles:

1. Paranoid security
2. Tar-like front end
3. Snapshotting
4. Prepaid "utility" pricing

How does tarsnap satisfy these four principles? Let's look at them one by one.

Paranoid security

Before you can start using tarsnap, you have to register a machine with the tarsnap server using the tarsnap-keygen utility. This serves two purposes: First, it tells the server which account should be charged for usage; and second, the tarsnap-keygen utility generates cryptographic keys. Several cryptographic keys, in fact.

The keys used by tarsnap include

Naturally, not all of these keys are needed for every operation. Tarsnap comes with a tarsnap-keymgmt utility for generating key files with a subset of the keys. For example, you can create a key file with only the keys needed to write an archive; you could use this to generate backups from a server every day, knowing that if an attacker broke into your server and started deleting all of your files, he would be unable to destroy (or even read) your backups.

Tarsnap's paranoia also extends to the client-server protocol. With most online backup services, you're lucky if they use SSL; I avoided using SSL because I wasn't satisfied with its security. Obtaining a fraudulent SSL certificate is too easy (read: possible) for an attacker, and relying on external certification isn't necessary for tarsnap: The tarsnap server's public key is included in the tarsnap client code. Further, the complex and error-prone SSL protocol (how error-prone? Just look at how many security issues OpenSSL has had over the past decade) is replaced by a significantly streamlined protocol which is cryptologically similar to SSL and resists chosen-input side-channel attacks. (Cryptographers: The server uses its 2048-bit RSA key to generate an RSASSA-PSS signature on a Diffie-Hellman parameter modulo the 2048-bit "group 14" modulus. The Diffie-Hellman computation is performed using a blinded exponent. The result of the DH exchange is mixed with a 256-bit server nonce, a master key is generated using MGF1, and client and server AES-256 and HMAC-SHA256 keys are derived from that. Finally, the client and server exchange HMAC-SHA256 signatures of the master key to confirm that the key exchange succeeded. After the key exchange is complete, variable-length packets are exchanged, authenticated using the HMAC-SHA256 keys and encrypted using the AES-256 keys in CTR mode.)
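For readers who prefer code to prose, the key-derivation and confirmation steps in the parenthetical above can be sketched roughly as follows. This is an illustration only, not the tarsnap wire protocol: the RSA signature and the Diffie-Hellman exchange are skipped (random bytes stand in for the shared secret), and the way the inputs are concatenated, the 48-byte master key length, the key ordering, and the function names are all assumptions made for the example. Only MGF1 itself follows its standard PKCS #1 definition.

```python
import hashlib
import hmac
import os


def mgf1(seed: bytes, length: int) -> bytes:
    """MGF1 as defined in PKCS #1, instantiated here with SHA-256."""
    out = b""
    counter = 0
    while len(out) < length:
        # Hash the seed followed by a 4-byte big-endian counter.
        out += hashlib.sha256(seed + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def derive_session_keys(dh_shared: bytes, server_nonce: bytes) -> dict:
    # Assumption: the DH result and the nonce are simply concatenated
    # before being fed to MGF1; the real protocol may mix them differently.
    master = mgf1(dh_shared + server_nonce, 48)
    # Assumption: four 32-byte keys are expanded from the master key,
    # one AES-256 key and one HMAC-SHA256 key per direction.
    material = mgf1(master, 4 * 32)
    names = ["client_aes", "client_hmac", "server_aes", "server_hmac"]
    keys = {n: material[i * 32:(i + 1) * 32] for i, n in enumerate(names)}
    keys["master"] = master
    return keys


def confirmation_tag(hmac_key: bytes, master: bytes) -> bytes:
    # HMAC-SHA256 over the master key, used to confirm both sides agree.
    return hmac.new(hmac_key, master, hashlib.sha256).digest()


# Stand-ins for the values a real exchange would produce.
dh_shared = os.urandom(256)    # same size as a 2048-bit DH result
server_nonce = os.urandom(32)  # 256-bit server nonce

client_view = derive_session_keys(dh_shared, server_nonce)
server_view = derive_session_keys(dh_shared, server_nonce)

tag = confirmation_tag(client_view["client_hmac"], client_view["master"])
expected = confirmation_tag(server_view["client_hmac"], server_view["master"])
print(hmac.compare_digest(tag, expected))
```

Both sides run the same derivation on the same inputs, so the confirmation tags match only if the key exchange produced identical master keys.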

Naturally, I would never ask someone to blindly trust a binary which I provide, so all the source code to the tarsnap client is available.

Tar-like front end

What's the most powerful and widely used archiving tool on UNIX-like systems? Tar. Various versions of tar have existed ever since Seventh Edition Unix was released in January 1979, and rare is the user of a UNIX-like system who hasn't at some point typed tar -xf foo.tar or tar -cf backup.tar ~/myfiles.

Tarsnap, as the name suggests, has the same look and feel as tar. You can run

tarsnap -c -f mybackup ~/myfiles
to generate an archive named "mybackup" which contains the files in ~/myfiles; and you can run
tarsnap -t -f mybackup
and
tarsnap -x -f mybackup
to list and extract the contents of the archive "mybackup". Because tarsnap is based on Tim Kientzle's excellent libarchive archiving library and the included bsdtar implementation of tar, tarsnap also accepts --exclude, --include, --keep-newer-files, --nodump, and many other options.
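As a small illustration of how the create command might be scripted for daily use, here is a sketch in Python. The helper name create_command and the "prefix-date" archive naming scheme are assumptions made for this example; only the -c and -f flags shown above are used.

```python
import datetime
import shlex


def create_command(archive_prefix, paths, today=None):
    """Build the argument list for creating one dated archive."""
    today = today or datetime.date.today()
    # Hypothetical naming scheme: one archive per day, e.g. mybackup-2008-11-10.
    name = f"{archive_prefix}-{today.isoformat()}"
    return ["tarsnap", "-c", "-f", name, *paths]


cmd = create_command("mybackup", ["/home/me/myfiles"], datetime.date(2008, 11, 10))
print(shlex.join(cmd))  # tarsnap -c -f mybackup-2008-11-10 /home/me/myfiles
```

The list can be passed to subprocess.run(cmd, check=True) to actually run the backup; the sketch only prints it so that it can be tried without a tarsnap account.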

Of course, instead of being stored as a file on your local disk, tarsnap archives are stored remotely. Since you can't list your archives with ls, there's a

tarsnap --list-archives
command to do that for you; and since you can't delete an archive with rm, there's a
tarsnap -d -f backupInolongerwant
command to do that.

Finally, just in case you want to stop using tarsnap but keep your archives,

tarsnap -r -f mybackup
will convert the archive "mybackup" to a tar stream and write it to the standard output.

Snapshotting

The second part of the name "tarsnap" is "snap". Rather like in "Portsnap", actually.

As with Portsnap, the "snap" in tarsnap refers to snapshots; and this is what makes tarsnap both efficient and intuitive. Most backup systems work with a model of "full backups" plus "incremental backups": Once in a while, a full backup is generated, which involves storing everything; and then more frequent incremental backups are generated, which store only the differences since the last full or incremental backup. When you want to restore a backup, you start by going to the most recent full backup, and then you apply each of the intervening incremental backups -- all told, a big headache at exactly the time when you don't need any more headaches. By working with snapshots, tarsnap eliminates the "full plus incrementals" paradigm.

When tarsnap writes archives, it keeps track of each block of data which has been written to the server. If it sees the same block of data again -- even if it's part of a different archive -- it doesn't need to store that block. Tarsnap also keeps track of how many archives reference each block; when you delete an archive, tarsnap only removes blocks from the server when there are no archives remaining which reference them.
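The bookkeeping described above can be modelled in a few lines. The following is a toy sketch, not tarsnap's actual data structures: the class name BlockStore, the use of plain SHA-256 to identify blocks, and the in-memory dictionaries standing in for the server are all assumptions chosen to keep the example short.

```python
import hashlib
from collections import Counter


class BlockStore:
    """Toy model of deduplicated block storage with reference counts."""

    def __init__(self):
        self.blocks = {}           # digest -> block data (stands in for the server)
        self.refcount = Counter()  # digest -> number of archives using the block
        self.archives = {}         # archive name -> ordered list of digests

    def write_archive(self, name, blocks):
        if name in self.archives:
            raise ValueError(f"archive {name!r} already exists")
        digests = []
        for block in blocks:
            digest = hashlib.sha256(block).hexdigest()
            if digest not in self.blocks:
                # Only blocks never seen before are stored.
                self.blocks[digest] = block
            digests.append(digest)
        # Count each block once per archive that references it.
        for digest in set(digests):
            self.refcount[digest] += 1
        self.archives[name] = digests

    def delete_archive(self, name):
        for digest in set(self.archives.pop(name)):
            self.refcount[digest] -= 1
            if self.refcount[digest] == 0:
                # No remaining archive references this block: remove it.
                del self.refcount[digest]
                del self.blocks[digest]

    def read_archive(self, name):
        return [self.blocks[d] for d in self.archives[name]]


store = BlockStore()
store.write_archive("monday", [b"alpha", b"beta"])
store.write_archive("tuesday", [b"alpha", b"beta", b"gamma"])
print(len(store.blocks))   # 3: "alpha" and "beta" are stored only once

store.delete_archive("monday")
print(len(store.blocks))   # 3: every block is still referenced by "tuesday"
print(store.read_archive("tuesday") == [b"alpha", b"beta", b"gamma"])  # True

store.delete_archive("tuesday")
print(len(store.blocks))   # 0
```

Deleting "monday" leaves "tuesday" fully restorable, which is the property that lets each archive be kept or deleted independently of the others.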

In the "full plus incrementals" paradigm, there's a trade-off between full backups and incremental backups: Incremental backups are far more efficient, but are less convenient because you can't keep an incremental backup unless you keep all of the preceding increments back to the last full backup. Snapshots provide the best of both worlds: The performance of incremental backups, and the convenience of being able to decide to keep or delete each archive completely independently of other archives.

Prepaid "utility" pricing

Tarsnap uses a model which should be familiar to anyone who has ever used a prepaid mobile phone: You deposit money into your account by sending a PayPal payment, and as you use tarsnap, money is deducted from your account at a rate of 300 picodollars per byte of bandwidth used ($0.30 / GB) and 300 picodollars per byte-month of storage used ($0.30 / GB-month). When your account balance reaches zero, you lose access to tarsnap. If your account balance stays below zero for too long, your account will be permanently removed and any backups you have stored will be deleted.

That said, tarsnap doesn't follow some of the more obnoxious aspects of mobile phone pricing: Your money won't evaporate after X days, there isn't any monthly "account maintenance fee", and if you want to stop using tarsnap, you can have your money back. It is your money, after all.

If you use twice as much storage and bandwidth, you pay twice as much. If you use a very small amount of storage, you pay very little -- some tarsnap users have less than 10 MB stored, and they're paying a fraction of a cent per month. All of the accounting for tarsnap is rounded in the customer's favour to the next attodollar (that's a millionth of a picodollar, or a quintillionth of a dollar), mostly because I'm a math geek who hates rounding errors.
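To make the arithmetic concrete, here is a small worked example using the rates quoted above. It is a sketch of the pricing formula only, not of tarsnap's accounting code: the function names are made up for the example, a "GB" and "MB" are taken to be 10^9 and 10^6 bytes (which is what makes 300 picodollars per byte come out to $0.30 / GB), and only whole months are handled. Amounts are kept as integer attodollars so that nothing is lost to floating-point rounding.

```python
ATTO_PER_DOLLAR = 10**18   # an attodollar is a quintillionth of a dollar
ATTO_PER_PICO = 10**6      # and a millionth of a picodollar
RATE_PICO_PER_BYTE = 300   # applies to both bandwidth and byte-months of storage


def storage_cost_atto(bytes_stored, months=1):
    """Cost in attodollars of storing bytes_stored for a whole number of months."""
    return bytes_stored * months * RATE_PICO_PER_BYTE * ATTO_PER_PICO


def bandwidth_cost_atto(bytes_transferred):
    """Cost in attodollars of transferring bytes_transferred."""
    return bytes_transferred * RATE_PICO_PER_BYTE * ATTO_PER_PICO


def format_dollars(atto):
    """Render an attodollar amount exactly, with all 18 decimal places."""
    whole, frac = divmod(atto, ATTO_PER_DOLLAR)
    return f"${whole}.{frac:018d}"


print(format_dollars(storage_cost_atto(10 * 10**6)))   # $0.003000000000000000
print(format_dollars(storage_cost_atto(10**9)))        # $0.300000000000000000
print(format_dollars(bandwidth_cost_atto(2 * 10**9)))  # $0.600000000000000000
```

The first line is the 10 MB case: three tenths of a cent per month. Doubling the bytes doubles the cost, with no minimum charge.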

Public beta

So what does it mean for tarsnap to now be in public beta? It means that if you go to the tarsnap beta testing website (UPDATE 2009-10-19: Go to the main Tarsnap website instead), you can create a tarsnap account, download the tarsnap client code, and start using tarsnap (at least, as soon as you send in some money via PayPal to fund your account). It also means that there are some rough edges which I want to get rid of before I remove the "beta" label:

In spite of these (minor) rough edges, the past months of private beta testing have demonstrated to my satisfaction that the tarsnap code is very solid. The "beta" label is an indication that some things are still subject to change and improvement -- not a statement about the quality of the code which is already done.

So what are you waiting for? Go and get started with tarsnap! (UPDATE 2009-10-19: Link changed to point at the main Tarsnap website.)

Posted at 2008-11-10 21:45