Tarsnap public beta
Tarsnap is an implementation of my idea of a perfect online backup service. After many months in private beta testing, tarsnap is now publicly available for BSD, Linux, and other UNIX-like operating systems.
The design of tarsnap was guided by the following four principles:
- Backups should be secure against attack from Hostile Governments and (Extended) Three Letter Agencies, even if they force me to cooperate with them. You shouldn't have to trust me to keep your secrets if someone puts a gun to my head -- because if someone puts a gun to my head, I'll do whatever they tell me to do.
- Backups should be powerful but intuitive. Each time you create a backup, you should have complete control over what you want to have backed up; and you should be able to delete backups at any time. A backup service shouldn't do things like deleting old backups after 30 days just because the original version of the file was deleted.
- Backups should be efficient. You should be able to rename files, create multiple copies of them, append data to them, edit parts of them, concatenate them together, et cetera, without needing to store the unchanged data twice.
- Backups should be provided as a utility, with pay-as-you-go pricing. Forcing people to figure out ahead of time how much data they want to back up so that they can sign up for the right "plan" is dumb, and having some customers subsidize other customers is inherently unfair.
Paranoid security
Before you can start using tarsnap, you have to register a machine with the tarsnap server using the tarsnap-keygen utility. This serves two purposes: First, it tells the server which account should be charged for usage; and second, the tarsnap-keygen utility generates cryptographic keys. Several cryptographic keys, in fact.
The keys used by tarsnap include:
- A 2048-bit RSA key used for signing archives. This is used in combination with SHA256 and a Merkle hash tree to verify the authenticity of stored archives. (Cryptographers: RSASSA-PSS is used, with SHA256 as the hash function.)
- A 2048-bit RSA key used for encrypting session keys. All the data which the tarsnap client sends to the server to store is encrypted with per-archive random AES-256 keys; those keys are encrypted with this RSA key and attached to the stored data. (Cryptographers: RSAES-OAEP is used to encrypt the session keys, using MGF1 and SHA256, and the padding verification performed when decrypting is carefully written to be free of timing side channels. The AES-256 keys are used in CTR mode, with sequentially incrementing nonces.)
- A 256-bit HMAC-SHA256 key used to protect each individual block of data from tampering. From a cryptographic perspective, this is unnecessary, since a Merkle hash tree protects each archive; but data is compressed using zlib before being stored, so this provides protection against a theoretical attacker who can tamper with stored data and has found a security flaw in zlib decoding.
- Two 256-bit HMAC-SHA256 keys used to generate names for blocks of data stored. Tarsnap uses the same reference-by-hash trick as my Portsnap and FreeBSD Update utilities do; using HMACs instead of raw SHA256 hashes prevents any information from leaking via the hashes. (Why two keys? One is used to hash data, and the other is used to hash archive names. Yes, that's right, even the names of archives are stored securely.)
- Three 256-bit HMAC-SHA256 keys used to sign requests sent by the tarsnap client to the tarsnap server: One used for writes, one used for reads, and one used for deletes. These are the only keys sent to the tarsnap server by tarsnap-keygen.
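The reference-by-hash idea and the per-block tamper check can be sketched in a few lines of Python. This is a minimal illustration, not tarsnap's code: the key variables are hypothetical stand-ins for keys that tarsnap-keygen would generate, encryption is left out entirely, and the stored-block layout (zlib output followed by a 32-byte HMAC-SHA256 tag over the compressed bytes) is an assumption chosen to show why the tag has to be checked before zlib ever sees the data.

```python
import hashlib
import hmac
import os
import zlib

# Hypothetical stand-ins for keys tarsnap-keygen would generate.
NAME_KEY_DATA = os.urandom(32)     # names data blocks
NAME_KEY_ARCHIVE = os.urandom(32)  # names archives
AUTH_KEY_BLOCK = os.urandom(32)    # authenticates each stored block

def block_name(data: bytes) -> str:
    """Keyed hash of a block: identical blocks get identical names (so they
    can be deduplicated), but without the key nobody can confirm a guess
    about a block's contents by hashing it, as they could with raw SHA256."""
    return hmac.new(NAME_KEY_DATA, data, hashlib.sha256).hexdigest()

def archive_name(name: str) -> str:
    """Archive names are hashed under a separate key."""
    return hmac.new(NAME_KEY_ARCHIVE, name.encode("utf-8"), hashlib.sha256).hexdigest()

def pack_block(data: bytes) -> bytes:
    """Compress, then append an HMAC tag over the compressed bytes (assumed layout)."""
    compressed = zlib.compress(data)
    tag = hmac.new(AUTH_KEY_BLOCK, compressed, hashlib.sha256).digest()
    return compressed + tag

def unpack_block(stored: bytes) -> bytes:
    """Verify the tag first; only authenticated bytes reach zlib."""
    compressed, tag = stored[:-32], stored[-32:]
    expected = hmac.new(AUTH_KEY_BLOCK, compressed, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("block failed authentication; refusing to decompress")
    return zlib.decompress(compressed)
```

With this arrangement, an attacker who has found a bug in zlib decoding still cannot deliver a crafted block to the decompressor without also forging an HMAC.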
Tarsnap's paranoia also extends to the client-server protocol. With most online backup services, you're lucky if they use SSL; I avoided using SSL because I wasn't satisfied with its security. Obtaining a fraudulent SSL certificate is too easy (read: possible) for an attacker, and relying on external certification isn't necessary for tarsnap: The tarsnap server's public key is included in the tarsnap client code. Further, the complex and error-prone SSL protocol (how error-prone? Just look at how many security issues OpenSSL has had over the past decade) is replaced by a significantly streamlined protocol which is cryptologically similar to SSL and resists chosen-input side-channel attacks. (Cryptographers: The server uses its 2048-bit RSA key to generate an RSASSA-PSS signature on a Diffie-Hellman parameter modulo the 2048-bit "group 14" modulus. The Diffie-Hellman computation is performed using a blinded exponent. The result of the DH exchange is mixed with a 256-bit server nonce, a master key is generated using MGF1, and client and server AES-256 and HMAC-SHA256 keys are derived from that. Finally, the client and server exchange HMAC-SHA256 signatures of the master key to confirm that the key exchange succeeded. After the key exchange is complete, variable-length packets are exchanged, authenticated using the HMAC-SHA256 keys and encrypted using the AES-256 keys in CTR mode.)
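The key-derivation step in that parenthetical can be illustrated with MGF1, the mask generation function defined in PKCS #1. The sketch below implements MGF1 with SHA-256 as specified; everything after that is assumption. The Diffie-Hellman exchange itself is omitted (its shared secret is simply passed in as bytes), and the function names, the 32-byte master key length, the order in which secret and nonce are concatenated, the split of the expanded material into four 32-byte keys, and the choice of which HMAC key tags the master key are all hypothetical, not tarsnap's wire format.

```python
import hashlib
import hmac

def mgf1_sha256(seed: bytes, length: int) -> bytes:
    """MGF1 (PKCS #1) with SHA-256: concatenate SHA256(seed || counter) for a
    4-byte big-endian counter starting at 0, then truncate to `length`."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(seed + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]

def derive_session_keys(dh_shared: bytes, server_nonce: bytes) -> dict:
    """Hypothetical layout: mix the DH result with the server nonce into a
    master key, then expand the master key into per-direction keys."""
    master = mgf1_sha256(dh_shared + server_nonce, 32)
    material = mgf1_sha256(master, 4 * 32)
    return {
        "master": master,
        "client_aes": material[0:32],
        "client_hmac": material[32:64],
        "server_aes": material[64:96],
        "server_hmac": material[96:128],
    }

def confirmation(hmac_key: bytes, master: bytes) -> bytes:
    """Key-confirmation message: an HMAC-SHA256 of the master key."""
    return hmac.new(hmac_key, master, hashlib.sha256).digest()
```

In use, both sides would run derive_session_keys on the same inputs; the client sends confirmation(keys["client_hmac"], keys["master"]), and the server recomputes it and compares with hmac.compare_digest (and vice versa with the server keys). A mismatch means the two sides did not arrive at the same secret.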
Naturally, I would never ask someone to blindly trust a binary which I provide, so all the source code to the tarsnap client is available.
Tar-like front end
What's the most powerful and widely used archiving tool on UNIX-like systems? Tar. Various versions of tar have existed ever since Seventh Edition Unix was released in January 1979, and rare is the user of a UNIX-like system who hasn't at some point typed tar -xf foo.tar or tar -cf backup.tar ~/myfiles.
Tarsnap, as the name suggests, has the same look and feel as tar. You can run

    tarsnap -c -f mybackup ~/myfiles

to generate an archive named "mybackup" which contains the files in ~/myfiles; and you can run

    tarsnap -t -f mybackup

and

    tarsnap -x -f mybackup

to list and extract the contents of the archive "mybackup". Because tarsnap is based on Tim Kientzle's excellent libarchive archiving library and the included bsdtar implementation of tar, tarsnap also accepts --exclude, --include, --keep-newer-files, --nodump, and many other options.
Of course, instead of being stored as a file on your local disk, tarsnap archives are stored remotely. Since you can't list your archives with ls, there's a

    tarsnap --list-archives

command to do that for you; and since you can't delete an archive with rm, there's a

    tarsnap -d -f backupInolongerwant

command to do that.
Finally, just in case you want to stop using tarsnap but keep your archives,

    tarsnap -r -f mybackup

will convert the archive "mybackup" to a tar stream and write it to the standard output.
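Because tarsnap archives are identified only by name, one convenient habit is to give each run a date-stamped name. The Python sketch below assumes that naming scheme (the "mybackup-YYYY-MM-DD" format and all helper names are hypothetical) and uses only the -c, -x, -d, and -f options shown above; it builds the argument lists and hands them to subprocess.run.

```python
import datetime
import shlex
import subprocess

def dated_name(prefix: str, when: datetime.date) -> str:
    # Hypothetical naming scheme: one archive per day, e.g. "mybackup-2008-11-01".
    return f"{prefix}-{when:%Y-%m-%d}"

def backup_command(prefix: str, paths: list, when: datetime.date) -> list:
    # Equivalent to typing: tarsnap -c -f NAME PATH...
    return ["tarsnap", "-c", "-f", dated_name(prefix, when), *paths]

def restore_command(name: str) -> list:
    # Equivalent to typing: tarsnap -x -f NAME
    return ["tarsnap", "-x", "-f", name]

def delete_command(name: str) -> list:
    # Equivalent to typing: tarsnap -d -f NAME
    return ["tarsnap", "-d", "-f", name]

def run(argv: list) -> None:
    # Echo the command in shell syntax, then run it; raise if tarsnap fails.
    print("running:", shlex.join(argv))
    subprocess.run(argv, check=True)
```

A nightly cron job might call run(backup_command("mybackup", [os.path.expanduser("~/myfiles")], datetime.date.today())). If backups run more than once a day, the name would need a time component as well, since two archives cannot share a name.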
Snapshotting
The second part of the name "tarsnap" is "snap". Rather like in "Portsnap", actually.
As with Portsnap, the "snap" in tarsnap refers to snapshots; and this is what makes tarsnap both efficient and intuitive. Most backup systems work with a model of "full backups" plus "incremental backups": Once in a while, a full backup is generated, which involves storing everything; and then more frequent incremental backups are generated, which store only the differences since the last full or incremental backup. When you want to restore a backup, you start by going to the most recent full backup, and then you apply each of the intervening incremental backups -- all told, a big headache at exactly the time when you don't need any more headaches. By working with snapshots, tarsnap eliminates the "full plus incrementals" paradigm.
When tarsnap writes archives, it keeps track of each block of data which has been written to the server. If it sees the same block of data again -- even if it's part of a different archive -- it doesn't need to store that block. Tarsnap also keeps track of how many archives reference each block; when you delete an archive, tarsnap only removes blocks from the server when there are no archives remaining which reference them.
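The bookkeeping described above (store each block once, count how many archives reference it, free it when that count reaches zero) can be modelled in a few lines. This is a toy sketch rather than tarsnap's implementation: it splits data into fixed-size blocks and names them with plain SHA-256 purely for brevity, whereas tarsnap names blocks with HMACs, and nothing here describes how tarsnap actually chooses block boundaries. ToyBlockStore and its methods are hypothetical names.

```python
import hashlib
from collections import Counter

class ToyBlockStore:
    def __init__(self, block_size: int = 4):
        self.block_size = block_size
        self.blocks = {}       # block name -> bytes held by the "server"
        self.refs = Counter()  # block name -> number of archives referencing it
        self.archives = {}     # archive name -> ordered list of block names

    def write_archive(self, name: str, data: bytes) -> int:
        """Store an archive; return how many new blocks had to be uploaded."""
        names, uploaded = [], 0
        for i in range(0, len(data), self.block_size):
            chunk = data[i:i + self.block_size]
            h = hashlib.sha256(chunk).hexdigest()
            if h not in self.blocks:  # seen before, in any archive? skip upload
                self.blocks[h] = chunk
                uploaded += 1
            names.append(h)
        for h in set(names):          # count archives, not occurrences
            self.refs[h] += 1
        self.archives[name] = names
        return uploaded

    def read_archive(self, name: str) -> bytes:
        return b"".join(self.blocks[h] for h in self.archives[name])

    def delete_archive(self, name: str) -> None:
        for h in set(self.archives.pop(name)):
            self.refs[h] -= 1
            if self.refs[h] == 0:     # no archive needs this block any more
                del self.refs[h]
                del self.blocks[h]
```

Writing "mon" as b"aaaabbbbcccc" uploads three blocks; writing "tue" as b"aaaabbbbdddd" uploads only one. Deleting "mon" then frees just the "cccc" block, and "tue" can still be read in full, with no chain of increments to preserve.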
In the "full plus incrementals" paradigm, there's a trade-off between full backups and incremental backups: Incremental backups are far more efficient, but are less convenient because you can't keep an incremental backup unless you keep all of the preceding increments back to the last full backup. Snapshots provide the best of both worlds: The performance of incremental backups, and the convenience of being able to decide to keep or delete each archive completely independently of other archives.
Prepaid "utility" pricing
Tarsnap uses a model which should be familiar to anyone who has ever used a prepaid mobile phone: You deposit money into your account by sending a PayPal payment, and as you use tarsnap, money is deducted from your account at a rate of 300 picodollars per byte of bandwidth used ($0.30 / GB) and 300 picodollars per byte-month of storage used ($0.30 / GB-month). When your account balance reaches zero, you lose access to tarsnap. If your account balance stays below zero for too long, your account will be permanently removed and any backups you have stored will be deleted.
That said, tarsnap doesn't follow some of the more obnoxious aspects of mobile phone pricing: Your money won't evaporate after X days, there isn't any monthly "account maintenance fee", and if you want to stop using tarsnap, you can have your money back. It is your money, after all.
If you use twice as much storage and bandwidth, you pay twice as much. If you use a very small amount of storage, you pay very little -- some tarsnap users have less than 10 MB stored, and they're paying a fraction of a cent per month. All of the accounting for tarsnap is rounded in the customer's favour to the next attodollar (that's a millionth of a picodollar, or a quintillionth of a dollar), mostly because I'm a math geek who hates rounding errors.
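The pricing arithmetic can be done exactly with integers if everything is kept in attodollars, since 300 picodollars is 3 × 10^8 attodollars. The sketch below assumes, purely for illustration, decimal units (1 GB = 10^9 bytes), storage billed pro rata per second against a hypothetical 30-day month, and charges rounded down to a whole attodollar (the customer's favour, for a charge); tarsnap's actual billing granularity is not described in this post.

```python
from decimal import Decimal

ATTODOLLARS_PER_DOLLAR = 10**18
# 300 picodollars = 300 * 10**6 attodollars; the same rate applies per byte of
# bandwidth and per byte-month of storage, as quoted above.
RATE_ATTO = 300 * 10**6

# Hypothetical billing month, used only to pro-rate storage in this sketch.
SECONDS_PER_MONTH = 30 * 24 * 3600

def bandwidth_charge(nbytes: int) -> int:
    """Charge in attodollars for nbytes of bandwidth (exact, nothing to round)."""
    return nbytes * RATE_ATTO

def storage_charge(nbytes: int, seconds: int) -> int:
    """Charge in attodollars for storing nbytes for `seconds` (assumed pro rata),
    with any fractional attodollar rounded down."""
    return (nbytes * RATE_ATTO * seconds) // SECONDS_PER_MONTH

def dollars(atto: int) -> Decimal:
    """Convert attodollars to dollars without floating-point error."""
    return Decimal(atto) / ATTODOLLARS_PER_DOLLAR
```

Under these assumptions, 1 GB of bandwidth costs exactly $0.30, and 10 MB stored for a whole month costs $0.003, a third of a cent. Storing a single byte for a single second comes to about 115.7 attodollars, which this sketch rounds down to 115.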
Public beta
So what does it mean for tarsnap to now be in public beta? It means that if you go to the tarsnap beta testing website (UPDATE 2009-10-19: Go to the main Tarsnap website instead), you can create a tarsnap account, download the tarsnap client code, and start using tarsnap (at least, as soon as you send in some money via PayPal to fund your account). It also means that there are some rough edges which I want to get rid of before I remove the "beta" label:
- The tarsnap website is functional, but minimalist to an extreme. (UPDATE 2009-10-19: The Tarsnap website is now less extremely minimalist.)
- I need to write more documentation.
- I currently process incoming payments manually, which can result in a delay of 8-12 hours if a payment arrives when I'm not in front of my laptop. (UPDATE 2009-10-19: Payments have been processed automatically since January 2009.)
- The tarsnap account management interface on the beta testing website doesn't have a password reset mechanism yet.
- Some minor scaling issues need to be fixed; if you plan on storing more than 100 GB of data on tarsnap, please let me know in advance. (UPDATE 2010-02-23: The above-mentioned issues were fixed a few months ago. There are still scaling issues, but they're much further away now -- please warn me if you plan on uploading more than 5 TB, though.)
- I haven't figured out how to deal with Canadian and BC sales taxes yet, so for now only non-Canadians are allowed in.
In spite of these (minor) rough edges, the past months of private beta testing have demonstrated to my satisfaction that the tarsnap code is very solid. The "beta" label is an indication that some things are still subject to change and improvement -- not a statement about the quality of the code which is already done.
So what are you waiting for? Go and get started with tarsnap! (UPDATE 2009-10-19: Link changed to point at the main Tarsnap website.)