Tarsnap beta testers wanted
As of today, everybody who has contacted me to express an interest in beta testing tarsnap, my upcoming online backup service, has been invited to start beta testing. A few bugs have been uncovered by beta testers, but I've fixed all of those; so tarsnap currently has no known bugs. What does "no known bugs" mean? It means that I need more beta testers!
Tarsnap is an encrypted snapshotted backup service designed to match my concept of an ideal backup system. The back-end storage used by the service is Amazon S3, but the client code never talks to S3 directly -- the API provided by S3 is too weak to be directly useful, so the tarsnap client code only communicates with my tarsnap server. The tarsnap client code doesn't trust the server to do anything except store bits and hand them back when requested; all the data is encrypted by the client, and one of the design principles behind tarsnap is that the NSA (and other less capable adversaries, of course) should be unable to access your data or learn anything significant about it, even if they force me to cooperate with them. (This has two benefits: First, it keeps your data safe; second, it makes sure that the CIA has no reason to kidnap me.)
Some notes about the beta:
- The tarsnap client code currently runs on FreeBSD and Linux. There is also partial support for OS X -- the code will run, but it won't back up resource forks or ACLs. Windows is not supported at present.
- This is currently a free beta, but at some point it will stop being free. At that point beta testers will have 30 days to decide if they want to start paying or stop using tarsnap.
- When the free beta ends, tarsnap will probably cost $0.30 per GB of bandwidth (incoming + outgoing) plus $0.30 per GB per month of stored backups (after compression, of course -- the tarsnap client compresses data before encrypting it, and the tarsnap server can't tell how much data you had before compression). This is slightly more than what I was hoping for when I started working on this 18 months ago... I hope I can bring these prices down later.
- This is a beta. I don't expect to lose anyone's data, but it could happen. More likely is that there could be occasional outages when the tarsnap server isn't available. Neither of these has happened yet -- but there's enough risk that I don't recommend using tarsnap to operate a nuclear power plant.
- Use of tarsnap is at your own risk. It might break. It might eat your dog. It might be slippery when wet. If you use tarsnap, you're agreeing to not sue me if anything goes wrong. (I hate software exclusion-of-liability boilerplate. I wish it wasn't necessary. I think my friends in law school might kill me if I didn't include it.)
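To make the tentative pricing in the notes above concrete, here is a minimal sketch of a monthly cost estimate. The function name and the example workload are hypothetical, and the prices are the "probably" figures quoted above, not a final price list.

```python
# Tentative beta prices quoted in the notes above (US dollars).
PRICE_PER_GB_BANDWIDTH = 0.30     # per GB transferred, incoming + outgoing
PRICE_PER_GB_MONTH_STORED = 0.30  # per GB stored per month, after compression


def estimate_monthly_cost(stored_gb, uploaded_gb, downloaded_gb=0.0):
    """Estimate one month's bill from post-compression sizes in GB."""
    bandwidth_cost = (uploaded_gb + downloaded_gb) * PRICE_PER_GB_BANDWIDTH
    storage_cost = stored_gb * PRICE_PER_GB_MONTH_STORED
    return bandwidth_cost + storage_cost


# Hypothetical month: 20 GB stored, 2 GB of new data uploaded, no restores.
print(f"${estimate_monthly_cost(20, 2):.2f}")
```

All sizes here are sizes after compression, since that is the only size the server can measure.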
If you're interested in beta testing tarsnap, please send me an email and tell me 1. which operating system(s) you want to use tarsnap on; 2. approximately how many systems and how much data you'll be backing up; and 3. that you have read and agree to the above notes.
UPDATE: To clarify, I might not let everybody into beta testing immediately, depending on how many people are interested; but if I can't let everybody in immediately I'll put the rest onto a waiting list.
The upgraded freebsd-update server
I wrote recently about the surge in traffic to update1.freebsd.org after FreeBSD 7.0 was released. I was concerned about whether the server could continue to handle the load in the future -- until I received an email from Layered Technologies.
Layered Technologies has been generously donating the server which hosts FreeBSD Update and one of the Portsnap mirrors (as well as my personal websites) ever since February 2005. In this time I've had to ask them to reboot the server a few times when I've broken something or when a FreeBSD kernel bug has left the system unresponsive; but they've always responded promptly, and in three years I've never run into any problems with their network. I can't comment on their billing system -- my sole interaction with it has been to receive a bill for $0.00 each month -- but based on everything else I've seen I have no hesitation in recommending Layered Technologies.
And what was the email which made me no longer concerned about whether this server could continue to shoulder the update1.freebsd.org load? Jeremy Suo-Anttila, the CTO of Layered Technologies, wrote to let me know that they have increased the server's uplink from 10Mbps to 100Mbps and tripled its monthly bandwidth quota -- which should be more than enough to handle the load. Thanks Jeremy!
Security is Mathematics
In a recent editorial in Wired News, Bruce Schneier commented on the twisted mind of security professionals; that is, the way that we look at the world, always questioning hidden assumptions -- like the assumption that someone who buys an ant farm will mail in the included card asking to have a tube of ants delivered to his own address, rather than someone else's address. Schneier suggests that this "particular way of looking at the world" is very difficult to train -- far more difficult than the domain expertise relevant to security. I respectfully differ: In my opinion, this mindset is not particular to security professionals; and universities have been successfully training people to hold this mindset for centuries.
In the fall of 1995, in my second year as an undergraduate student at Simon Fraser University, I took a course numbered 'Math 242', with the title "Introduction to Analysis". This was (and still is) a required course for mathematics undergraduates, and for very good reason; it is often described as "the course which decides if you're going to get a degree in Mathematics", and is the first undergraduate course which takes mathematical rigor seriously. Remember in first-year calculus where the topics of sequences, series, convergence, continuity of functions, and limits were glossed over? This is the course where you learn to prove everything you thought you already knew.
In the semester I took this course, the average grade on the first mid-term examination was 29%. Three students (myself included), out of a class of about 40, scored higher than 50%. I don't know exact numbers for other semesters, but my understanding is that this grade distribution wasn't particularly unusual.
Why was the average grade so low? Because the entire mid-term examination consisted of writing proofs; and a proof isn't correct unless it considers all possible cases. Forgot to prove that a limit exists before computing what it must be? Your proof is wrong. Assumed that your continuous function was uniformly continuous? Your proof is wrong. Jumped from having proven that a function is continuous to assuming that it is differentiable? Your proof is wrong. Made even the slightest unwarranted assumption, even if what you ended up thinking that you had proved was true? Sorry, your proof is wrong.
This is what Schneier calls the "security mindset" -- and all mathematicians have it. In the first chapter of my doctoral thesis, I devoted a page to proving a lemma concerning the distribution of primes (namely, that between x and x * (1 + 2 / log(x)) there are at least x / log(x)^2 primes, i.e., at least half of the "expected" number). I didn't do this merely because I liked the notion of citing a paper concerning the distribution of zeroes of the Riemann zeta function in a thesis about string matching (although I admit that I found the juxtaposition appealing); rather, I did it because I couldn't prove an error bound on my randomized algorithm without this lemma. Most computer scientists would have waved their hands and made the common assumption that prime numbers "behave randomly"; but with my mathematical training, I wanted a proof which didn't rely on extraneous assumptions.
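The inequality in the lemma can at least be spot-checked numerically. The sketch below counts the primes in the interval for a given x and compares the count with x / log(x)^2. The function names are hypothetical, the half-open interval (x, x * (1 + 2 / log x)] is an assumption about the exact statement, and a handful of passing values is of course evidence rather than a proof.

```python
import math


def primes_up_to(n):
    """Return a list of all primes <= n using the sieve of Eratosthenes."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"  # 0 and 1 are not prime
    for p in range(2, math.isqrt(n) + 1):
        if sieve[p]:
            # Cross out every multiple of p starting at p * p.
            sieve[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [i for i, flag in enumerate(sieve) if flag]


def check_lemma(x):
    """Return (number of primes in (x, x * (1 + 2 / log x)], x / log(x)^2)."""
    log_x = math.log(x)
    upper = math.floor(x * (1 + 2 / log_x))
    count = sum(1 for p in primes_up_to(upper) if p > x)
    bound = x / log_x ** 2
    return count, bound


for x in (10**4, 10**5, 10**6):
    count, bound = check_lemma(x)
    print(x, count, round(bound, 1), count >= bound)
```

The "expected" number in the text comes from multiplying the interval length, 2x / log(x), by the prime density near x, roughly 1 / log(x), which gives 2x / log(x)^2; the lemma guarantees half of that.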
Knuth is famous for the remark "Beware of bugs in the above code; I have only proved it correct, not tried it", and the implicit statement that a proof-of-correctness is not adequate to ensure that code will operate correctly is one I absolutely agree with; however, it is important to consider the nature of bugs which evade the eye of a proof-writer. These bugs -- and, I posit, the potential bugs which Knuth was warning against -- tend to be errors in transmitting ideas from brain to keyboard: Missing a semicolon or parenthesis, for example, and thereby rendering the code uncompilable; or mixing up two variable names, and thereby causing the code to never function as specified. These bugs are easily found by quite minimal testing; so while neither testing nor proving is particularly effective alone, in combination they are highly effective.
More important than this, however, is that the sort of edge cases which mathematicians are trained to think about in writing a proof are exactly the sort which cause most security issues. Very few security problems "in the wild" are the result of bugs which are tripped over all the time -- such bugs don't survive long enough to cause problems for security. Rather, security issues arise when an unanticipated rare occurrence -- say, an exceptionally large input, a file which is corrupted, or a network connection which is closed at exactly the wrong time -- takes place. For this reason, when I write security-critical code I generally construct a proof as I go along; I don't go to the extent of writing down said proof, but by thinking about how I would prove that the code is correct, I force myself to think about all of the edge cases which might be potentially hazardous.
Schneier is right that security requires a strange mindset; and he's right that computer science departments aren't good places to teach this mindset. But he's wrong in thinking that it can't be taught: If you want someone to understand security, just send him to a university mathematics department for four years.
The busy freebsd-update server
On Wednesday evening, Ken Smith announced the availability of FreeBSD 7.0-RELEASE. And update1.freebsd.org started crying.
Based on what I saw when FreeBSD 6.3-RELEASE was announced, I didn't expect any problems -- there was a visible increase in traffic, but it didn't come anywhere close to tying up the server. I hadn't accounted for two important factors:
- Upgrading from FreeBSD 6.x to FreeBSD 7.0 involves more and larger updates than upgrading to FreeBSD 6.3.
- FreeBSD 7.0 is far more popular than FreeBSD 6.3.
On Thursday, February 28th, 885 systems used FreeBSD Update to upgrade to FreeBSD 7.0-RELEASE; of these, 282 were running FreeBSD 7.0 betas or release candidates, 404 were running FreeBSD 6.3, 174 were running FreeBSD 6.2, 13 were running FreeBSD 6.1, 11 were running FreeBSD 6.0, and 1 was running FreeBSD 5.5.
In total, update1.freebsd.org handled 5.01 million HTTP requests -- an average of 58 requests per second -- serving up 130939 distinct files and patches totalling 39.9 GB -- an average data rate of 3.7 Mbps (not counting HTTP/TCP/IP overhead). The effect of this traffic on the server is perhaps best illustrated by the following two MRTG graphs; the first graph shows total and active Apache processes, while the second shows incoming and outgoing bandwidth:
A few notes are in order concerning the above graphs:
- This server has an uplink capped at 10 Mbps; on several occasions it came very close to that limit.
- The primary reason it didn't hit the 10 Mbps limit more often is that for five hours Apache was at the maximum number of processes I had configured (100) and all of them were busy handling requests. When I woke up on Thursday morning (around 1800 UTC -- 10AM in my time zone) I logged in and increased Apache's process limit.
- When I wrote the code for converting MRTG bandwidth statistics into 95th percentile and GB/month values, I didn't bother handling leap years.
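The bandwidth arithmetic behind these figures is easy to recheck. The sketch below converts a one-day transfer total into an average rate, and converts an average rate into GB per calendar month with leap years handled. The function names are hypothetical, decimal units (1 GB = 10^9 bytes, 1 Mbps = 10^6 bits per second) are an assumption, and this is not the MRTG conversion code mentioned in the last note.

```python
import calendar

SECONDS_PER_DAY = 86_400


def average_mbps(gigabytes, seconds=SECONDS_PER_DAY):
    """Average rate in Mbps for a transfer, using decimal units."""
    return gigabytes * 1e9 * 8 / seconds / 1e6


def gb_per_month(mbps, year, month):
    """GB transferred in a calendar month at a constant average rate."""
    # monthrange returns (weekday of first day, number of days in month);
    # it reports 29 days for February in a leap year.
    days = calendar.monthrange(year, month)[1]
    return mbps * 1e6 / 8 * days * SECONDS_PER_DAY / 1e9


# 39.9 GB served in one day, as quoted in the text.
print(round(average_mbps(39.9), 1))

# A saturated 10 Mbps uplink over February, with and without a leap day.
print(gb_per_month(10, 2008, 2), gb_per_month(10, 2007, 2))
```

Using `calendar.monthrange` rather than a fixed table of month lengths is what keeps February correct in leap years such as 2008, the year FreeBSD 7.0 was released.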
In short, the FreeBSD Update server was handling about as much traffic as it is capable of handling (at least unless its uplink is upgraded and I switch from Apache to a faster web server), and there were most likely some people who tried to use FreeBSD Update between 1200 UTC and 1800 UTC and found that the server was either very slow or completely unresponsive. If you had problems upgrading, please try again later -- perhaps a random day next week, since as I write this I already see the load increasing as Friday afternoon (UTC) approaches. For myself, I've learned an important lesson: Next time there's a FreeBSD release, I'm going to make sure there are several FreeBSD Update mirrors ready to share the load.
One final addendum: While my bsdiff binary patching tool is usually highly efficient -- for security updates, it routinely provides a greater than fifty-fold reduction in download size -- it performed quite poorly overall at producing patches for upgrading from FreeBSD 6.x to FreeBSD 7.0, providing only a five-fold reduction in download size. Why? Because FreeBSD 6.x uses gcc 3.4, while FreeBSD 7.0 uses gcc 4.2. Such a major change in compiler means that even binaries compiled from identical source code differ throughout, dramatically reducing the potential for bsdiff (or any other binary patch tool) to identify similarities. Let this be a lesson to anyone who uses binary patches to update devices: Think twice before changing compilers!
The (good) deal with freebsd-update(8)
Earlier today, I stumbled across a blog post by Radu Cristian Fotescu entitled The (bad) deal with freebsd-update(8), which (as the title suggests) casts FreeBSD Update in a rather unfavourable light. Since the author is misinformed about several details, I'm taking this opportunity to set the record straight.
First, the author points out that there is an older version of FreeBSD Update available in the ports tree, which he states "can only fetch updates for FreeBSD 6.1". In fact, the version in the ports tree works for releases dating back to FreeBSD 4.7 (although it obviously doesn't provide binary updates to fix bugs which were uncovered after a release ceased to be supported by the FreeBSD Security Team). The only releases which the version of FreeBSD Update in the ports tree does not support are FreeBSD 6.2 and up -- versions of FreeBSD which contain a new (and vastly improved) version of FreeBSD Update in the base system. Once FreeBSD Update is in all supported FreeBSD releases (i.e., in June) I'll remove the old FreeBSD Update code from the ports tree.
Next, the author questions the logic of having "64-byte keys" (actually, 64 hexadecimal digit keys) as file names, and suggests that this makes FreeBSD Update overly complex. Nothing could be further from the truth: In fact, as I described in my BSDCan'07 talk, the "Reference by [SHA256] hash" method makes both FreeBSD Update and Portsnap far simpler than they would otherwise be.
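A minimal sketch of what "reference by hash" means in practice: a file's name is the SHA256 digest of its contents, written as 64 hexadecimal digits, so anyone holding the name can verify the contents. The function names are hypothetical, and the actual layout of files on the FreeBSD Update server is not shown here.

```python
import hashlib


def hash_name(content: bytes) -> str:
    """Return the SHA256 digest of the content as 64 hexadecimal digits."""
    return hashlib.sha256(content).hexdigest()


def matches_name(name: str, content: bytes) -> bool:
    """A downloaded file is valid exactly when its hash equals its name."""
    return hash_name(content) == name


name = hash_name(b"contents of some system file")
print(len(name), matches_name(name, b"contents of some system file"))
```

One consequence of this naming scheme is that a given name can only ever refer to one set of contents, so there is no separate version number to track for each file.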
The author then moves on to speaking of "a patch applied to a given release and patch level", thereby demonstrating a fundamental misunderstanding of how FreeBSD Update works. In the author's mind (apparently), to update a system from FreeBSD 6.2-RELEASE-p9 to FreeBSD 6.2-RELEASE-p10, FreeBSD Update downloads a (single) patch and applies it. Not so; rather, FreeBSD Update fetches a file which tells it what FreeBSD 6.2-RELEASE-p10 looks like. FreeBSD Update then makes the system look like that: It can leave files alone if they are already up to date (or if the user has asked it to leave those files alone); or it can download or generate the new versions of files. Put another way, in most patching systems, the server will answer the question "how do I get there from here?" -- with FreeBSD Update, the server merely answers the question "where should I be going?" and leaves it up to the FreeBSD Update client to figure out how to get there.
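The "where should I be going?" model can be sketched as a comparison between a description of the target system and the files already present. Everything below is hypothetical: the dictionaries stand in for the real metadata format, which is not described here, and `plan_update` is an illustrative name rather than part of FreeBSD Update.

```python
import hashlib


def sha256_hex(content: bytes) -> str:
    return hashlib.sha256(content).hexdigest()


def plan_update(target, local_files, keep=frozenset()):
    """Return the paths whose contents must be fetched or regenerated.

    target      maps path -> SHA256 hex digest the file should have
    local_files maps path -> current file contents (bytes)
    keep        paths the user has asked to leave alone
    """
    to_update = []
    for path, wanted in sorted(target.items()):
        if path in keep:
            continue  # the user asked to leave this file alone
        current = local_files.get(path)
        if current is not None and sha256_hex(current) == wanted:
            continue  # already up to date
        to_update.append(path)
    return to_update
```

Note that the function never needs to know which patch level the system started from. Any starting state yields a correct plan, because only the destination is described.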
Related to this error is another mistake which immediately follows: The author asserts that the "full new binaries" are not available. In fact, every file which appears in a (recent) FreeBSD release, or in a FreeBSD release plus patches, is available via the FreeBSD Update server. (I was concerned that I might be technically violating the GPL on some files by this fact, until I remembered that the FreeBSD source code is also distributed via FreeBSD Update.) FreeBSD Update uses patches in exactly the same way as Portsnap: As I described in my BSDCan'07 talk (linked above), FreeBSD Update and Portsnap rely on "opportunistic patching" -- they start out by attempting to fetch patches and apply them, but if anything goes wrong (the patch isn't available, the file generated by patching has the wrong SHA256 hash, et cetera), they gracefully fall back to fetching the complete file.
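The control flow of opportunistic patching can be sketched as follows. The three callables are hypothetical stand-ins for the network and patching code (the real tools use bsdiff patches and their own fetch logic); only the try-patch, verify-hash, fall-back structure is taken from the description above.

```python
import hashlib


def fetch_file(wanted_hash, old_content, fetch_patch, apply_patch, fetch_full):
    """Try the patch route first; fall back to the complete file on failure.

    fetch_patch, apply_patch and fetch_full are caller-supplied callables
    (hypothetical stand-ins for the download and patching steps).
    """
    try:
        patch = fetch_patch(wanted_hash)  # may raise if no patch exists
        candidate = apply_patch(old_content, patch)
        if hashlib.sha256(candidate).hexdigest() == wanted_hash:
            return candidate  # the patch produced the right file
    except Exception:
        pass  # any problem with the patch route: fall through
    full = fetch_full(wanted_hash)
    if hashlib.sha256(full).hexdigest() != wanted_hash:
        raise ValueError("downloaded file does not match its SHA256 hash")
    return full
```

Because the result is always checked against the expected hash, a missing or bad patch costs only some extra bandwidth and never produces a wrong file.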
Next, the author points out that the list of binary patches used for updating to FreeBSD 6.3 is publicly visible. Oops -- this is fixed now. I don't have any desire to keep this list of file names secret, but there are two very good practical reasons for turning off the directory indexing: First, Apache processes chew up lots of RAM when generating large directory listings; and second, I was having problems with robots ignoring my "don't crawl here" directives in robots.txt and loading down my server with large numbers of pointless requests.
Moving on, the author points to the approach of Red Hat, Debian, and Mandriva, of distributing entirely new package tarballs, as a model to be emulated. I don't know how fast the author's internet connection is, but I know one of the most frequent comments I hear about FreeBSD Update is how incredibly fast it is. This is what binary patches do for you -- provide a fifty-fold reduction in the bandwidth needed to download security updates. The tool I wrote for this purpose -- bsdiff -- is now used by Apple, Firefox, Sophos, and probably Amazon's Kindle (in this last case, I haven't heard from any developers, but they have bsdiff code on the device, so presumably they're using it) in addition to FreeBSD, and in the summer of 2006 I calculated that it had saved users upwards of 100 person-years of waiting for updates to download. Returning to downloading complete tarballs every time a small change is made might be simple, but it wouldn't be very popular with the many people who have to wait for said tarballs to download!
Finally, the author complains that he can't find the FreeBSD Update server code. As a comment to the blog entry points out, the server code is in the FreeBSD projects repository.