Lessons learned from bountying bugs

A bit over a week ago, I wrote here about the $1265 of Tarsnap bugs fixed as a result of the Tarsnap bug bounty which I've been running since April. I was impressed by the amount of traffic that post received — over 82000 hits so far — as well as the number of people who said they were considering following my example. For them and anyone else interested in "crowdsourcing" bug hunting, here's some of the lessons I learned over the past few months.

Small bugs are important!

Bug bounties aren't new; but most of the recent ones have focused strictly on security vulnerabilities. With Tarsnap, I decided to offer a range of bounties — up to $1000 for security vulnerabilities, but down to $1 for "cosmetic" errors such as spelling mistakes in source code comments. My initial motivation for these was mostly as a proof-of-work: My primary concern was to get people looking at the source code, and finding even purely cosmetic errors is a sign that the code has been inspected.

It turned out that there was a more important benefit to offering bounties for small bugs: dopamine. If only major bugs received bounties, people could easily get discouraged after a few hours and stop looking; but the presence of small errors — a typo here, a punctuation error there — provides very much the same effect as the occasional small wins afforded by slot machines, encouraging people to keep going. I feel a bit guilty about taking advantage of human behaviour like this, to be perfectly honest (even though it wasn't deliberate); but at least code auditing is inherently a more productive activity than pulling a lever on a slot machine.

What is a bug?

Having established that it's worth awarding small bounties for small bugs, the next question is to decide what qualifies as a bug. Does a bug need to result in something breaking? Does it need to have any user-observable effect? Does it even need to result in a different sequence of machine language instructions being produced? Ultimately I answered no to all of these questions: To me, a bug is any place where the code is wrong, regardless of the consequences. Case in point: I consider any instance of C99 "undefined behaviour" to be a bug, even if — as in cases such as memcpy(foo, NULL, 0); — no C library in the world fails to do what was intended.

While this may seem like quibbling over trifles, there is a real benefit to taking this position. The Tarsnap codebase is not static; I add new features and make performance improvements on a regular basis (recently most of these have been on the server side, but the server shares a lot of code with the client). A mistake in a function might have no effect today based on how the function is used, but start causing problems tomorrow; better to fix problems now than to wait until they start to break things.

How serious is a bug?

I set four main tiers for bugs: Security issues, worth $500-1000; bugs which could plausibly cause problems for users, worth $50-100; "harmless" bugs, worth $10; and "cosmetic" errors, worth $1. There weren't any security issues reported, and the $1 bugs were generally clear; but between the $10 and $50 levels there were some distinctly borderline bugs. Take for instance an input-checking error in the tarsnap-keygen utility which would result in a request being sent to — and rejected by — the Tarsnap server instead of being rejected by the client utility. The correct outcome in such a case was "exit with an error", and to that extent the resulting behaviour was correct; but on the other hand the error message printed — "too many network failures" — was definitely not a good depiction of what the problem was. In another case, a bug could only be triggered by a peculiar and meaningless set of command-line options — sure, the code was wrong, but who would ever run into the problem?

In cases like this, a simple rule guided my decisions: Be fair, and explain decisions. The tarsnap-keygen bug I awarded $20 because I couldn't decide between $10 and $50; in another case I awarded two very similar bugs from the same reporter $10 and $50 respectively, explaining that they were both borderline so erring on the high side for one and on the low side for the other seemed the fairest thing to do.

Open source code

Tarsnap is built around the excellent foundation provided by libarchive, and one of the greatest challenges I had was to figure out how to handle bugs reported in libarchive code. Roughly 2/3 of the bugs reported were in libarchive code — slightly more than the "fair share" based on the amount of code in Tarsnap, but libarchive has the disadvantage of having been written by multiple authors — but these took far more than 2/3 of my time, for one simple reason: I often didn't know the code very well. With bug reports relating to "my" code I could almost always see immediately what was wrong and how it should be fixed; but for bugs in libarchive it wasn't always clear how the code in question worked, and on several occasions I had to write to other libarchive developers and ask for their help.

When I launched the Tarsnap bug bounty, one of my greatest concerns was that paying for bugs would somehow upset the balance between libarchive — an open source project — and Tarsnap — a commercial service. I'm happy to say that this didn't occur in the slightest; people were simply happy that I was fixing bugs. (It's possible that the reaction would have been different if libarchive were a GPL project, but in the BSD community there is a long history of companies voluntarily giving back because they realize it's to their advantage to do so — no coercive licenses required.)

Advice to bounty-offerers

I've been very pleased with the success of the Tarsnap bug bounty so far, and I'd urge any companies with public source code to establish similar bounties. Based on the above lessons and other observations I've made, here's some brief advice:

Where now?

The Tarsnap bug bounties will continue; I already have several emails reporting minor bugs to be fixed in the next Tarsnap release. I hope other people will follow my example — if anyone out there is interested in doing this I'd be happy to answer questions over email. But even if nobody else starts offering bug bounties, there's new code to inspect: As of now, I'm extending the Tarsnap bug bounty to cover open-source "spin-offs" from Tarsnap: the scrypt key derivation and encryption utility, the kivaloo data store, and the spiped secure pipe daemon.

Time to start reading some C!

Posted at 2011-09-05 14:45 | Permanent link | Comments

Iran forged the wrong SSL certificate

There has been a lot of talk recently about how someone — whom everyone presumes is the Iranian government — obtained a fake SSL certificate for *.google.com from DigiNotar; this is the second such case this year, as in March someone (again, presumed to be the Iranian government) obtained fraudulent certificates from Comodo for Firefox extensions, Google, Gmail, Skype, Windows Live, and Yahoo. (Interestingly, while everybody is removing DigiNotar's certificate authority key from their trusted lists, Comodo — which has issued far more certificates — is still widely trusted. I wonder if they got a free ride because nobody wants to ship "the web browser which doesn't work with my bank".)

If you want to be really evil, however, *.google.com is the wrong SSL certificate to forge. The right one? ssl.google-analytics.com.

By many reports, Google Analytics is used by almost half of the top million websites, and an even greater proportion of high profile sites. The way Google Analytics works, each web page has a <script> tag which loads the Google Analytics javascript; that code then gathers information and forwards it to Google. Privacy issues aside, it works well — as long as the javascript does what it should.

If the javascript has been tampered with, it could do anything javascript can normally do — and does so with the permissions of the web page it is running from. Read all the text on the page? No problem. Read the passwords you're typing in? Easy. Send it all to evil-democracy-suppressors.gov.ir? Easy to do using one or more web bugs.

When accessing sites via HTTPS, if Google Analytics is correctly installed the request to fetch the Google Analytics javascript will also be performed via HTTPS (if not, good web browsers will display a warning message); but if you have an SSL certificate for ssl.google-analytics.com you can supply your evil javascript anyway.

Sooner or later it's going to happen; obtaining forged SSL certificates is just too easy to hope otherwise. What can we do about it? Don't load the Google Analytics javascript when your site is accessed via HTTPS. This is easy to do: Just throw a if("http:" == document.location.protocol) around the document.write or s.parentNode.insertBefore code which loads the Google Analytics javascript. On the website for my Tarsnap online backup service I've been doing this for years — not just out of concern for the possibility of forged SSL certificates, but also because I don't want Google to be able to steal my users' passwords either!

And if you trust Google and you're not worried about Iran's demonstrated ability to obtain forged SSL certificates, ask yourself this: Do you trust the Chinese Ministry of Information Industry? Because your web browser probably does.

Posted at 2011-09-01 07:30 | Permanent link | Comments

Recent posts

Monthly Archives

Yearly Archives