Complexity is insecurity

As I've been writing code for my Tarsnap online backup service over the past three years, I've gone out of my way to make it as secure as possible. I've written previously about the importance of carefully designing security systems before writing any code, about thinking in terms of mathematical proofs of correctness while writing code, about cryptographic research concerning key derivation functions, and about recommendations for using cryptography, all of which have informed my work on Tarsnap. I've also made the Tarsnap client source code available for public review -- after all, I describe Tarsnap as "Online backups for the truly paranoid", and nobody who is truly paranoid would want to download and run code without inspecting the source and compiling it themselves. However, there is a very important aspect of Tarsnap's security which I haven't discussed previously: complexity -- or rather, a lack thereof.

Complexity can be thought of as a type of code smell: It doesn't necessarily imply that there is a problem, but the presence of complexity is very strongly correlated with the presence of security vulnerabilities. In the design and construction of secure systems, it is important to consider not only mistakes which are guaranteed to cause problems, but also factors which make it more likely that problems will arise -- or, put another way, factors which make it harder to get things right. The notion of "complexity" is largely subjective, but our intuition serves us well here; as I'll discuss, there are several factors which can intuitively be seen as constituting "software complexity" and which have consequences for software security.

The most obvious such factor is code size. If you have a fixed rate of bugs per line of code -- and evidence suggests that this is a reasonably accurate approximation -- then a project with twice as many lines of code will have twice as many bugs. Now, some bugs are harmless, and many harmful bugs are not security flaws; but there is still a general correlation between code size and security flaws. Considering adding a new feature to your application? Ask yourself not just whether it's worth the time required to implement the feature, but also whether it's worth the risk of introducing new bugs and security flaws.
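To make the scaling explicit, the linear model above can be written as a one-line formula. The defect density d below is only a commonly cited industry ballpark, not a measured value for any particular project, so treat the numbers purely as an illustration:

    % Assumed linear defect model; d is a hypothetical density, not a measurement.
    E[\mathrm{bugs}] \approx d \cdot L, \qquad d \approx 1\text{--}25 \text{ defects per 1000 lines of code}

    % Doubling the code size doubles the expected defect count:
    E[\mathrm{bugs}](2L) = 2\,E[\mathrm{bugs}](L)

Whatever the true value of d for a given team, the point is the proportionality: every additional line of code buys its expected share of bugs, and some fraction of those bugs will turn out to be security flaws.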

A less obvious factor is the number of authors. One of the most common sources of bugs -- and security flaws -- is miscommunication between developers. When function A calls function B, which function is responsible for sanity-checking function B's inputs? If an object encounters an internal error, should its destructor be called during cleanup? Are strings NUL-terminated ASCII, or NUL-terminated UTF-8? Formal methods of software development, wherein functions' preconditions and postconditions are explicitly spelled out (in some cases, explicitly enough to be used by machine proof systems), are in many cases advantageous more because they avoid such miscommunications than for any other reason.
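As an illustration of the kind of contract that heads off such miscommunication, here is a minimal sketch in C; the function, its name, and its comments are hypothetical, not taken from any particular codebase:

    #include <assert.h>
    #include <stddef.h>

    /*
     * copy_name(dst, dstlen, src):
     * Precondition: src is a NUL-terminated UTF-8 string and the CALLER has
     * already validated it; dst points to a buffer of at least dstlen bytes;
     * dstlen >= 1.
     * Postcondition: dst holds a NUL-terminated copy of src, truncated to at
     * most dstlen - 1 bytes (truncation may split a multi-byte sequence).
     * Returns 0 if the whole string was copied, -1 if it was truncated.
     */
    static int
    copy_name(char * dst, size_t dstlen, const char * src)
    {
        size_t i;

        assert(dst != NULL && src != NULL && dstlen >= 1);

        /* Copy up to dstlen - 1 bytes, then NUL-terminate. */
        for (i = 0; i + 1 < dstlen && src[i] != '\0'; i++)
            dst[i] = src[i];
        dst[i] = '\0';

        return (src[i] == '\0') ? 0 : -1;
    }

Nothing about this function is clever; the value is that the encoding, the responsibility for validation, and the truncation behaviour are written down where both the caller's author and the callee's author will see them.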

One oft-cited measure of software complexity is the cyclomatic complexity -- that is, the number of linearly independent paths through the control flow graph, or roughly speaking, the number of conditional branches in the program. Where security is concerned, there is a slightly more nuanced metric which matters more -- the number of rarely taken conditional branches -- because in any program, the place where bugs are most likely to be hidden is in code which is almost never executed. There are two reasons for this: First, having a large number of users provides, in effect, a great deal of fuzz testing, which is likely to uncover bugs in those code paths which are commonly invoked; and second, as software is continually developed, regions of code which are rarely executed tend to receive far fewer eyeball-hours of attention, with the result that bugs in those regions are also far less likely to be uncovered by code inspection. It is no coincidence that so many of the security flaws in widely used software such as OpenSSL concern obscure and rarely-used functionality -- the bugs in well-known and widely-used functionality were fixed a long time ago. (Note that an attacker can often force rarely-used functionality to be invoked, e.g., by negotiating an obsolete version of a protocol.)
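To show the shape these bugs usually take, here is a contrived C sketch; the function and the "extended header" it reads are hypothetical. The success path runs on every invocation and is effectively fuzz-tested by every user, while the error branch runs only on short reads and I/O errors -- which is exactly where a forgotten return or a double free tends to survive unnoticed:

    #include <stdio.h>
    #include <stdlib.h>

    /* Hypothetical example: read a rarely-used "extended header" record. */
    static char *
    read_extended_header(FILE * f, size_t len)
    {
        char * buf;

        if ((buf = malloc(len + 1)) == NULL)
            return (NULL);

        if (fread(buf, 1, len, f) != len) {
            /* Rarely taken branch: without this return, control would fall
             * through and dereference freed memory -- a bug that ordinary
             * testing of the common path would never trigger. */
            free(buf);
            return (NULL);
        }

        buf[len] = '\0';
        return (buf);
    }

The defensive habits follow directly: keep error paths short and uniform, and write tests which force them to execute, since real-world usage will not.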

Finally, sometimes people add complexity in the form of deliberate obfuscation. This too is dangerous: While it has a slight benefit in slowing down an attacker (Kerckhoffs' principle aside, disassembling a system and figuring out how it works will always take a non-zero amount of time), it has the far larger disadvantage of impeding testing and auditing. Robert H. Morris once stated that the #1 rule of cryptanalysis is "look for plaintext" -- because the easiest mistake to make when encrypting something is to accidentally not encrypt it (or, equivalently, to encrypt data and then use the wrong buffer in the next step). Morris' rule applies just as much to testing as it does to cryptanalysis: When you're testing your code, you're likely to notice if there is plaintext where there ought to be ciphertext... but if you decide to "strengthen" a system by adding some extra obfuscation after encrypting, such an error would likely escape unnoticed. The same applies to other situations where obfuscation might be applied: In general, if you make it harder for an attacker to figure out what your code is doing, you also make it harder for yourself (or anyone auditing your code) to notice when your code isn't doing what it is supposed to do.
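Here is a minimal sketch of Morris' rule applied to testing; encrypt_chunk() is a hypothetical stand-in for whatever encryption routine a program actually uses, not a real API. The mistake being guarded against is writing the wrong buffer, and the check only works because the output is supposed to be plain ciphertext with nothing layered on top of it:

    #include <assert.h>
    #include <stddef.h>
    #include <string.h>

    /* Hypothetical stand-in for the real encryption routine: encrypts len
     * bytes from plainbuf into cipherbuf (same length, for brevity). */
    int encrypt_chunk(const unsigned char * plainbuf,
        unsigned char * cipherbuf, size_t len);

    /*
     * "Look for plaintext": after encrypting, before the data goes anywhere,
     * a cheap sanity check catches the classic wrong-buffer mistake.  A layer
     * of post-encryption obfuscation would defeat exactly this kind of check.
     */
    static int
    encrypt_and_check(const unsigned char * plainbuf,
        unsigned char * cipherbuf, size_t len)
    {

        if (encrypt_chunk(plainbuf, cipherbuf, len))
            return (-1);

        /* The output must not simply equal the input.  (For a real cipher the
         * two coincide with negligible probability, so in practice this only
         * fires when the wrong buffer was used somewhere.) */
        assert(len == 0 || memcmp(plainbuf, cipherbuf, len) != 0);

        return (0);
    }
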

Now, a certain amount of complexity is unavoidable: Few people will use a program which has absolutely no functionality, no matter how securely it does nothing. However, being aware of these pitfalls makes it possible to reduce the risk of security flaws: write less code, and think twice before adding features; make interfaces and their pre- and postconditions explicit so that multiple authors don't have to guess at each other's intentions; minimize the amount of rarely-executed code, and deliberately test the rarely-taken branches which remain; and resist the temptation to obfuscate.

As far as I am aware, nobody has found any security vulnerabilities in Tarsnap. That's not saying very much: There is lots of wildly insecure software in which nobody has found any vulnerabilities, simply because nobody has bothered looking yet. (I know some people have looked at the Tarsnap client source code -- I've received emails from several Tarsnap users commenting on the code structure, the quality of comments in the code, the elegance of certain components, etc. -- but I am not aware of anyone looking at the Tarsnap source code specifically with an eye towards finding vulnerabilities.) Moreover, I'm not so naive as to believe that I didn't make any mistakes while writing the Tarsnap code -- I took my time and worked carefully, but even so, I'd be surprised if I managed to write 15000 lines of bug-free code.

But whether security vulnerabilities have been found, and whether there are bugs, are the wrong questions to ask. The right questions to ask are these: How many bugs are there; and what is the probability that any one bug would result in a security vulnerability? By avoiding complexity when possible, and containing it when it is unavoidable, we can make the answers to those two questions "not many", and "very low" -- and thereby maximize the probability that the number of security vulnerabilities found remains zero in the future.

Posted at 2009-09-04 19:00