More details about the NetSweeper DoS attack.

After exchanging a few emails with a developer at NetSweeper, it has become clear that what appeared to be a denial of service attack was indeed, as my NetSweeper corollary suggested, simply the behaviour of a very broken robot. The long series of requests (3224 of them, over a 40-minute period) was a partial replay of legitimate traffic from a system running portsnap -- with the critical difference that the system running portsnap waited until one connection had closed before opening another.

Aside from the protocol violations I listed before (the rapid-fire requests from a system which is neither an interactive user-agent nor a proxy, the fraudulent User-Agent header, and the inobservance of robots.txt), a tcpdump of packets between my server and NetSweeper uncovered two serious bugs in their "categorization engine":

  1. NetSweeper only ever sends one request per HTTP connection, and does not send a "Connection: close" header; yet it describes itself as an HTTP/1.1 client. To quote RFC 2616,
    HTTP/1.1 applications that do not support persistent connections MUST include the "close" connection option in every message.
    To fix this, NetSweeper should either start including that header line in its requests or label itself as an HTTP/1.0 client. Both fixes are trivial and would have the effect of allowing Apache to close the connection (and free resources) immediately after sending its response.
  2. NetSweeper does not shut down sockets properly. After Apache gave up waiting for further requests and sent a FIN packet to close the connection, NetSweeper acknowledged the FIN (placing Apache's end of the connection into the FIN_WAIT_2 state, and NetSweeper's end of the connection into the CLOSE_WAIT state)... and then did nothing for 15 minutes. Finally, 917 seconds after Apache closed its end of the connection, NetSweeper closed its end by sending a FIN packet (to which it received a RST response, since FreeBSD drops idle half-closed connections after a minute).

    The fix for this is also very simple: There's a missing shutdown(2) call somewhere in NetSweeper's source code. (A sketch illustrating both fixes appears below.)
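
For concreteness, here is a minimal sketch of what a well-behaved one-request-per-connection HTTP client looks like. This is my own hypothetical illustration of the two fixes, not anything taken from NetSweeper's source; fetch_once() assumes it is handed an already-connected TCP socket:

    #include <sys/socket.h>

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    static int
    fetch_once(int fd, const char *host, const char *path)
    {
        char req[1024], buf[4096];
        ssize_t len;

        /* Fix #1: an HTTP/1.1 client which does not support persistent
         * connections must send "Connection: close" in every request. */
        snprintf(req, sizeof(req),
            "GET %s HTTP/1.1\r\n"
            "Host: %s\r\n"
            "Connection: close\r\n"
            "\r\n", path, host);
        if (write(fd, req, strlen(req)) != (ssize_t)strlen(req))
            return (-1);

        /* Fix #2: we have nothing more to send, so say so now; this
         * sends our FIN immediately instead of leaving the server's end
         * of the connection parked in FIN_WAIT_2. */
        if (shutdown(fd, SHUT_WR) == -1)
            return (-1);

        /* Read the response until the server closes its end. */
        while ((len = read(fd, buf, sizeof(buf))) > 0)
            ;    /* ... hand the bytes to the categorization engine ... */

        close(fd);
        return (len == 0 ? 0 : -1);
    }

(Sending an HTTP/1.0 request line and omitting the Connection header would serve equally well for the first fix.)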

I've never been very favourably inclined towards web filtering engines, on general principle; but based on the level of coding and standards non-compliance I've seen here, I'm surprised that any of them function at all.

Posted at 2005-09-24 22:00 | Permanent link | Comments

NetSweeper attempts a denial of service attack.

I looked at my MRTG statistics this morning and noticed something odd: Between 8AM and 9AM (UTC), the number of active Apache processes (usually less than 5) jumped up to 40. A glance at my server logs made the culprit obvious:
portsnap.daemonology.net 66.207.120.227 - - [19/Sep/2005:08:20:19 +0000] "GET /t/733a6bdcdea7399617d98aab38f79345 bc35865c175a2d546e94343122de897f HTTP/1.1" 403 357 "-" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.5) Gecko/20041107 Firefox/1.0"

... (3222 lines elided)

portsnap.daemonology.net 66.207.120.227 - - [19/Sep/2005:08:57:12 +0000] "GET /bp/597f82d31ae80bf2c7e28bdfb5b8162b 4ded33ebdb7e2783e777f4c8b8ce96d5- 63cda9e7234e119fc7d3b81e17873d8c 1359e48b7e57bfac5de0a1ec31c4869d HTTP/1.1" 403 423 "-" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.5) Gecko/20041107 Firefox/1.0"
(I've added some spaces in the file names to allow the lines to wrap usefully.)

These peculiarly named files are used by portsnap to fetch updates to the FreeBSD ports tree. As such, there is no reason for any user-agent other than portsnap to be fetching these files. The IP address 66.207.120.227 resolves to host227.net-sweeper.com.

A few points come to mind concerning the behaviour of NetSweeper's robot here:

  1. Over the course of 2213 seconds, it sent 3224 requests: one request every 0.68 seconds. The 1993 Guidelines for Robot Writers suggest sending no more than one request per minute.
  2. It sends a fraudulent User-Agent header. It claims to be running Firefox/1.0, but no interactive user would ever have attempted to download such a large number of files (particularly when many of them have neither existed nor been linked to from anywhere for several days -- the files used by portsnap are rather ephemeral).
  3. It does not obey instructions in robots.txt files -- in fact, it never fetches them.
  4. Even after receiving over 3000 "HTTP 403 (Forbidden)" responses -- which, even to a robot which doesn't obey the Robot Exclusion Standard, should be a fairly clear indication that its presence is undesired -- it continued to attempt to fetch files.
  5. After a request was denied by my server, NetSweeper's robot held its connection open -- thereby keeping an Apache process busy -- until Apache stopped waiting for it and closed the connection 15 seconds later (see the note after this list). This had the result of creating the maximum possible load on my server for the set of files requested.
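
As an aside, the 15 seconds mentioned in the last point matches Apache's default keep-alive timeout; the relevant httpd.conf settings (the stock defaults, quoted from memory) are:

    # With KeepAlive enabled, each httpd process waits this many seconds
    # for another request on an open connection before giving up and
    # closing it -- which is what happened 3224 times here.
    KeepAlive On
    KeepAliveTimeout 15
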
It has been remarked on many occasions that "Any sufficiently sophisticated denial of service attack is indistinguishable from a large amount of legitimate traffic". In light of my observations today, I'd like to add the NetSweeper corollary: Any sufficiently broken robot is indistinguishable in its behaviour from a deliberate denial of service attack.

Posted at 2005-09-19 17:00 | Permanent link | Comments

Costs of arithmetic over various rings.

In the field of Real numbers, we can evaluate a rational function (in fact, any algebraic function) to n bits in O(n log n log log n) time. In the ring Z_{2^n}, we can likewise evaluate a rational function in O(n log n log log n) time. In the field Z_p, where p is an n-bit prime, however, the best we can do is O(n (log n)^2 log log n) -- slower by a factor of log n. Why?

The obvious answer is that the multiplicative inverse -- the hard step in computing a rational function -- can be computed via a Newton iteration over the Real numbers, and via its p-adic analogue (Hensel lifting) over Z_{2^n}, whereas inversion modulo a prime requires the use of a GCD algorithm in some form. Inversion modulo a large prime, rational reconstruction (over the Real numbers or over Z_m), and computing elementary functions are all problems which "live" in the domain of continued fractions instead of polynomials, and they pay the price of an extra factor of log n for that generality.
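
To illustrate just how cheap inversion is when the modulus is a power of two, here is the Newton iteration specialized to Z_{2^64} -- a toy sketch of the technique, not code from my library. If x is an inverse of a modulo 2^k, then x(2 - ax) is an inverse modulo 2^{2k}, so each step doubles the precision and no GCD ever appears:

    #include <stdint.h>
    #include <stdio.h>

    /* Invert an odd a modulo 2^64; uint64_t arithmetic wraps modulo
     * 2^64, so this is exactly arithmetic in Z_{2^64}. */
    static uint64_t
    inv64(uint64_t a)
    {
        uint64_t x = a;    /* Correct to 3 bits: a * a == 1 (mod 8). */
        int i;

        for (i = 0; i < 5; i++)    /* 3 -> 6 -> 12 -> 24 -> 48 -> 96 bits. */
            x *= 2 - a * x;
        return (x);
    }

    int
    main(void)
    {
        uint64_t a = 727;    /* Any odd value will do. */

        printf("%ju\n", (uintmax_t)(a * inv64(a)));    /* Prints 1. */
        return (0);
    }

Ten multiplications in total; the analogous computation modulo a 64-bit prime requires an extended GCD.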

I'm not altogether satisfied with this explanation; I get the feeling that there must be something more subtle involved. Computing algebraic functions is easy over the Real numbers but hard modulo a large composite -- in fact, RSA depends upon this -- and, of greater significance, even determining whether a root exists is hard. The same applies somewhat to inversion: You can determine whether an inverse exists over the Real numbers trivially (is the value non-zero?) and over Z_{2^n} trivially (is the value odd?), but over Z_m you need to determine relative primality. That said, determining whether an inverse exists modulo a prime is also trivial, so this is not merely a case of "solving a problem is at least as hard as determining if a solution exists"; still, my intuition suggests that there's something along those lines involved.

While this is a rather indistinct train of thought, it comes out of a very concrete problem. I'm looking at algorithms for arithmetic of RSA-sized values as part of my work towards a side channel free cryptographic library, and there's a curious dilemma: Using the FFT, arithmetic modulo 2^n + 1 is twice as fast as arithmetic modulo 2^n, so I'd prefer to use the first ring when performing Montgomery multiplication; but the initial setup requires computing an inverse over the ring, which is much faster for the second ring. I think the best approach might be to work with one ring for private key operations and the other for public key operations, due to the difference in the relative importance of overhead costs.
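
For readers who haven't met it, Montgomery multiplication trades reduction modulo N for exact division by an auxiliary modulus R -- here R = 2^n or 2^n + 1, with gcd(R, N) = 1 -- at the cost of the one-time setup inverse mentioned above. Sketching the standard construction from memory:

    % Setup (the inverse in question): N' \equiv -N^{-1} \pmod{R}.
    \begin{align*}
    m &\equiv (T \bmod R) \, N' \pmod{R} \\
    t &= (T + m N) / R \quad \text{(exact, since } T + m N \equiv 0 \pmod{R}) \\
    \mathrm{REDC}(T) &= \begin{cases} t - N & \text{if } t \ge N \\
                                      t & \text{otherwise,} \end{cases}
    \qquad \mathrm{REDC}(T) \equiv T R^{-1} \pmod{N}.
    \end{align*}

With R = 2^n, computing N' is the cheap Newton iteration shown above; with R = 2^n + 1, it is a GCD computation -- hence the dilemma.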

Posted at 2005-09-12 22:55 | Permanent link | Comments

Technical details about this blog.

As many people who know me will be aware, I tend towards minimalism in computer software. As a result, when I decided to start writing this, I looked around for the simplest blogging software I could find (it was NanoBlogger) and decided that it was far too complex for my taste. So I wrote my own.

This blog is generated by a 105-line script which uses the following commands: awk, cat, cp, cut, head, ls, sed, sh, sort, tail, tr, uniq. In addition to the script, I have 56 lines of CSS and 57 lines of page templates. I call this collection of script, CSS, and templates blogsh -- taken from the obvious name of the script. It took me about 8 hours to produce, most of which was spent fighting with CSS.

My code does not provide any facility for comments, nor does it construct an RSS feed. In light of cross-site attacks, I have no intention of adding any automated mechanism for reader comments; I may add an RSS feed at some point if someone can convince me that it could be useful.

I doubt many people will be interested in reusing my code, but if you are, send me an email and I'll give you a copy.

Posted at 2005-09-12 14:00 | Permanent link | Comments

Why am I writing a blog?

I've decided to start writing a blog -- at least in one sense of the word. Actually, I hesitate to call this a "blog" at all, since that has the unfortunate connotation of poorly written adolescent gutspill; but I must admit that, at least in the sense of "informal temporal personal commentary", this qualifies as a blog.

The name I've chosen should give the reader a clue as to my intentions: Daemonic Dispatches. To quote from the OED:

daemonic [...] adj. Of, relating to, or of the nature of, supernatural power or genius.

dispatch [...] n. A written message sent off promptly or speedily.

I'd also add another entry under daemonic, namely "Relating to daemonology or the BSD operating systems", since this allows me to avoid the immodesty of claiming that I possess a supernatural genius.

In short, I'll be using this as a venue for informal communications, generally relating either to FreeBSD or to my academic research. I believe that this will be useful both to myself and to others, since I often come across interesting results or ideas in the course of my research which I do not consider to be worthy of formal publication (although others have disagreed) but which I still feel a desire to communicate. This can be accomplished in part via direct correspondence with other people working in the same area; but placing such material here has the advantage of making it available to a much greater audience (assuming that search engines index it).

In addition to allowing me to communicate some points of interest with others, placing such notes here will hopefully assist me in finding them at a later date. I'm sure every researcher has been in the position of finding a problem, knowing that they had solved it several years prior, and yet not being able to find that earlier solution. (The problem is even worse if you never wrote down the solution in the first instance.)

To answer my rhetorical question: I'm writing a blog, not because I think that the details of what I ate for lunch (a cheese sandwich) or who I'm dating (nobody) will be of interest to the world, but rather because I think the incidental gems uncovered by my research will be of interest to a select audience.

Posted at 2005-09-12 13:50 | Permanent link | Comments
