The design of my magic getopt

When I started writing the blog post announcing my magic getopt, I was intending to write about some of the design decisions which went into it. I changed my mind partway through writing: My readers probably cared about the functionality, but not about the ugly implementation details. It turns out that my original plan was the right one, as I've received questions about nearly every design decision I made. Since this clearly is of interest to my readers, here's the reasons behind some of the decisions I made while writing that code.

Why getopt? Why not <insert alternative command-line processing system here>?
There are two reasons here. First, while there is a wide variety of systems for parsing command lines, most of them are limited to "set a flag" or "store a value" options. In the cases where they can be instructed to execute arbitary code when an option is found, the code is inevitably somewhere else, eliminating the crucial "all the option handling is done in one place" property. Second, and more importantly: I wanted a drop-in replacement for getopt, so that existing UNIX software can migrate easily.

So how does it work?
Each GETOPT_* produce a case statement based on the line number in the source file (this is why you can't have two such statements on the same line). When we first enter the getopt loop, we hit every line number between the line number of the GETOPT_SWITCH and the line number of the GETOPT_DEFAULT (this is why that needs to occur last) and each statement registers its line number and the option string it handles. Finally, once the initialization is completed, we process the command line, with GETOPT extracting options and GETOPT_SWITCH switching on the line number of the registered option.

Why __LINE__? Why not __COUNTER__?
Because __COUNTER__ is not standardized, and thus not portable. It would be tempting to check if __COUNTER__ is available and use it if present — this would avoid the "only one option per line" restriction — but in fact this would be a horrible idea: Code which had multiple options one the same line would compile without warnings right up until someone used a compiler which lacked __COUNTER__. Better to use __LINE__ and avoid encouraging unportable people to write unportable code.

Isn't iterating through every line quite slow?
Not at all. It takes just a few clock cycles to hit the default case statement and jump back; even with a getopt loop 1000 lines long you'd only be wasting a few microseconds. For large loops the far larger cost is to compare each encountered option against all of the registered option strings, and this is the same in every getopt_long implementation I've read. In the extremely unlikely event that options-parsing becomes a performance problem, I would recommend switching the option search to use a binary search and thus far fewer string compares.

Why <setjmp.h>?
It's possible to for a program to have multiple getopt loops; this may be desireable for programs which operate differently depending on the name with which they are invoked, for example. Unfortunately, during the initial option-registering loop, it's necessary to jump back to the top — but we can't use goto because the identifiers of labeled statements in C have function scope, and someone might want to put two getopt loops into the same function. Automatic variables in C, however, have block scope — and so I use setjmp and longjmp simply to perform a block-scope goto. (Incidentally, I considered using these in a computed-jump lookup table; but it turned out that it wouldn't accomplish anything which I wasn't already getting via the case statement.)

Why does GETOPT_MISSING_ARG: disable warnings?
Because that's how getopt works. Beyond that? Beats me.

Posted at 2015-12-07 04:35 | Permanent link | Comments

Daemonic Dispatches

The design of my magic getopt

Recent posts

Monthly Archives

Yearly Archives