Daemonic Dispatches

A challenge to startups

"From those unto whom much has been given, much shall be expected." In various forms this sentiment has been expressed at least as far back as the first century AD, via the Gospel of Luke; more recently it has been cited frequently by US Presidents, and can be seen in modified form in such places as Spider-Man ("With great power comes great responsibility") and the demands of the Occupy movement that the "1%" pay higher taxes. I started thinking about this a few days ago after re-reading an essay by Paul Graham and thinking about how lucky I was to be running a startup company now rather than two decades ago.

In that essay, Paul remarked upon four developments which had made it cheaper than ever to launch a startup: Open Source Software, which dramatically reduced the cost of software (although, as Jamie Zawinski famously pointed out, it only becomes free if your time has no value); Moore's Law, which dramatically reduced the cost of hardware; the Web, which made it possible to reach large audiences cheaply; and better programming languages (I would broaden this to software development tools generally), which made software development much faster. All of these remain very important ten years after Paul wrote that essay. I think Paul somewhat overstated his case with the assertion that the Web made promotion free (the Web is now so packed full of better mousetraps that it's hard for the world to beat well-worn paths to every door), but it's still startling to think about the effort which would be involved in marketing a social network, a hundred-million-item mail-order catalogue, or even a brand of mobile phones without the Web.

While everything Paul wrote remains true, I'll add four more developments. First, starting with Y Combinator, which Paul co-founded and was in the process of getting off the ground even as he wrote that essay, there has been a dramatic expansion of the funding available to new startup companies; while I was lucky to be able to "bootstrap" my startup, the easy availability of funding helps a great many companies which would otherwise never start. Second, where startup founders previously had to figure things out largely on their own, there is now a very large amount of mentorship available, ranging from large organized programs like Y Combinator down to helpful individuals who declare that their inbox is always open. (My inbox, like Patrick McKenzie's, is also always open, but I'm probably not the best person to ask for business advice, given that I'm famously bad at business.) Third, the availability of a large number of straightforward services for processing credit cards has made life dramatically simpler for startups whose customers pay for their products and services. It's hard to imagine now, but when I started Tarsnap in 2006, PayPal was the least painful way of getting paid. And fourth, where Moore's Law brought the cost of hardware down, cloud computing has reduced the capital cost of hardware to zero, which both allows for cheaper experimentation and reduces the need for startups to find investors.

And so I have a challenge for my friends in the startup community: Seeing that so much has been given to us, think about what you can do to make things better for the next generation of startup founders.

There are no wrong answers here. Paul Graham contributed back in the form of mentorship and funding for startups; many founders of successful startups make such investments of money, but Paul Graham may be unique in combining it with such a successful mentorship program. Patrick and John Collison, having grown frustrated with the problem of receiving payments online, launched Stripe to solve that. Erin and Thomas Ptacek and Patrick McKenzie decided to fix technical hiring, which will doubtless be a great boon to future startups. My friend Terry Beech went from running a startup to teaching university courses about entrepreneurship, and then moved into Federal politics, winning a seat as a Member of Parliament in the recent election; I have no doubt that his background will allow him to make important contributions there. And my own small startup, being built heavily on Open Source Software, contributes each year an amount equal to its December operating profits to supporting Open Source Software; rather embarrassingly, this makes Tarsnap the 7th largest donor to the FreeBSD Foundation in 2015, something which I hope will change before the year end. Five examples, all entirely different; I have no doubt that there are far more with whom I am not personally familiar.

There are two weeks left in 2015, and it's a good season for thinking about such things. How will you contribute back?

Posted at 2015-12-18 12:30

The design of my magic getopt

When I started writing the blog post announcing my magic getopt, I was intending to write about some of the design decisions which went into it. I changed my mind partway through writing: My readers probably cared about the functionality, but not about the ugly implementation details. It turns out that my original plan was the right one, as I've received questions about nearly every design decision I made. Since this clearly is of interest to my readers, here are the reasons behind some of the decisions I made while writing that code.

Why getopt? Why not <insert alternative command-line processing system here>?
There are two reasons here. First, while there is a wide variety of systems for parsing command lines, most of them are limited to "set a flag" or "store a value" options. In the cases where they can be instructed to execute arbitrary code when an option is found, the code is inevitably somewhere else, eliminating the crucial "all the option handling is done in one place" property. Second, and more importantly: I wanted a drop-in replacement for getopt, so that existing UNIX software can migrate easily.

So how does it work?
Each GETOPT_* produces a case statement based on the line number in the source file (this is why you can't have two such statements on the same line). When we first enter the getopt loop, we hit every line number between the line number of the GETOPT_SWITCH and the line number of the GETOPT_DEFAULT (this is why GETOPT_DEFAULT needs to occur last), and each statement registers its line number and the option string it handles. Finally, once the initialization is completed, we process the command line, with GETOPT extracting options and GETOPT_SWITCH switching on the line number of the registered option.
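To make that description concrete, here is a deliberately simplified, hand-written sketch of the line-number trick. It is not the code in getopt.c: the macros OPT_CASE and OPT_END, the helpers register_opt and lookup, and the flags aflag and barflag are all hypothetical; the sketch handles no option arguments; and it re-enters the switch with an ordinary loop rather than with setjmp/longjmp. On the first pass it feeds successive line numbers into the switch so that each label registers itself; once the end marker is reached, it dispatches each argument on the line number registered for it.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical registry mapping option strings to the source line whose
 * case label handles them. */
struct reg {
    int line;
    const char *opt;
};
static struct reg regs[16];
static int nregs;

static void register_opt(int line, const char *opt)
{
    if (nregs < 16) {
        regs[nregs].line = line;
        regs[nregs].opt = opt;
        nregs++;
    }
}

/* Return the registered line for opt, or 0 (never a valid __LINE__). */
static int lookup(const char *opt)
{
    for (int i = 0; i < nregs; i++)
        if (strcmp(regs[i].opt, opt) == 0)
            return regs[i].line;
    return 0;
}

/* During initialization a label registers itself and leaves the switch;
 * afterwards it behaves like an ordinary case label. */
#define OPT_CASE(s) \
    case __LINE__: \
        if (init) { register_opt(__LINE__, (s)); break; }

/* Marks the last label: reaching it ends the initialization pass. */
#define OPT_END \
    case __LINE__: \
        if (init) { init = 0; break; }

static int aflag, barflag;

/* Returns the number of unrecognized arguments. */
static int parse(int argc, char *argv[])
{
    int init = 1, unknown = 0, i = 1;
    int line = __LINE__;   /* every label below has a larger line number */
    const char *opt = NULL;

    nregs = 0;
    for (;;) {
        if (init) {
            line++;            /* pass 1: visit each line number in turn */
        } else {
            if (i >= argc)
                break;
            opt = argv[i++];   /* pass 2: dispatch on the registered line */
            line = lookup(opt);
        }
        switch (line) {
        OPT_CASE("-a")
            aflag = 1;
            break;
        OPT_CASE("--bar")
            barflag = 1;
            break;
        OPT_END
        default:
            if (!init) {
                printf("unknown option %s\n", opt);
                unknown++;
            }
            break;
        }
    }
    return unknown;
}
```

Called with the arguments prog -a --bar -z, parse sets both flags and reports -z as the single unrecognized option. Putting two OPT_CASE labels on one line would produce a duplicate case value, which is the same one-label-per-line restriction described above.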

Why __LINE__? Why not __COUNTER__?
Because __COUNTER__ is not standardized, and thus not portable. It would be tempting to check whether __COUNTER__ is available and use it if present (this would avoid the "only one option per line" restriction), but in fact this would be a horrible idea: Code which had multiple options on the same line would compile without warnings right up until someone used a compiler which lacked __COUNTER__. Better to use __LINE__ and avoid encouraging people to write unportable code.

Isn't iterating through every line quite slow?
Not at all. It takes just a few clock cycles to hit the default case statement and jump back; even with a getopt loop 1000 lines long you'd only be wasting a few microseconds. For large loops the far larger cost is comparing each encountered option against all of the registered option strings, and this is the same in every getopt_long implementation I've read. In the extremely unlikely event that options parsing becomes a performance problem, I would recommend sorting the registered option strings and switching the option search to a binary search, which needs far fewer string compares.
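A minimal sketch of the binary-search lookup suggested above, assuming the registered options are collected into a table; the struct optent type and the functions optent_cmp, sort_opts, and find_opt are hypothetical and not part of getopt.c. The table is sorted once with the standard qsort, after which each bsearch lookup needs O(log n) string compares rather than the O(n) of a linear scan.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical registered-option entry: option string plus handler line. */
struct optent {
    const char *opt;
    int line;
};

/* Order entries by option string, as strcmp does. */
static int optent_cmp(const void *a, const void *b)
{
    const struct optent *x = a, *y = b;
    return strcmp(x->opt, y->opt);
}

/* Sort once after registration: O(n log n). */
static void sort_opts(struct optent *tab, size_t n)
{
    qsort(tab, n, sizeof(tab[0]), optent_cmp);
}

/* Each lookup costs O(log n) compares; returns 0 if opt is not registered. */
static int find_opt(const struct optent *tab, size_t n, const char *opt)
{
    struct optent key = { opt, 0 };
    const struct optent *hit =
        bsearch(&key, tab, n, sizeof(tab[0]), optent_cmp);
    return hit != NULL ? hit->line : 0;
}
```

Using 0 as the "not found" value is safe in this setting because __LINE__ is never 0.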

Why <setjmp.h>?
It's possible for a program to have multiple getopt loops; this may be desirable for programs which operate differently depending on the name with which they are invoked, for example. Unfortunately, during the initial option-registering loop, it's necessary to jump back to the top, but we can't use goto: the identifiers of labeled statements in C have function scope, and someone might want to put two getopt loops into the same function. Automatic variables in C, however, have block scope, and so I use setjmp and longjmp simply to perform a block-scope goto. (Incidentally, I considered using these in a computed-jump lookup table, but it turned out that it wouldn't accomplish anything which I wasn't already getting via the case statement.)
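A minimal illustration of a block-scope goto built from setjmp and longjmp. It is meant to show the idea only, not the actual structure of getopt.c, and the function count_twice is hypothetical. Each block declares its own jmp_buf named top, which would be impossible with two goto labels of the same name in one function. The counters are volatile because the C standard leaves non-volatile automatic variables that are modified between setjmp and longjmp with indeterminate values.

```c
#include <setjmp.h>

/* Two independent "loops" in one function, each jumping back to the top of
 * its own block via a block-scope jmp_buf. */
static int count_twice(void)
{
    int total = 0;   /* not modified between any setjmp and its longjmp */

    {
        jmp_buf top;            /* block-scope jump target */
        volatile int n = 0;     /* modified after setjmp, so volatile */

        setjmp(top);            /* longjmp(top, 1) resumes here */
        if (n < 3) {
            n++;
            longjmp(top, 1);
        }
        total += n;
    }
    {
        jmp_buf top;            /* same name, different block: allowed */
        volatile int n = 0;

        setjmp(top);
        if (n < 5) {
            n++;
            longjmp(top, 1);
        }
        total += n;
    }
    return total;   /* 3 + 5 */
}
```

Here setjmp appears as a complete expression statement, one of the few contexts in which the C standard permits it.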

Why does GETOPT_MISSING_ARG: disable warnings?
Because that's how getopt works. Beyond that? Beats me.
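For reference, this is the POSIX getopt behaviour behind that answer: if the option string begins with ':', getopt reports a missing option-argument by returning ':' without printing anything; otherwise it prints a diagnostic (when opterr is nonzero) and returns '?'. That GETOPT_MISSING_ARG works by adding this leading ':' is an inference from the answer above rather than something it states, and the helper missing_arg_result is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <unistd.h>

/* Hypothetical helper: run getopt over "prog -f", where -f requires an
 * argument but none is supplied, and return getopt's result. */
static int missing_arg_result(const char *optstring)
{
    char prog[] = "prog", flag[] = "-f";
    char *argv[] = { prog, flag, NULL };

    /* Rescan from the first argument. POSIX does not specify how to reset
     * getopt; optind = 1 is assumed to work on common implementations. */
    optind = 1;
    return getopt(2, argv, optstring);
}
```

With ":f:" the call returns ':' silently; with "f:" it returns '?' and getopt writes its own error message to stderr.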

Posted at 2015-12-07 04:35

A magic getopt

Parsing command lines in C is easy when all of the options are single characters: You pass your command line to getopt along with a string containing all the valid options; then you have a switch statement with a case for each option you want to handle. It looks something like this:

        int ch;
        while ((ch = getopt(argc, argv, ":af:")) != -1) {
                switch (ch) {
                case 'a':
                        aflag = 1;
                        break;
                case 'f':
                        printf("foo: %s\n", optarg);
                        break;
                case ':':
                        printf("missing argument to -%c\n", optopt);
                        /* FALLTHROUGH */
                default:
                        usage();
                }
        }

Unfortunately if you want to add support for long options — say, to accept a new --bar option — you need to switch to using getopt_long and your list of options is no longer confined to the options-processing loop:

enum options
{
	OPTION_BAR = 256	/* outside the range of short-option characters */
};

...

static struct option longopts[] =
{
	{ "bar", required_argument, NULL, OPTION_BAR },
	{ NULL, 0, NULL, 0 }	/* getopt_long requires a zero-filled last entry */
};

...

        int ch;
        while ((ch = getopt_long(argc, argv, ":af:", longopts, NULL)) != -1) {
                switch (ch) {
                case 'a':
                        aflag = 1;
                        break;
                case OPTION_BAR:
                        printf("bar: %s\n", optarg);
                        break;
                case 'f':
                        printf("foo: %s\n", optarg);
                        break;
                case ':':
                        printf("missing argument to -%c\n", optopt);
                        /* FALLTHROUGH */
                default:
                        usage();
                }
        }

Rather than adding a new option in one place (or two, if you count the list of options at the top of the loop as being a separate place), new long options require changes in three places — one of which (the enum) is often placed in an entirely separate file. So much for keeping code clean and free of duplication. There has got to be a better way, right?

Enter magic getopt. Via a little bit of macro magic, the above options-handling code turns into this:

        const char * ch;
        while ((ch = GETOPT(argc, argv)) != NULL) {
                GETOPT_SWITCH(ch) {
                GETOPT_OPT("-a"):
                        aflag = 1;
                        break;
                GETOPT_OPTARG("--bar"):
                        printf("bar: %s\n", optarg);
                        break;
                GETOPT_OPTARG("-f"):
                        printf("foo: %s\n", optarg);
                        break;
                GETOPT_MISSING_ARG:
                        printf("missing argument to %s\n", ch);
                        /* FALLTHROUGH */
                GETOPT_DEFAULT:
                        usage();
                }
        }

with each option listed just once, at the point where it is handled.

To use this, add getopt.c and getopt.h to your project, #include "getopt.h", and then make the following changes:

The option character (ch in the examples above) turns into an option string.
The function getopt is replaced by the macro GETOPT, which no longer needs the third parameter (the string containing the option characters to accept) and returns NULL upon reaching the end of the options list instead of -1.
Instead of switch (ch) you now need GETOPT_SWITCH(ch).
case 'a' turns into GETOPT_OPT("-a") for options without arguments, or GETOPT_OPTARG("-a") for options with arguments.
case ':' turns into GETOPT_MISSING_ARG.
default turns into GETOPT_DEFAULT, and this must be present at the end of the switch statement in order for the magic to work.
In the unlikely scenario that you had several case labels on the same line: Every GETOPT_* label needs to be on a different source code line.

And that's it. These changes can be made to a program which accepts single-character options almost mechanically and with no increase in the source code complexity; and then new options (short or long) can be added by simply adding the new option-handling code, without needing to make changes in several different places.

I think it's time for me to start adding long options to some of my projects.

Posted at 2015-12-06 23:45
