Write opinionated workarounds

A few years ago, I decided that I should aim for my code to be as portable as possible. This generally meant targeting POSIX; in some cases I required slightly more, e.g., "POSIX with OpenSSL installed and cryptographic entropy available from /dev/urandom". This dedication made me rather unusual among software developers; grepping the source code for the software I have installed on my laptop, I cannot find any other examples of code with strictly POSIX compliant Makefiles, for example. (I did find one other Makefile which claimed to be POSIX-compatible; but in actual fact it used a GNU extension.) As far as I was concerned, strict POSIX compliance meant never having to say you're sorry for portability problems; if someone ran into problems with my standard-compliant code, well, they could fix their broken operating system.

And some people did. Unfortunately, despite the promise of open source, many users were unable to make such fixes themselves, and for a rather large number of operating systems the principle of standards compliance seems to be more aspirational than actual. Given the limits which would otherwise be imposed on the user base of my software, I eventually decided that it was necessary to add workarounds for some of the more common bugs. That said, I decided upon two policies:

  1. Workarounds should be disabled by default, and only enabled upon detecting an afflicted system.
  2. Users should be warned that a workaround is being applied.

The first policy is essential for preventing a scenario often found in older software: A workaround is added for one system, but then that workaround introduces a problem on a second system and so a workaround is added for the workaround, and then a problem is found with that second workaround... and ten years later there's a stack of workarounds to workarounds which nobody dares to remove, even though the original problem which was being worked around has long since been corrected. If a workaround is disabled by default, it's less likely to provoke such a stack of workarounds — and it's going to be much easier to remove them once they're no longer needed.
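
As a rough illustration of the two policies listed above, here is a minimal runtime sketch in C. It is not taken from any of my projects, where detection may just as well happen at build time; the names struct workaround, probe_never, probe_always and workaround_init are all hypothetical.

```c
#include <stdio.h>

/* Hypothetical record describing a single workaround. */
struct workaround {
	const char *bug;	/* short description of the platform bug */
	int (*afflicted)(void);	/* probe: nonzero if this system has the bug */
	int enabled;		/* set by workaround_init() */
};

/* Placeholder probes; a real probe would test for one specific bug. */
static int probe_never(void) { return 0; }
static int probe_always(void) { return 1; }

/*
 * Policy 1: the workaround starts out disabled and is only enabled if the
 * probe reports an afflicted system.  Policy 2: enabling it prints a
 * warning to the given stream.  Returns nonzero if the workaround is on.
 */
static int
workaround_init(struct workaround *w, FILE *warn)
{

	w->enabled = 0;
	if (w->afflicted()) {
		w->enabled = 1;
		fprintf(warn, "WARNING: working around a platform bug: %s\n",
		    w->bug);
	}
	return (w->enabled);
}
```

Code which needs the workaround would then test the enabled field rather than applying the workaround unconditionally, so that removing it later only means deleting one record and the branches which consult it.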

The second policy is important as a matter of education: Users deserve to know that they're running a broken operating system. And running broken operating systems they are doing. Here are some of the warnings people will see, along with explanations (more for the benefit of people who arrive here via Google than for my regular readership):

But as passionate as I am about user education, there's a far more important reason for that second policy: Getting things fixed. All of these are problems we could have worked around silently; indeed, with the exception of the LLVM bug (which I don't think anyone else has noticed) all of them have been worked around silently. But while silent workarounds solve the immediate problem for one piece of software, they do nothing to help the next developer who trips over those bugs. Warnings, on the other hand, can help to get bugs fixed: Indeed, a few months ago I fixed a bug in FreeBSD for the sole reason that I was getting annoyed by one of my own warning messages! Even if the vast majority of people who see those warnings disregard them, any chance that the right developer will get the message and fix a bug is better than none.

My regular readers will know that I care deeply about producing correct code, offering bounties for issues as trivial as misplaced punctuation in comments. But it isn't just my own code I care about; I'm affected by bugs in all of the code I run, and even by bugs in code I don't run if I rely on someone else who does. So please, if you find a bug, don't just work around it; shout it from the rooftops in the hope that the right people will hear.

Because if we all stop accepting broken code, we might eventually end up with less broken code.

FreeBSD on EdgeRouter Lite - no serial port required

I recently bought an EdgeRouter Lite to use as a network gateway; I had been using a cheap consumer wifi/NAT device, but I wanted the extra control I could get by running FreeBSD rather than whatever mangled version of Linux the device came with. Someone wrote instructions on installing FreeBSD onto the EdgeRouter Lite two years ago, but they rely on using the serial port to reconfigure the boot loader — perfectly straightforward if you have a serial cable and know what you're doing, but I decided to take the opportunity to provide a more streamlined process.



  1. Fetch a FreeBSD HEAD source tree. (FreeBSD 10-STABLE is not supported yet. I think this might change between now and 10.3-RELEASE.)
  2. Download the image building script.
  3. Run ./buildimg.sh /path/to/src/tree disk.img.
  4. Remove three small screws from the back of the EdgeRouter Lite. Open the case and remove the USB drive. (Mine was held very firmly in place. I found that wiggling it towards and away from the board allowed me to gradually ease it free.)
  5. Plug the USB disk into the system where you built the FreeBSD image.
  6. Run dd if=/dev/USBDISK of=ERL.img where USBDISK is the name of the USB disk device (probably da0), to make a backup of the EdgeRouter Lite software in case something breaks and you need to restore it later.
  7. Run dd if=disk.img of=/dev/USBDISK (where USBDISK is as before) to write the FreeBSD disk image onto the EdgeRouter Lite USB disk.
  8. Plug the USB disk back into the EdgeRouter Lite, close the box, and replace the three screws.

There are three gigabit ethernet ports on the EdgeRouter Lite, marked on the case as "eth0", "eth1", and "eth2"; in FreeBSD, they show up as "octe0", "octe1", and "octe2" in the same order. With the configuration on my image:

That's pretty much all you need to know to install and use FreeBSD on the EdgeRouter Lite; but there are some interesting tricks involved in the script which builds the disk image, so for the rest of this blog post I will provide a brief "walkthrough" of the script.

#!/bin/sh -e
Shell scripts are run by /bin/sh. (No matter what some misguided Linux users may think, /bin/bash is not a standard shell.) The -e option tells the shell interpreter to exit if any of the commands fail — if something goes wrong, we should stop and let the user see what happened rather than continuing and producing a broken disk image later.
if [ -z "$1" ] || [ -z "$2" ]; then
	echo "buildimg.sh srcdir disk.img"
	exit 1
fi
SRCDIR=$1
IMGFILE=$2
This script takes two options: The location of the FreeBSD source tree, and the name of the file to use for the disk image.
# Set environment variables so make can use them.
export TARGET=mips
export TARGET_ARCH=mips64
The EdgeRouter Lite is a 64-bit MIPS system; FreeBSD HEAD has a kernel configuration already defined for it.
export WITHOUT_MODULES="cxgbe mwlfw netfpga10g otusfw ralfw usb rtwnfw"
Unfortunately that kernel configuration disables building kernel modules; we want those so that we can use pf later. Some modules fail to build — I didn't investigate exactly why, but it was something related to firmware blobs — so we turn those off explicitly.
# Create working space
WORKDIR=`env TMPDIR=\`pwd\` mktemp -d -t ERLBUILD`
We create some temporary working space under the current directory. On many systems /tmp isn't large enough to hold a complete installation of FreeBSD, so I overrode the default there.
# Build MIPS64 world and ERL kernel
JN=`sysctl -n hw.ncpu`
( cd $SRCDIR && make buildworld -j${JN} )
( cd $SRCDIR && make buildkernel -j${JN} )
Build the MIPS64 world and kernel. The -j flag tells make to run several commands in parallel; we consult sysctl to find out how many CPUs we have available for the build.
# Install into a temporary tree
mkdir ${WORKDIR}/tree
( cd $SRCDIR && make installworld distribution installkernel DESTDIR=${WORKDIR}/tree )
We create a tree and install FreeBSD into it. The installworld and installkernel targets install the userspace and kernel binaries respectively; the distribution target installs standard configuration files.
# Download packages
cp /etc/resolv.conf ${WORKDIR}/tree/etc/
pkg -c ${WORKDIR}/tree install -Fy pkg djbdns isc-dhcp43-server
rm ${WORKDIR}/tree/etc/resolv.conf
The FreeBSD project provides precompiled binary packages for the 64-bit MIPS architecture; this allows us to put packages into the image we're building while avoiding the headaches of cross-building them. However, we cannot cross-install packages either, since packages can run scripts when they are installed — scripts which (since we're not building this disk image on a MIPS64 system) we won't be able to run. Instead, we simply download the packages into the image; they will be installed when the system first boots.
# FreeBSD configuration
cat > ${WORKDIR}/tree/etc/rc.conf <<EOF
The /etc/rc.conf file is the "master configuration file" on FreeBSD; most enabling/disabling of services is done here, as well as some more specific configuration.
Every host needs a name. We'll call this "ERL", lacking any better inspiration.
We're building a disk image which we'll write onto the provided USB disk, but the image is smaller than the disk; when the system first boots, this tells it to expand the root partition to fill the available space.
This is probably unnecessary, but I like to have a memory disk mounted on /tmp; if for some reason temporary files get created here, this will avoid burning up the flash storage.
ifconfig_octe1=" netmask"
ifconfig_octe2=" netmask"
We run DHCP on the "upstream" connection, but provide static network parameters for the "LAN" connections.
We're going to use the PF firewall; and we're going to be forwarding packets (both via the network address translation and between the two LAN ports) so we need that option too.
dhcpd_ifaces="octe1 octe2"
We don't want to run sendmail; we do want to run sshd (we'll use PF to restrict access, however); we do want to run ntpd, and we want it to set its clock when it starts, no matter how far off it is (the EdgeRouter Lite doesn't have a battery-powered clock, so it boots with a wildly wrong time set); we want to run svscan so that it can launch dnscache for us; and we want to run a dhcp daemon for the two LAN interfaces.
cat > ${WORKDIR}/tree/etc/pf.conf <<EOF
# Allow anything on loopback
set skip on lo0

# Scrub all incoming traffic
scrub in

# NAT outgoing traffic
nat on octe0 inet from { octe1:network, octe2:network } to any -> (octe0:0)

# Reject anything with spoofed addresses
antispoof quick for { octe1, octe2, lo0 } inet

# Default to blocking incoming traffic but allowing outgoing traffic
block all
pass out all

# Allow LAN to access the rest of the world
pass in on { octe1, octe2 } from any to any
block in on { octe1, octe2 } from any to self

# Allow LAN to ping us
pass in on { octe1, octe2 } inet proto icmp to self icmp-type echoreq

# Allow LAN to access DNS, DHCP, and NTP
pass in on { octe1, octe2 } proto udp to self port { 53, 67, 123 }
pass in on { octe1, octe2 } proto tcp to self port 53

# Allow octe2 to access SSH
pass in on octe2 proto tcp to self port 22
EOF
Fairly straightforward PF configuration: NAT outgoing traffic onto the "upstream" connection; allow the local network to access DNS, DHCP, and NTP; and allow octe2 to access SSH. I opted to only allow ICMP echo request packets from the LAN side — some people prefer to respond to pings from anywhere, but I decided that for a general purpose image it was better to err in the direction of being silent. Similarly I decided to simply drop bad packets rather than sending TCP RST or ICMP unreachable responses.
mkdir -p ${WORKDIR}/tree/usr/local/etc
cat > ${WORKDIR}/tree/usr/local/etc/dhcpd.conf <<EOF
option domain-name "localdomain";
subnet netmask {
        option routers;
        option domain-name-servers;
subnet netmask {
        option routers;
        option domain-name-servers;
This provides a basic configuration for ISC DHCPD. I have a feeling that this could be simplified to have a single configuration block covering both LAN ports.
# Script to complete setup once we're running on the right hardware
mkdir -p ${WORKDIR}/tree/usr/local/etc/rc.d
cat > ${WORKDIR}/tree/usr/local/etc/rc.d/ERL <<'EOF'
I mentioned earlier that we couldn't cross-install packages; we take care of that now, with a script which runs the first time FreeBSD boots. The quotes around EOF in the here-document syntax instruct the shell that variables should not be expanded — important since we're creating a shell script which uses several shell variables.
# KEYWORD: firstboot
The "firstboot" keyword tells /etc/rc that this script should only be run the first time that the system boots.

# This script completes the configuration of EdgeRouter Lite systems.  It
# is only included in those images, and so is enabled by default.

. /etc/rc.subr

: ${ERL_enable:="YES"}

This is fairly standard rc.d script boilerplate.

	# Packages
	env SIGNATURE_TYPE=NONE pkg add -f /var/cache/pkg/pkg-*.txz
	pkg install -Uy djbdns isc-dhcp43-server
We want to install the two packages we downloaded into the image earlier.
	# DNS setup
	pw user add dnscache -u 184 -d /nonexistent -s /usr/sbin/nologin
	pw user add dnslog -u 186 -d /nonexistent -s /usr/sbin/nologin
	mkdir /var/service
	/usr/local/bin/dnscache-conf dnscache dnslog /var/service/dnscache
	touch /var/service/dnscache/root/ip/192.168
We configure dnscache to be launched by svscan and respond to DNS requests from the LAN.
	# Create ubnt user
	echo ubnt | pw user add ubnt -m -G wheel -h 0
We could have created this user while creating the disk image, but since we needed to have a firstboot script anyway it was easier to do it here. The -h 0 option means "read the password from standard input", which is why we're echoing it in from there.
	# We need to reboot so that services will be started
	touch ${firstboot_sentinel}-reboot
Part of the rc.d "firstboot" mechanism is to allow scripts to ask for the system to be rebooted after the first boot (and all the associated system initialization) is complete. In this case, we need to reboot in order to have svscan and isc-dhcpd running (since they weren't installed yet when the boot process started).

load_rc_config $name
run_rc_command "$1"
chmod 755 ${WORKDIR}/tree/usr/local/etc/rc.d/ERL
More boilerplate. The rc.d script must be executable.
# We want to run firstboot scripts
touch ${WORKDIR}/tree/firstboot
The sentinel file /firstboot tells FreeBSD that the system is booting for the first time and "firstboot" scripts should be run. At the end of the first boot, /etc/rc deletes this file.
# Create FAT32 filesystem to hold the kernel
newfs_msdos -C 33M -F 32 -c 1 -S 512 ${WORKDIR}/FAT32.img
mddev=`mdconfig -f ${WORKDIR}/FAT32.img`
mkdir ${WORKDIR}/FAT32
mount -t msdosfs /dev/${mddev}  ${WORKDIR}/FAT32
cp ${WORKDIR}/tree/boot/kernel/kernel ${WORKDIR}/FAT32/vmlinux.64
umount /dev/${mddev}
rmdir ${WORKDIR}/FAT32
mdconfig -d -u ${mddev}
The EdgeRouter Lite boot loader expects to launch a Linux kernel which is found at /vmlinux.64 within a FAT32 filesystem. Fortunately, it doesn't check that the kernel it's launching is Linux... so we create a FAT32 filesystem and drop a FreeBSD kernel in, named "/vmlinux.64" so that the EdgeRouter Lite boot loader launches it for us. (Our kernel is only about 10 MB, but the minimum size for a FAT32 filesystem is 33 MB.)
# Create UFS filesystem
echo "/dev/da0s2a / ufs rw 1 1" > ${WORKDIR}/tree/etc/fstab
makefs -f 16384 -B big -s 920m ${WORKDIR}/UFS.img ${WORKDIR}/tree
We use the makefs tool to create a UFS filesystem from the installed FreeBSD tree. The MIPS64 hardware is big-endian, and UFS is not endian-agnostic, so we need to tell makefs to create a big-endian filesystem; this also means that (assuming we're using a little-endian system to build this disk image) we can't mount the filesystem on the system we're using to create it.
# Create complete disk image
mkimg -s mbr		\
    -p fat32:=${WORKDIR}/FAT32.img \
    -p "freebsd:-mkimg -s bsd -p freebsd-ufs:=${WORKDIR}/UFS.img" \
    -o ${IMGFILE}
The EdgeRouter Lite boot loader expects the kernel to be found on the first MBR slice; and the FreeBSD ERL kernel configuration expects the root filesystem to be found at da0s2a so we'd better put it there.
# Clean up
chflags -R noschg ${WORKDIR}
rm -r ${WORKDIR}
Once we finish building the disk image, we don't need our staging tree or the separate filesystem images any more.

A challenge to startups

"From those unto whom much has been given, much shall be expected." In various forms this sentiment has been expressed at least as far back as the third century AD, via the Gospel of Luke; more recently it has been cited frequently by US Presidents, and can be seen in modified form in such places as Spider-Man ("With great power comes great responsibility") and the demands of the Occupy movement that the "1%" pay higher taxes. I started thinking about this a few days ago after re-reading an essay by Paul Graham and thinking about how lucky I was to be running a startup company now rather than two decades ago.

In that essay, Paul remarked upon four developments which had made it cheaper than ever to launch a startup: Open Source Software, which dramatically reduced the cost of software (although as Jamie Zawinski famously pointed out, it only becomes free if your time has no value); Moore's Law, which dramatically reduced the cost of hardware; the Web, which made it possible to reach large audiences cheaply; and better programming languages (I would broaden this to software development tools generally) which made software development much faster. All of these remain very important ten years after Paul wrote that essay; while I think Paul somewhat overstated his case with the assertion that the Web made promotion free (the Web is now so packed full of better mousetraps that it's hard for the world to beat well-worn paths to every door), it's still startling to think about the effort which would be involved in marketing a social network, a hundred million item mail-order catalogue, or even a brand of mobile phones without the Web.

For all that what Paul wrote remains true, I'll add four more developments. First, starting with Y Combinator, which Paul co-founded and was in the process of getting off the ground even as he wrote that essay, there has been a dramatic expansion of the funding available to new startup companies; while I was lucky to be able to "bootstrap" my startup, the easy availability of funding helps a great many companies which would otherwise never start. Second, where startup founders previously had to figure things out largely on their own, there is now a very large amount of mentorship available, ranging from large organized programs like Y Combinator down to helpful individuals who declare that their inbox is always open. (My inbox, like Patrick's, is also always open, but I'm probably not the best person to ask for business advice, given that I'm famously bad at business.) Third, the availability of a large number of straightforward services for processing credit cards has made life dramatically simpler for startups which go the direction of having customers who pay for their products and services. It's hard to imagine, but when I started Tarsnap in 2006, PayPal was the least painful way of getting paid. And fourth, where Moore's Law brought the cost of hardware down, cloud computing has reduced the capital cost of hardware to zero, which both allows for cheaper experimentation and reduces the need for startups to find investors.

And so I have a challenge for my friends in the startup community: Seeing that so much has been given to us, think about what you can do to make things better for the next generation of startup founders.

There are no wrong answers here. Paul Graham contributed back in the form of mentorship and funding for startups; many founders of successful startups make such investments of money, but Paul Graham may be unique in combining it with such a successful mentorship program. Patrick and John Collison, having grown frustrated with the problem of receiving payments online, launched Stripe to solve that. Erin and Thomas Ptacek and Patrick McKenzie decided to fix technical hiring, which will doubtless be a great boon to future startups. My friend Terry Beech went from running a startup to teaching university courses about entrepreneurship, and then moved into Federal politics, winning a seat as a Member of Parliament in the recent election; I have no doubt that his background will allow him to make important contributions there. And my own small startup, being built heavily on Open Source Software, contributes each year an amount equal to its December operating profits to supporting Open Source Software; rather embarrassingly, this makes Tarsnap the 7th largest donor to the FreeBSD Foundation in 2015, something which I hope will change before the year end. Five examples, all entirely different; I have no doubt that there are far more with whom I am not personally familiar.

There are two weeks left in 2015, and it's a good season for thinking about such things. How will you contribute back?

The design of my magic getopt

When I started writing the blog post announcing my magic getopt, I was intending to write about some of the design decisions which went into it. I changed my mind partway through writing: My readers probably cared about the functionality, but not about the ugly implementation details. It turns out that my original plan was the right one, as I've received questions about nearly every design decision I made. Since this clearly is of interest to my readers, here are the reasons behind some of the decisions I made while writing that code.

Why getopt? Why not <insert alternative command-line processing system here>?
There are two reasons here. First, while there is a wide variety of systems for parsing command lines, most of them are limited to "set a flag" or "store a value" options. In the cases where they can be instructed to execute arbitrary code when an option is found, the code is inevitably somewhere else, eliminating the crucial "all the option handling is done in one place" property. Second, and more importantly: I wanted a drop-in replacement for getopt, so that existing UNIX software can migrate easily.

So how does it work?
Each GETOPT_* macro produces a case statement based on the line number in the source file (this is why you can't have two such statements on the same line). When we first enter the getopt loop, we hit every line number between the line number of the GETOPT_SWITCH and the line number of the GETOPT_DEFAULT (this is why that needs to occur last) and each statement registers its line number and the option string it handles. Finally, once the initialization is completed, we process the command line, with GETOPT extracting options and GETOPT_SWITCH switching on the line number of the registered option.
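
To make the mechanism described above concrete, here is a stripped-down sketch of the same idea. This is not the code from getopt.h: the macros OPT and OPT_DEFAULT, struct optstate, and the functions step, register_options and handle_option are hypothetical names invented for this illustration, and argument handling is left out entirely.

```c
#include <stddef.h>
#include <string.h>

#define MAXOPTS 8

struct optstate {
	const char *name[MAXOPTS];	/* registered option strings */
	int line[MAXOPTS];		/* source line of each option's case */
	size_t nopts;
	int registering;		/* nonzero during the registration pass */
	int done;			/* set once OPT_DEFAULT's line is reached */
	int aflag, barflag, unknown;
};

/* A case label keyed on the source line; registers itself on the first pass. */
#define OPT(st, str)						\
	case __LINE__:						\
		if ((st)->registering) {			\
			(st)->name[(st)->nopts] = (str);	\
			(st)->line[(st)->nopts++] = __LINE__;	\
			break;					\
		}

/* Must come last: reaching its line ends the registration pass. */
#define OPT_DEFAULT(st)						\
	case __LINE__:						\
		if ((st)->registering) {			\
			(st)->done = 1;				\
			break;					\
		}						\
	default:						\
		if ((st)->registering)				\
			break;

/* Every OPT below is on a later line than this one. */
static const int first_line = __LINE__;

static void
step(struct optstate *st, int line)
{

	switch (line) {
	OPT(st, "-a")
		st->aflag = 1;
		break;
	OPT(st, "--bar")
		st->barflag = 1;
		break;
	OPT_DEFAULT(st)
		st->unknown++;
	}
}

/* Registration pass: visit every line until OPT_DEFAULT reports the end. */
static void
register_options(struct optstate *st)
{
	int line;

	st->nopts = 0;
	st->registering = 1;
	st->done = 0;
	for (line = first_line; !st->done; line++)
		step(st, line);
	st->registering = 0;
}

/* Processing: switch on the line registered for this option string. */
static void
handle_option(struct optstate *st, const char *opt)
{
	size_t i;

	for (i = 0; i < st->nopts; i++) {
		if (strcmp(st->name[i], opt) == 0) {
			step(st, st->line[i]);
			return;
		}
	}
	step(st, 0);	/* no case label is on line 0, so this hits default */
}
```

After register_options has run, calling handle_option with "--bar" runs only the code under the "--bar" label, and an unregistered string such as "-x" lands in the default branch.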

Why __LINE__? Why not __COUNTER__?
Because __COUNTER__ is not standardized, and thus not portable. It would be tempting to check if __COUNTER__ is available and use it if present (this would avoid the "only one option per line" restriction), but in fact this would be a horrible idea: Code which had multiple options on the same line would compile without warnings right up until someone used a compiler which lacked __COUNTER__. Better to use __LINE__ and avoid encouraging people to write unportable code.

Isn't iterating through every line quite slow?
Not at all. It takes just a few clock cycles to hit the default case statement and jump back; even with a getopt loop 1000 lines long you'd only be wasting a few microseconds. For large loops the far larger cost is to compare each encountered option against all of the registered option strings, and this is the same in every getopt_long implementation I've read. In the extremely unlikely event that options-parsing becomes a performance problem, I would recommend switching the option search to use a binary search and thus far fewer string compares.
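
For anyone who does hit that unlikely case, the binary search suggested above could look roughly like the following sketch. It assumes the registered options are kept in an array of (string, line number) pairs and handles exact matches only; struct optentry, opts_sort and opts_lookup are hypothetical names, not part of getopt.c.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical registry entry: an option string and its case-label line. */
struct optentry {
	const char *name;
	int line;
};

/* Order entries by option string, for both qsort and bsearch. */
static int
optentry_cmp(const void *a, const void *b)
{
	const struct optentry *x = a;
	const struct optentry *y = b;

	return (strcmp(x->name, y->name));
}

/* Sort the registry once, after all options have been registered. */
static void
opts_sort(struct optentry *opts, size_t n)
{

	qsort(opts, n, sizeof(opts[0]), optentry_cmp);
}

/* Return the line registered for name, or 0 if it is not registered. */
static int
opts_lookup(const struct optentry *opts, size_t n, const char *name)
{
	struct optentry key = { name, 0 };
	const struct optentry *hit;

	hit = bsearch(&key, opts, n, sizeof(opts[0]), optentry_cmp);
	return (hit != NULL ? hit->line : 0);
}
```

Each lookup then costs on the order of log2(n) string compares rather than n, at the price of one sort when the registration pass finishes.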

Why <setjmp.h>?
It's possible for a program to have multiple getopt loops; this may be desirable for programs which operate differently depending on the name with which they are invoked, for example. Unfortunately, during the initial option-registering loop, it's necessary to jump back to the top, but we can't use goto because the identifiers of labeled statements in C have function scope, and someone might want to put two getopt loops into the same function. Automatic variables in C, however, have block scope, and so I use setjmp and longjmp simply to perform a block-scope goto. (Incidentally, I considered using these in a computed-jump lookup table; but it turned out that it wouldn't accomplish anything which I wasn't already getting via the case statement.)
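
Here is a small sketch of the block-scope goto trick in isolation. It is not the code from getopt.h; the macro SUM_TO and the function twice_sum are hypothetical, chosen only to show a macro which jumps back to its own top and can still be expanded twice in one function. Had the macro used a label and goto instead, the second expansion would fail to compile with a duplicate label.

```c
#include <setjmp.h>

/*
 * Add 1 + 2 + ... + n to out, looping by jumping back to the top of the
 * block.  The jmp_buf is an automatic variable, so each expansion of the
 * macro gets its own; the counters are volatile because they are modified
 * between setjmp and longjmp.  Note that n is evaluated more than once.
 */
#define SUM_TO(n, out) do {						\
	jmp_buf top_;							\
	volatile int i_ = 0, sum_ = 0;					\
									\
	(void)setjmp(top_);		/* plays the role of a label */	\
	if (i_ < (n)) {							\
		i_++;							\
		sum_ += i_;						\
		longjmp(top_, 1);	/* plays the role of goto */	\
	}								\
	(out) += sum_;							\
} while (0)

/* Two expansions in one function: no clash, unlike two identical labels. */
static int
twice_sum(int n)
{
	int total = 0;

	SUM_TO(n, total);
	SUM_TO(n, total);
	return (total);
}
```

With n = 4 each expansion adds 1 + 2 + 3 + 4 = 10, so twice_sum(4) returns 20.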

Why does GETOPT_MISSING_ARG: disable warnings?
Because that's how getopt works. Beyond that? Beats me.

A magic getopt

Parsing command lines in C is easy when all of the options are single characters: You pass your command line to getopt along with a string containing all the valid options; then you have a switch statement with a case for each option you want to handle. It looks something like this:
        int ch;
        while ((ch = getopt(argc, argv, ":af:")) != -1) {
                switch (ch) {
                case 'a':
                        aflag = 1;
                        break;
                case 'f':
                        printf("foo: %s\n", optarg);
                        break;
                case ':':
                        printf("missing argument to -%c\n", optopt);
                        /* FALLTHROUGH */
                default:
                        usage();        /* assumed helper: print usage and exit */
                }
        }
Unfortunately if you want to add support for long options — say, to accept a new --bar option — you need to switch to using getopt_long and your list of options is no longer confined to the options-processing loop:
enum options
{
        OPTION_BAR
};

static struct option longopts[] =
{
        { "bar", required_argument, NULL, OPTION_BAR },
        { NULL, 0, NULL, 0 }    /* getopt_long needs a zeroed final entry */
};

        int ch;
        while ((ch = getopt_long(argc, argv, ":af:", longopts, NULL)) != -1) {
                switch (ch) {
                case 'a':
                        aflag = 1;
                        break;
                case OPTION_BAR:
                        printf("bar: %s\n", optarg);
                        break;
                case 'f':
                        printf("foo: %s\n", optarg);
                        break;
                case ':':
                        printf("missing argument to -%c\n", optopt);
                        /* FALLTHROUGH */
                default:
                        usage();        /* assumed helper: print usage and exit */
                }
        }
Rather than adding a new option in one place (or two, if you count the list of options at the top of the loop as being a separate place), new long options require changes in three places — one of which (the enum) is often placed in an entirely separate file. So much for keeping code clean and free of duplication. There has got to be a better way, right?

Enter magic getopt. Via a little bit of macro magic, the above options-handling code turns into this:

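        const char * ch;
        while ((ch = GETOPT(argc, argv)) != NULL) {
                GETOPT_SWITCH(ch) {
                GETOPT_OPT("-a"):
                        aflag = 1;
                        break;
                GETOPT_OPTARG("--bar"):
                        printf("bar: %s\n", optarg);
                        break;
                GETOPT_OPTARG("-f"):
                        printf("foo: %s\n", optarg);
                        break;
                GETOPT_MISSING_ARG:
                        printf("missing argument to %s\n", ch);
                        /* FALLTHROUGH */
                GETOPT_DEFAULT:
                        usage();        /* assumed helper: print usage and exit */
                }
        }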
with each option listed just once, at the point where it is handled.

To use this, add getopt.c and getopt.h to your project, #include "getopt.h", and then make the following changes:

And that's it. These changes can be made to a program which accepts single-character options almost mechanically and with no increase in the source code complexity; and then new options (short or long) can be added by simply adding the new option-handling code, without needing to make changes in several different places.

I think it's time for me to start adding long options to some of my projects.

