A plan for open source software maintainersI've been writing open source software for about 15 years now; while I'm still wet behind the ears compared to FreeBSD greybeards like Kirk McKusick and Poul-Henning Kamp, I've been around for long enough to start noticing some patterns. In particular:
- Free software is expensive. Software is expensive to begin with; but good quality open source software tends to be written by people who are recognized as experts in their fields (partly thanks to that very software) and can demand commensurate salaries.
- While that expensive developer time is donated (either by the developers themselves or by their employers), this influences what their time is used for: Individual developers like doing things which are fun or high-status, while companies usually pay developers to work specifically on the features those companies need. Maintaining existing code is important, but it is neither fun nor high-status; and it tends to get underweighted by companies as well, since maintenance is inherently unlikely to be the most urgent issue at any given time.
- Open source software is largely a "throw code over the fence and walk away" exercise. Over the past 15 years I've written freebsd-update, bsdiff, portsnap, scrypt, spiped, and kivaloo, and done a lot of work on the FreeBSD/EC2 platform. Of these, I know bsdiff and scrypt are very widely used and I suspect that kivaloo is not; but beyond that I have very little knowledge of how widely or where my work is being used. Anecdotally it seems that other developers are in similar positions: At conferences I've heard variations on "you're using my code? Wow, that's awesome; I had no idea" many times.
- I have even less knowledge of what people are doing with my work or what problems or limitations they're running into. Occasionally I get bug reports or feature requests; but I know I only hear from a very small proportion of the users of my work. I have a long list of feature ideas which are sitting in limbo simply because I don't know if anyone would ever use them — I suspect the answer is yes, but I'm not going to spend time implementing these until I have some confirmation of that.
- A lot of mid-size companies would like to be able to pay for support for the software they're using, but can't find anyone to provide it. For larger companies, it's often easier — they can simply hire the author of the software (and many developers who do ongoing maintenance work on open source software were in fact hired for this sort of "in-house expertise" role) — but there's very little available for a company which needs a few minutes per month of expertise. In many cases, the best support they can find is sending an email to the developer of the software they're using and not paying anything at all — we've all received "can you help me figure out how to use this" emails, and most of us are happy to help when we have time — but relying on developer generosity is not a good long-term solution.
- Every few months, I receive email from people asking if there's any way for them to support my open source software contributions. (Usually I encourage them to donate to the FreeBSD Foundation.) Conversely, there are developers whose work I would like to support (e.g., people working on FreeBSD wifi and video drivers), but there isn't any straightforward way to do this. Patreon has demonstrated that there are a lot of people willing to pay to support what they see as worthwhile work, even if they don't get anything directly in exchange for their patronage.
It seems to me that this is a case where problems are in fact solutions to other problems. To wit:
- Users of open source software want to be able to get help with their use cases; developers of open source software want to know how people are using their code.
- Users of open source software want to support the the work they use; developers of open source software want to know which projects users care about.
- Users of open source software want specific improvements; developers of open source software may be interested in making those specific changes, but don't want to spend the time until they know someone would use them.
- Users of open source software have money; developers of open source software get day jobs writing other code because nobody is paying them to maintain their open source software.
I'd like to see this situation get fixed. As I envision it, a solution would look something like a cross between Patreon and Bugzilla: Users would be able sign up to "support" projects of their choosing, with a number of dollars per month (possibly arbitrary amounts, possibly specified tiers; maybe including $0/month), and would be able to open issues. These could be private (e.g., for "technical support" requests) or public (e.g., for bugs and feature requests); users would be able to indicate their interest in public issues created by other users. Developers would get to see the open issues, along with a nominal "value" computed based on allocating the incoming dollars of "support contracts" across the issues each user has expressed an interest in, allowing them to focus on issues with higher impact.
I have three questions for my readers:
- If this existed, would you — an open source software developer — sign up and use it?
- If this existed, would you — a user of open source software — be likely to pay to support, and receive support from, an open source software developer?
- Does anyone want to build this?
Pending the answers to the first two questions, I'm enthusiastic about this idea; if I wasn't already running Tarsnap I would probably go ahead and build this myself. But I'm not Jeff Bezos and there's no way I could run two startup companies at once; so finding someone else interested in building this would be crucial.
I think something like this could fill an important gap in the world of open source software. Maybe I'm crazy. Feedback requested.
Q: Are you talking about bounties for open source software development?
There are websites for doing that.
A: No! This would be different from bounties in several ways:
- Sponsors would be supporting a specific developer, not setting aside money for whoever comes along and claims it. This is important because bounties tend to result in horrible "drive-by coding" which causes problems for maintainers later.
- This would be about providing ongoing funding for a developer, which would allow them to spend time on long-term code maintenance, not just chasing after the latest in-demand feature.
- While there would be a "dollar value" attached to issues, that value would just be informational — telling the developer how much users care about that particular issue. The developer would get money from their monthly subscribers regardless of which issues they address. (Of course, if they don't respond to issues they may find that supporters cancel their ongoing funding.)
Cheating on a string theory examAnd now for something completely different: I've been enjoying FiveThirtyEight's "The Riddler" puzzles. A couple weeks ago I submitted a puzzle of my own; but I haven't heard back and it's too cute a puzzle to not share, so I've decided to post it here.
You have to take a 90-minute string theory exam consisting of 23 true-false questions, but unfortunately you know absolutely nothing about the subject. You have a friend who will be writing the exam at the same time as you, is able to answer all of the questions in a fraction of the allotted time, and is willing to help you cheat — but the proctors are alert and will throw you out if they suspect him of communicating any information to you.
You and your friend have watches which are synchronized to the second, and the proctors are used to him often finishing exams quickly and won't be suspicious if he leaves early. What is the largest value N such that you can guarantee that you answer at least N out of the 23 questions correctly?
Extra credit: Explain the connection between your solution and the topic of the exam.
For clarity: No, you cannot communicate information based on which hand your ambidexterous friend uses to write his exam answers, based on how quickly he walks on his way out of the exam, or any other channels not stated: The proctors will catch you and you'll get a zero on the exam. The only information you can receive is a single value: When your friend left the exam.
IPv6 on FreeBSD/EC2A few hours ago Amazon announced that they had rolled out IPv6 support in EC2 to 15 regions — everywhere except the Beijing region, apparently. This seems as good a time as any to write about using IPv6 in EC2 on FreeBSD instances.
First, the good news: Future FreeBSD releases will support IPv6 "out of the box" on EC2. I committed changes to HEAD last week, and merged them to the stable/11 branch moments ago, to have FreeBSD automatically use whatever IPv6 addresses EC2 makes available to it.
Next, the annoying news: To get IPv6 support in EC2 from existing FreeBSD releases (10.3, 11.0) you'll need to run a few simple commands. I consider this unfortunate but inevitable: While Amazon has been unusually helpful recently, there's nothing they could have done to get support for their IPv6 networking configuration into FreeBSD a year before they launched it.
To enable IPv6 support in an existing FreeBSD EC2 instance, you'll need to do three things:
Install the net/dual-dhclient port:
# pkg install dual-dhclient
Add accept_rtadv to the appropriate ifconfig line in your
/etc/rc.conf file, e.g.,
Add two more lines to your /etc/rc.conf file:
If you want to launch a new FreeBSD/EC2 instance with IPv6 support, the following configinit script can be provided as the user-data upon instance launch:
>>/etc/rc.conf firstboot_pkgs_list="dual-dhclient awscli" ifconfig_DEFAULT="SYNCDHCP accept_rtadv" ipv6_activate_all_interfaces="YES" dhclient_program="/usr/local/sbin/dual-dhclient"This tells configinit to add four lines to /etc/rc.conf, and the firstboot-pkgs tool will then install the dual-dhclient package as part of the initial system boot.
Third, the bad news: Enabling IPv6 support in EC2 is an absurdly lengthy process (and this is true regardless of what operating system you're running in EC2). You'll need to:
- Add an IPv6 address range to your VPC. (In the AWS Management Console: VPC -> Your VPCs -> right click on a VPC -> edit CIDRs -> Add IPv6 CIDR.) Most EC2 users will need to do this once for each EC2 region they're using.
- Add an IPv6 address range to each subnet. (VPC -> Subnets -> right click, Edit IPv6 CIDRs -> Add IPv6 CIDR.) Most EC2 users will need to do this once for each EC2 availability zone they're using.
- (Not necessary, but probably a good idea:) Enable auto-assignment of IPv6 addresses. (VPC -> Subnets -> right click on a subnet -> Modify auto-assign IP settings -> Enable auto-assign IPv6 address.) Again, once per availability zone; if you don't do this, you'll need to explicitly ask for an IPv6 address to be assigned for each new EC2 instance.
- (Not necessary, and probably not a good idea:) Create an Egress Only Internet Gateway. This is Amazon's attempt to reproduce the "you can't get there from here" semantics of IPv4 NAT networking in IPv6; rather than relying on this sort of broken network configuration, I'd recommend restricting access to your instances using EC2 Security Groups.
- Specify default routes for IPv6. (VPC -> Route Tables -> select a route table -> Routes tab -> Edit -> Add another route; add a destination of "::/0", select your Internet Gateway or Egress Only Internet Gateway, and click Save.) Most EC2 users will need to do this once for each EC2 region they're using.
- Add IPv6 to your Security Groups. (EC2 -> Network & Security -> Security Groups -> right click on the Security Group -> Edit inbound rules -> make your changes, noting that the "Anywhere" source now includes the IPv6 wildcard "::/0".)
- Remember that if you launch EC2 instances via the console, the new "launch-wizard-N" groups created by default will probably not include IPv6.
Finally, one important caveat: While EC2 is clearly the most important place to have IPv6 support, and one which many of us have been waiting a long time to get, this is not the only service where IPv6 support is important. Of particular concern to me, Application Load Balancer support for IPv6 is still missing in many regions, and Elastic Load Balancers in VPC don't support IPv6 at all — which matters to those of us who run non-HTTP services. Make sure that IPv6 support has been rolled out for all the services you need before you start migrating.
A very valuable vulnerabilityWhile I very firmly wear a white hat, it is useful to be able to consider things from the perspective of the bad guys, in order to assess the likelihood of a vulnerability being exploited and its potential impact. For the subset of bad guys who exploit security vulnerabilities for profit — as opposed to selling them to spy agencies, for example — I imagine that there are some criteria which would tend to make a vulnerability more valuable:
- the vulnerability can be exploited remotely, over the internet;
- the attack cannot be blocked by firewalls;
- the attack can be carried out without any account credentials on the system being attacked;
- the attack yields money (as opposed to say, credit card details which need to be separately monetized);
- once successfully exploited, there is no way for a victim to reverse or mitigate the damage; and
- the attack can be performed without writing a single line of code.
The vulnerability — which has since been fixed, or else I would not be writing about it publicly — was in Stripe's bitcoin payment functionality. Some background for readers not familiar with this: Stripe provides payment processing services, originally for credit cards but now also supporting ACH, Apple Pay, Alipay, and Bitcoin, and was designed to be the payment platform which developers would want to use; in very much the way that Amazon fixed the computing infrastructure problem with S3 and EC2 by presenting storage and compute functionality via simple APIs, Stripe fixed the "getting money from customers online" problem. I use Stripe at my startup, Tarsnap, and was in fact the first user of Stripe's support for Bitcoin payments: Tarsnap has an unusually geeky and privacy-conscious user base, so this functionality was quite popular among Tarsnap users.
Despite being eager to accept Bitcoin payments, I don't want to actually handle bitcoins; Tarsnap's services are priced in US dollars, and that's what I ultimately want to receive. Stripe abstracts this away for me: I tell Stripe that I want $X, and it tells me how many bitcoins my customer should send and to what address; when the bitcoin turns up, I get the US dollars I asked for. Naturally, since the exchange rate between dollars and bitcoins fluctuates, Stripe can't guarantee the exchange rate forever; instead, they guarantee the rate for 10 minutes (presumably they figured out that the exchange rate volatility is low enough that they won't lose much money over the course of 10 minutes). If the "bitcoin receiver" isn't filled within 10 minutes, incoming coins are converted at the current exchange rate.
For a variety of reasons, it is sometimes necessary to refund bitcoin transactions: For example, a customer cancelling their order; accidentally sending in the wrong number of bitcoins; or even sending in the correct number of bitcoins, but not within the requisite time window, resulting in their value being lower than necessary. Consequently, Stripe allows for bitcoin transactions to be refunded — with the caveat that, for obvious reasons, Stripe refunds the same value of bitcoins, not the same number of bitcoins. (This is analogous to currency exchange issues with credit cards — if you use a Canadian dollar credit card to buy something in US dollars and then get a refund later, the equal USD amount will typically not translate to an equal number of CAD refunded to your credit card.)
The vulnerability lay in the exchange rate handling. As I mentioned above, Stripe guarantees an exchange rate for 10 minutes; if the requisite number of bitcoins arrive within that window, the exchange rate is locked in. So far so good; but what Stripe did not intend was that the exchange rate was locked in permanently — and applied to any future bitcoins sent to the same address.
This made a very simple attack possible:
- Pay for something using bitcoin.
- Wait until the price of bitcoin drops.
- Send more bitcoins to the address used for the initial payment.
- Ask for a refund of the excess bitcoin.
Needless to say, I reported this to Stripe immediately. Fortunately, their website includes a GPG key and advertises a vulnerability disclosure reward (aka. bug bounty) program; these are two things I recommend that every company does, because they advertise that you take security seriously and help to ensure that when people stumble across vulnerabilities they'll let you know. (As it happens, I had Stripe security's public GPG key already and like them enough that I would have taken the time to report this even without a bounty; but it's important to maximize the odds of receiving vulnerability reports.) Since it was late on a Friday afternoon and I was concerned about how easily this could be exploited, I also hopped onto Stripe's IRC channel to ask one of the Stripe employees there to relay a message to their security team: "Check your email before you go home!"
Stripe's handling of this issue was exemplary. They responded promptly to confirm that they had received my report and reproduced the issue locally; and a few days later followed up to let me know that they had tracked down the code responsible for this misbehaviour and that it had been fixed. They also awarded me a bug bounty — one significantly in excess of the $500 they advertise, too.
As I remarked six years ago, Isaac Asimov's remark that in science "Eureka!" is less exciting than "That's funny..." applies equally to security vulnerabilities. I didn't notice this issue because I was looking for ways to exploit bitcoin exchange rates; I noticed it because a Tarsnap customer accidentally sent bitcoins to an old address and the number of coins he got back when I clicked "refund" was significantly less than what he had sent in. (Stripe has corrected this "anti-exploitation" of the vulnerability.) It's important to keep your eyes open; and it's important to encourage your customers to keep their eyes open, which is the largest advantage of bug bounty programs — and why Tarsnap's bug bounty program offers rewards for all bugs, not just those which turn out to be vulnerabilities.
And if you have code which handles fluctuating exchange rates... now might be a good time to double-check that you're always using the right exchange rates.
EC2's most dangerous featureAs a FreeBSD developer — and someone who writes in C — I believe strongly in the idea of "tools, not policy". If you want to shoot yourself in the foot, I'll help you deliver the bullet to your foot as efficiently and reliably as possible. UNIX has always been built around the idea that systems administrators are better equipped to figure out what they want than the developers of the OS, and it's almost impossible to prevent foot-shooting without also limiting useful functionality. The most powerful tools are inevitably dangerous, and often the best solution is to simply ensure that they come with sufficient warning labels attached; but occasionally I see tools which not only lack important warning labels, but are also designed in a way which makes them far more dangerous than necessary. Such a case is IAM Roles for Amazon EC2.
A review for readers unfamiliar with this feature: Amazon IAM (Identity and Access Management) is a service which allows for the creation of access credentials which are limited in scope; for example, you can have keys which can read objects from Amazon S3 but cannot write any objects. IAM Roles for EC2 are a mechanism for automatically creating such credentials and distributing them to EC2 instances; you specify a policy and launch an EC2 instance with that Role attached, and magic happens making time-limited credentials available via the EC2 instance metadata. This simplifies the task of creating and distributing credentials and is very convenient; I use it in my FreeBSD AMI Builder AMI, for example. Despite being convenient, there are two rather scary problems with this feature which severely limit the situations where I'd recommend using it.
The first problem is one of configuration: The language used to specify IAM Policies is not sufficient to allow for EC2 instances to be properly limited in their powers. For example, suppose you want to allow EC2 instances to create, attach, detach, and delete Elastic Block Store volumes automatically — useful if you want to have filesystems automatically scaling up and down depending on the amount of data which they contain. The obvious way to do this is would be to "tag" the volumes belonging to an EC2 instance and provide a Role which can only act on volumes tagged to the instance where the Role was provided; while the second part of this (limiting actions to tagged volumes) seems to be possible, there is no way to require specific API call parameters on all permitted CreateVolume calls, as would be necessary to require that a tag is applied to any new volumes being created by the instance. (There also seems to be a gap in the CreateVolume API call, in that it is documented as returning a list of tags assigned to the newly created volume, but does not advertise support for assigning tags as part of the process of creating a volume; but that at least could be easily fixed, and I'm not even sure if this is a failing in the API call or in the documentation of the API call.) The difficulty, and sometimes impossibility, of writing appropriately fine-grained IAM Policies results in a proliferation of overbroad policies; for example, the pre-written Policy which Amazon provides for allowing the Simple Systems Manager agent to make the API calls it requires (AmazonEC2RoleforSSM) permits it to GET and PUT any object in S3 — a terrifying proposition if you store data in S3 which you don't want to see made available to all of your "managed" EC2 instances.
As problematic as the configuration is, a far larger problem with IAM Roles for Amazon EC2 is access control — or, to be more precise, the lack thereof. As I mentioned earlier, IAM Role credentials are exposed to EC2 instances via the EC2 instance metadata system: In other words, they're available from http://169.254.169.254/. (I presume that the "EC2ws" HTTP server which responds is running in another Xen domain on the same physical hardware, but that implementation detail is unimportant.) This makes the credentials easy for programs to obtain... unfortunately, too easy for programs to obtain. UNIX is designed as a multi-user operating system, with multiple users and groups and permission flags and often even more sophisticated ACLs — but there are very few systems which control the ability to make outgoing HTTP requests. We write software which relies on privilege separation to reduce the likelihood that a bug will result in a full system compromise; but if a process which is running as user nobody and chrooted into /var/empty is still able to fetch AWS keys which can read every one of the objects you have stored in S3, do you really have any meaningful privilege separation? To borrow a phrase from Ted Unangst, the way that IAM Roles expose credentials to EC2 instances makes them a very effective exploit mitigation mitigation technique.
To make it worse, exposing credentials — and other metadata, for that matter — via HTTP is completely unnecessary. EC2 runs on Xen, which already has a perfectly good key-value data store for conveying metadata between the host and guest instances. It would be absolutely trivial for Amazon to place EC2 metadata, including IAM credentials, into XenStore; and almost as trivial for EC2 instances to expose XenStore as a filesystem to which standard UNIX permissions could be applied, providing IAM Role credentials with the full range of access control functionality which UNIX affords to files stored on disk. Of course, there is a lot of code out there which relies on fetching EC2 instance metadata over HTTP, and trivial or not it would still take time to write code for pushing EC2 metadata into XenStore and exposing it via a filesystem inside instances; so even if someone at AWS reads this blog post and immediately says "hey, we should fix this", I'm sure we'll be stuck with the problems in IAM Roles for years to come.
So consider this a warning label: IAM Roles for EC2 may seem like a gun which you can use to efficiently and reliably shoot yourself in the foot; but in fact it's more like a gun which is difficult to aim and might be fired by someone on the other side of the room snapping his fingers. Handle with care!