EC2's most dangerous feature

As a FreeBSD developer — and someone who writes in C — I believe strongly in the idea of "tools, not policy". If you want to shoot yourself in the foot, I'll help you deliver the bullet to your foot as efficiently and reliably as possible. UNIX has always been built around the idea that systems administrators are better equipped to figure out what they want than the developers of the OS, and it's almost impossible to prevent foot-shooting without also limiting useful functionality. The most powerful tools are inevitably dangerous, and often the best solution is to simply ensure that they come with sufficient warning labels attached; but occasionally I see tools which not only lack important warning labels, but are also designed in a way which makes them far more dangerous than necessary. Such a case is IAM Roles for Amazon EC2.

A review for readers unfamiliar with this feature: Amazon IAM (Identity and Access Management) is a service which allows for the creation of access credentials which are limited in scope; for example, you can have keys which can read objects from Amazon S3 but cannot write any objects. IAM Roles for EC2 are a mechanism for automatically creating such credentials and distributing them to EC2 instances; you specify a policy and launch an EC2 instance with that Role attached, and magic happens making time-limited credentials available via the EC2 instance metadata. This simplifies the task of creating and distributing credentials and is very convenient; I use it in my FreeBSD AMI Builder AMI, for example. Despite being convenient, there are two rather scary problems with this feature which severely limit the situations where I'd recommend using it.

The first problem is one of configuration: The language used to specify IAM Policies is not sufficient to allow for EC2 instances to be properly limited in their powers. For example, suppose you want to allow EC2 instances to create, attach, detach, and delete Elastic Block Store volumes automatically — useful if you want to have filesystems automatically scaling up and down depending on the amount of data which they contain. The obvious way to do this is would be to "tag" the volumes belonging to an EC2 instance and provide a Role which can only act on volumes tagged to the instance where the Role was provided; while the second part of this (limiting actions to tagged volumes) seems to be possible, there is no way to require specific API call parameters on all permitted CreateVolume calls, as would be necessary to require that a tag is applied to any new volumes being created by the instance. (There also seems to be a gap in the CreateVolume API call, in that it is documented as returning a list of tags assigned to the newly created volume, but does not advertise support for assigning tags as part of the process of creating a volume; but that at least could be easily fixed, and I'm not even sure if this is a failing in the API call or in the documentation of the API call.) The difficulty, and sometimes impossibility, of writing appropriately fine-grained IAM Policies results in a proliferation of overbroad policies; for example, the pre-written Policy which Amazon provides for allowing the Simple Systems Manager agent to make the API calls it requires (AmazonEC2RoleforSSM) permits it to GET and PUT any object in S3 — a terrifying proposition if you store data in S3 which you don't want to see made available to all of your "managed" EC2 instances.

As problematic as the configuration is, a far larger problem with IAM Roles for Amazon EC2 is access control — or, to be more precise, the lack thereof. As I mentioned earlier, IAM Role credentials are exposed to EC2 instances via the EC2 instance metadata system: In other words, they're available from http://169.254.169.254/. (I presume that the "EC2ws" HTTP server which responds is running in another Xen domain on the same physical hardware, but that implementation detail is unimportant.) This makes the credentials easy for programs to obtain... unfortunately, too easy for programs to obtain. UNIX is designed as a multi-user operating system, with multiple users and groups and permission flags and often even more sophisticated ACLs — but there are very few systems which control the ability to make outgoing HTTP requests. We write software which relies on privilege separation to reduce the likelihood that a bug will result in a full system compromise; but if a process which is running as user nobody and chrooted into /var/empty is still able to fetch AWS keys which can read every one of the objects you have stored in S3, do you really have any meaningful privilege separation? To borrow a phrase from Ted Unangst, the way that IAM Roles expose credentials to EC2 instances makes them a very effective exploit mitigation mitigation technique.

To make it worse, exposing credentials — and other metadata, for that matter — via HTTP is completely unnecessary. EC2 runs on Xen, which already has a perfectly good key-value data store for conveying metadata between the host and guest instances. It would be absolutely trivial for Amazon to place EC2 metadata, including IAM credentials, into XenStore; and almost as trivial for EC2 instances to expose XenStore as a filesystem to which standard UNIX permissions could be applied, providing IAM Role credentials with the full range of access control functionality which UNIX affords to files stored on disk. Of course, there is a lot of code out there which relies on fetching EC2 instance metadata over HTTP, and trivial or not it would still take time to write code for pushing EC2 metadata into XenStore and exposing it via a filesystem inside instances; so even if someone at AWS reads this blog post and immediately says "hey, we should fix this", I'm sure we'll be stuck with the problems in IAM Roles for years to come.

So consider this a warning label: IAM Roles for EC2 may seem like a gun which you can use to efficiently and reliably shoot yourself in the foot; but in fact it's more like a gun which is difficult to aim and might be fired by someone on the other side of the room snapping his fingers. Handle with care!

Posted at 2016-10-09 08:30 | Permanent link | Comments

Daemonic Dispatches

EC2's most dangerous feature

Recent posts

Monthly Archives

Yearly Archives