The missing ImportVolume documentation

As a general rule, the documentation provided by Amazon Web Services is very good; in many ways, they set the standard for what documentation for public APIs should look like. Occasionally, however, important details are inexplicably absent from the documentation, and — I suspect in part due to Amazon's well-known culture of secrecy — it tends to be very difficult to get those details. One such case is the EC2 ImportVolume API call.

As the maintainer of the FreeBSD/EC2 platform, I wanted to use this API call to simplify the process of building EC2 images for FreeBSD. For several years my build process has involved launching an EC2 instance, building a FreeBSD disk image directly onto an EC2 volume, and then converting that volume into an EC2 machine image ("AMI"); but in order to integrate better with the FreeBSD release process, it was essential to be able to build the disk image offline and then upload it — ideally without ever launching an EC2 instance. The ImportVolume API call is intended for exactly this purpose; but one of the mandatory parameters to this call — Image.ImportManifestUrl — came without any documentation of what data the "disk image manifest" file should contain. Without that crucial documentation, it's impossible to create a manifest file; without creating a manifest file, it's impossible to make use of the ImportVolume API; and without the ImportVolume API I could not streamline the FreeBSD/EC2 build process.

Fortunately developers inside Amazon have access to better documentation. While the ImportVolume API is not implemented in the AWS CLI, it is implemented in the — much older, and now rarely used — EC2 API Tools package. Despite the EC2 API Tools not being useful for the FreeBSD release process — being written in Java, they are far too cumbersome — they did allow me to figure out the structure of the requisite manifest file; and so I am able to present the missing ImportVolume documentation:

EC2 ImportVolume manifest file format

The EC2 ImportVolume manifest file is a standalone XML file containing a top-level <manifest> tag. This tag contains:
  • A <version> tag with the value "2010-11-15".
  • A <file-format> tag containing the image file format, as in the Image.Format parameter to ImportVolume: "RAW", "VHD", or "VMDK".
  • An <importer> tag providing information about the software used to generate the manifest file; it contains <name>, <version> and <release> tags which seem to be purely informational.
  • A <self-destruct-url> tag containing an S3 URL which is pre-signed for issuing a DELETE request on the manifest file object.
  • An <import> tag containing a <size> tag (with the size in bytes of the disk image, as in the Image.Bytes parameter to ImportVolume), a <volume-size> tag (with the size in GB of the volume to be created, as in the Volume.Size parameter to ImportVolume), and a <parts> tag with a count attribute set to the number of <part> tags which it contains.

Each <part> tag corresponds to a portion of the disk image, has an index attribute identifying the position of this part in sequence (numbered starting at 0), and contains:

  • A <byte-range> tag with start and end attributes specifying the position of the first and last bytes of this part in the disk image.
  • A <key> tag containing the S3 object name for this part. [I'm not sure what purpose this serves, and it's possible that these could just be any unique names for the parts.]
  • <head-url>, <get-url>, and <delete-url> tags containing S3 URLs which are pre-signed for issuing HEAD, GET, and DELETE requests respectively on the S3 object containing this part.

Example

The following is an example of a manifest file (with a repetitive section elided):
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<manifest>
    <version>2010-11-15</version>
    <file-format>RAW</file-format>
    <importer>
        <name>ec2-upload-disk-image</name>
        <version>1.0.0</version>
        <release>2010-11-15</release>
    </importer>
    <self-destruct-url>https://import-volume-example.s3.amazonaws.com/c05aebc0-98d3-41df-bd64-53eacf4de842/disk.imgmanifest.xml?AWSAccessKeyId=07G3159HQ3Z614FJ8GR2&Expires=1417078408&Signature=LZi1Hzkq%2FfC%2BUJrxj7m1DfozTUI%3D</self-destruct-url>
    <import>
        <size>1073741824</size>
        <volume-size>1</volume-size>
        <parts count="103">
            <part index="0">
                <byte-range end="10485759" start="0"/>
                <key>c05aebc0-98d3-41df-bd64-53eacf4de842/disk.img.part0</key>
                <head-url>https://import-volume-example.s3.amazonaws.com/c05aebc0-98d3-41df-bd64-53eacf4de842/disk.img.part0?AWSAccessKeyId=07G3159HQ3Z614FJ8GR2&Expires=1417078408&Signature=%2FFHSYRWxr5F7h1yoUuT1uV4lXD0%3D</head-url>
                <get-url>https://import-volume-example.s3.amazonaws.com/c05aebc0-98d3-41df-bd64-53eacf4de842/disk.img.part0?AWSAccessKeyId=07G3159HQ3Z614FJ8GR2&Expires=1417078408&Signature=KulBDTq%2BoHeWJzXZ2iOPSjk%2FN%2BQ%3D</get-url>
                <delete-url>https://import-volume-example.s3.amazonaws.com/c05aebc0-98d3-41df-bd64-53eacf4de842/disk.img.part0?AWSAccessKeyId=07G3159HQ3Z614FJ8GR2&Expires=1417078408&Signature=NgupOVUophQxUzgkBl9t%2BngofuM%3D</delete-url>
            </part>

            .
            .
            .

            <part index="102">
                <byte-range end="1073741823" start="1069547520"/>
                <key>c05aebc0-98d3-41df-bd64-53eacf4de842/disk.img.part102</key>
                <head-url>https://import-volume-example.s3.amazonaws.com/c05aebc0-98d3-41df-bd64-53eacf4de842/disk.img.part102?AWSAccessKeyId=07G3159HQ3Z614FJ8GR2&Expires=1417078408&Signature=L7XD9KPWd1O%2B3Yt%2BLBJUzRXA4HA%3D</head-url>
                <get-url>https://import-volume-example.s3.amazonaws.com/c05aebc0-98d3-41df-bd64-53eacf4de842/disk.img.part102?AWSAccessKeyId=07G3159HQ3Z614FJ8GR2&Expires=1417078408&Signature=0Nl%2BRFguwfnRuYDS22F0noJ8BwE%3D</get-url>
                <delete-url>https://import-volume-example.s3.amazonaws.com/c05aebc0-98d3-41df-bd64-53eacf4de842/disk.img.part102?AWSAccessKeyId=07G3159HQ3Z614FJ8GR2&Expires=1417078408&Signature=S%2BuNzWlxA9%2F8P7549s4O2NbwVss%3D</delete-url>
            </part>
        </parts>
    </import>
</manifest>

One interesting thing about the manifest file is that it does not contain any information to allow the disk image parts stored on S3 to be validated; nor, for that matter, is there any mechanism in the EC2 ImportVolume API call to allow the manifest file itself to be validated. One assumes that this is due to a presumption that data stored in S3 is safe from any tampering; if I were designing this API, I would add a <sha256> tag into each <part> and add an Image.ImportManifestSHA256 parameter to the ImportVolume API call. In fact, these could easily be added as optional parameters, in order to provide security without compromising backwards compatibility.
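
To illustrate: assuming OpenSSL's SHA256() function, a manifest-generating tool could emit such a per-part digest along the lines of the following sketch (the <sha256> tag is, of course, my own invention and not part of the actual format):
#include <stdio.h>

#include <openssl/sha.h>

/* Emit a hypothetical <sha256> tag for one part of the disk image. */
static void
emit_part_sha256(FILE * f, const unsigned char * part, size_t partlen)
{
        unsigned char md[SHA256_DIGEST_LENGTH];
        size_t i;

        SHA256(part, partlen, md);
        fprintf(f, "                <sha256>");
        for (i = 0; i < SHA256_DIGEST_LENGTH; i++)
                fprintf(f, "%02x", md[i]);
        fprintf(f, "</sha256>\n");
}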

Having determined the format of this file, I was then able to return to my original task — streamlining the FreeBSD EC2 image build process. To that end, I wrote a BSD EC2 image upload tool; and yesterday I finished preparing patches to integrate EC2 builds into the FreeBSD release process. While I still have to negotiate with the FreeBSD release engineering team about these — they are, naturally, far more familiar with the release process than I am, and there may be some adjustments needed for my work to fit into their process — I am confident that when the FreeBSD 10.2 release cycle starts there will be EC2 images built by the release engineering team rather than by myself.

And, of course, in the spirit of open source: My code and the formerly missing ImportVolume documentation are now available for anyone who might find either of them useful.

Posted at 2014-11-26 03:00

Thoughts on Startup School

Last weekend, I attended Y Combinator's Startup School. When the event was announced, I was distinctly ambivalent about attending — in fact I had decided against attending many previous such events due to the cost (in both time and money) of travelling down to the San Francisco Bay Area — but everybody I asked told me that it was well worth attending (even for someone from outside the valley), so I took their advice and signed up.

Perhaps my expectations were misaligned, but I was astonished at how geek-unfriendly the event was. At conferences like BSDCan, one of the major considerations is the availability of power sockets; at Startup School, not only were the power sockets very limited — we were in a theatre with perhaps a dozen outlets around the perimeter — but when I plugged in my laptop I was firmly told by one of the ushers (yes, ushers) that I wasn't allowed to do that. For that matter, the ushers were also telling people that they weren't allowed to stand at the side or back of the room, and anyone who left was told that they wouldn't be allowed back in until the current item on the program finished. An attitude reasonable for a symphony concert, to be sure; but entirely misplaced in this context.

The talks themselves, while occasionally entertaining, were somewhat lacking in useful information. When they weren't being outright contradictory — Trust in your vision! But know when to pivot! — there was little to be learned which couldn't already be gleaned from Paul Graham's Essays: Good investors are important far beyond the money they bring; having good co-founders (or, at a minimum, not having bad co-founders) is critical; and the best way to grow a startup is to create an excellent product. There were only two moments which stuck out in my mind: First, Ron Conway's comment that while older venture capitalists are better at giving advice and performing due diligence, it is the youngest people who are best at picking promising startups; and second, the founder of Groupon announcing, to my great astonishment, that he doesn't like people who build unsustainable businesses.

More than anything else, I would categorize the talks as "inspirational" — within very deliberate scare quotes — since while I did not find the talks to be particularly inspiring, the amount of applause directed at any mention of millions or billions of dollars suggested that many attendees were impressed by the speakers' financial successes. I came away wondering if this was in fact the real purpose of Startup School: Rather than being a "one day conference where you'll hear stories and practical advice", as the website billed, perhaps it — taking place just a few days before the deadline for Y Combinator applications — was really just a recruiting program for YC.

On the other hand, maybe the talks aren't the point of attending Startup School: After all, you can watch them online. Startup School brought together about 1700 people who are interested in startup companies; there must be interesting people to talk to in such a group, yes? Indeed there were; the challenge was to find them amidst the crowd. While I did have some very good conversations (and met several Tarsnap users, including a few who, much to my amusement, did not know that I was the developer behind it), there were far more people who I would charitably describe as "business-oriented". While this is a problem I've seen in Vancouver — lots of people who want to run startups, but far fewer people who want to build products — I had always thought that the concentration of technical talent in the bay area produced a better ratio. At the end of the day, I was left longing for the post-dot-com-bubble days, when the business people had fled and only the geeks were left behind.

I don't plan on attending next year's Startup School, and I would hesitate to recommend it to any other startup founder. If you're considering launching a startup and you need some "inspiration" to push you into going ahead, then by all means attend. For that matter, if you're looking for an audience to practice your "elevator pitch" on, you could certainly do worse. But if you're already working on a startup? Your time is probably better spent staying home and writing code.

Posted at 2014-10-18 04:40

The Open Source Software Engagement Award

Outside of my day job, my life revolves around three primary foci — Open Source Software, in that I am a contributor to FreeBSD and from time to time release other small projects independently; classical music, in that I play with the West Coast Symphony and am the Treasurer of the West Coast Amateur Musicians Society; and my Alma Mater, Simon Fraser University, where I am one of four alumni on the university Senate, and serve on three committees dealing with the creation and adjudication of scholarships, bursaries, and awards. While these foci are usually quite separate, I am always happy when they overlap; and so it is that I am delighted to announce the establishment, with funding from Tarsnap, of the $1000 Open Source Software Engagement Award at Simon Fraser University.

Simon Fraser University, some years ago, adopted an explicit strategy of being "engaged with the community": It is not enough, the thinking goes, for a university to be an "ivory tower" of research and learning, but instead a modern university must participate in the world around it. Such programs as the Trottier Observatory are thus not merely outreach activities which attempt to recruit future students, but rather a core part of the University's mission, by bringing science to students of all ages. Similarly, SFU now has a long list of awards (generally valued between $500 and $1000) which recognize students' non-academic activities — covering everything from serving in the student society, to helping at local hospitals, to teaching English to refugees, to running kids' sports camps. Indeed, one of the few communities which I never see mentioned is the one to which I have the strongest connection: The community of open source software developers.

To me, this seems like an obvious place to encourage extra-curricular activity: Like other forms of community service, contributions to open source software constitute a clear public good; in many cases such contributions allow students to directly exercise the skills they are developing during their education; and while it is unusual in not being geographically localized or propagated by lineal descent, there is a very distinct culture within the open source community — one which has striking similarities to the gift cultures of the indigenous populations which inhabited the area where the university is now located, in fact. Unfortunately I can do nothing to direct university funding in this direction; but since I run an online backup service which has an explicit policy of donating money to support open source software, I was able to make the funding available for this award nonetheless.

To quote the terms of reference for the award:

One award [will be granted each year] to an undergraduate student who meets the following criteria:
  • is enrolled full-time in a Bachelor's degree program;
  • is in good academic standing [GPA 2.0 or higher]; and
  • has demonstrated excellence in contributing to Open Source Software project(s) on a volunteer basis, consisting of code and/or documentation.

Preference will be given to students who have taken a leadership role within a project.

Applications must include:

  • a list of contributions to the Open Source Software project(s); and
  • a letter of reference from another project member describing the project and the applicant's contributions.

Unlike Google Summer of Code, this isn't an award which pays a student to work on open source software; rather, it is "free money" to recognize the contributions a student has already made.

A few notes about this: First, as a developer I know the importance of good documentation — and the fact that it is often overlooked — so I asked for it to be explicitly included as an accepted form of contribution. Second, I know that trying to lead volunteers is similar to trying to herd cats; but I also know that having people step into (or sometimes fall into) leadership positions is essential for the smooth progress of open source software projects, so I wanted to recognize those less quantifiable contributions. Third, because this award will be adjudicated by a committee which is not very familiar with open source software (or software generally, for that matter), the letters of reference are absolutely crucial. While requiring a letter from another project member does rule out single-person projects, I don't particularly mind this: I'd rather give money to a student who works with other developers than to a student who just writes code on his or her own anyway. And finally, because this is an award rather than a scholarship or bursary, it is disbursed entirely based on the above terms — there is no need for a high GPA (as with scholarships) or financial need (as with bursaries).

This award should be disbursed for the first time in the Spring 2015 term, and the deadline for applications is January 16th — although given the need for a letter of reference, I would encourage students to apply well before the deadline. In future academic years this will be awarded in the Fall term.

If you are an SFU student who contributes to open source software, please apply!

Posted at 2014-09-21 23:30

Zeroing buffers is insufficient

On Thursday I wrote about the problem of zeroing buffers in an attempt to ensure that sensitive data (e.g., cryptographic keys) which is no longer wanted will not be left behind. I thought I had found a method which was guaranteed to work even with the most vexatiously optimizing C99 compiler, but it turns out that even that method wasn't guaranteed to work. That said, with a combination of tricks, it is certainly possible to make most optimizing compilers zero buffers, simply because they're not smart enough to figure out that they're not required to do so — and some day, when C11 compilers become widespread, the memset_s function will make this easy.
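
For reference, here is a minimal sketch of what that might look like, assuming a C library which implements the optional Annex K interfaces (few currently do):
#define __STDC_WANT_LIB_EXT1__ 1       /* Request the Annex K interfaces. */
#include <string.h>

#include <stdint.h>

void
dosomethingsensitive(void)
{
        uint8_t key[32];

        /* ... obtain and use the key ... */

#ifdef __STDC_LIB_EXT1__
        /* Unlike memset, memset_s may not be optimized away. */
        memset_s(key, sizeof(key), 0, sizeof(key));
#endif
}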

There's just one catch: We've been solving the wrong problem.

With a bit of care and a cooperative compiler, we can zero a buffer — but that's not what we need. What we need to do is zero every location where sensitive data might be stored. Remember, the whole reason we had sensitive information in memory in the first place was so that we could use it; and that usage almost certainly resulted in sensitive data being copied onto the stack and into registers.

Now, some parts of the stack are easy to zero (assuming a cooperative compiler): The parts which contain objects which we have declared explicitly. Sensitive data may be stored in other places on the stack, however: Compilers are free to make copies of data, rearranging it for faster access. One of the worst culprits in this regard is GCC: Because its register allocator does not apply any backpressure to the common subexpression elimination routines, GCC can decide to load values from memory into "registers", only to end up spilling those values onto the stack when it discovers that it does not have enough physical registers (this is one of the reasons why gcc -O3 sometimes produces slower code than gcc -O2). Even without register allocation bugs, however, all compilers will store temporary values on the stack from time to time, and there is no legal way to sanitize these from within C. (I know that at least one developer, when confronted by this problem, decided to sanitize his stack by zeroing until he triggered a page fault — but that is an extreme solution, and is both non-portable and very clear C "undefined behaviour".)
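
As a contrived sketch of what this means in practice (getkey() is a hypothetical routine which fills in key material, and secure_memzero() is as in the erratum below):
#include <stddef.h>
#include <stdint.h>

extern void getkey(uint8_t key[32]);            /* Hypothetical key source. */
extern void secure_memzero(void *, size_t);     /* As in the erratum below. */

uint32_t
hashkey(void)
{
        uint8_t key[32];
        uint32_t acc = 0;
        size_t i;

        getkey(key);
        for (i = 0; i < 32; i++)
                acc = (acc ^ key[i]) * 2654435761U;

        /* We can wipe the one object we declared... */
        secure_memzero(key, sizeof(key));

        /*
         * ... but any copies of the key bytes which the compiler spilled to
         * anonymous stack slots while computing "acc" cannot even be named,
         * let alone wiped, from portable C.
         */
        return (acc);
}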

One might expect that the situation with sensitive data left behind in registers is less problematic, since registers are liable to be reused more quickly; but in fact this can be even worse. Consider the "XMM" registers on the x86 architecture: They will only be used by the SSE family of instructions, which is not widely used in most applications — so once a value is stored in one of those registers, it may remain there for a long time. One of the rare instances those registers are used by cryptographic code, however, is for AES computations, using the "AESNI" instruction set.

It gets worse. Nearly every AES implementation using AESNI will leave two values in registers: The final block of output, and the final round key. The final block of output isn't a problem for encryption operations — it is ciphertext, which we can assume has leaked anyway — but for encryption an AES-128 key can be computed from the final round key, and for decryption the final round key is the AES-128 key. (For AES-192 and AES-256 revealing the final round key provides 128 bits of key entropy.) I am absolutely certain that there is software out there which inadvertently keeps an AES key sitting in an XMM register long after it has been wiped from memory. As with "anonymous" temporary space allocated on the stack, there is no way to sanitize the complete CPU register set from within portable C code — which should probably come as no surprise, since C, being designed to be a portable language, is deliberately agnostic about the registers and even the instruction set of the target machine.

Let me say that again: It is impossible to safely implement any cryptosystem providing forward secrecy in C.

If compiler authors care about security, we need a new C language extension. After discussions with developers — of both cryptographic code and compilers — over the past couple of years, I propose that a function attribute be added with the following meaning:

"This function handles sensitive information, and the compiler must ensure that upon return all system state which has been used implicitly by the function has been sanitized."

While I am not a compiler developer, I don't think this is an entirely unreasonable feature request: Ensuring that registers are sanitized can be done via existing support for calling conventions by declaring that every register is callee-save, and sanitizing the stack should be easy given that the compiler knows precisely how much space it has allocated.
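
To make the idea concrete, here is a sketch of what using such an attribute might look like; the attribute name and syntax are entirely hypothetical, since no compiler currently implements anything of the sort:
#include <stddef.h>
#include <stdint.h>

extern void secure_memzero(void *, size_t);     /* As in the erratum below. */

/*
 * Hypothetical attribute: upon return from this function, every register
 * and every stack slot used implicitly by the function has been sanitized.
 * Explicitly declared objects such as "key" must still be wiped by hand.
 */
__attribute__((sanitize_implicit_state))
void
dosomethingsensitive(void)
{
        uint8_t key[32];

        /* ... obtain and use the key ... */

        secure_memzero(key, sizeof(key));
}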

With such a feature added to the C language, it will finally be possible — in combination with memset_s from C11 — to write code which obtains cryptographic keys, uses them without leaking them into other parts of the system state, and then wipes them from memory so that a future system compromise can't reveal the keys. People talk a lot about forward secrecy; it's time to do something about it.

But until we get that language extension, all we can do is hope that we're lucky and our leaked state gets overwritten before it's too late. That, and perhaps avoid using AESNI instructions for AES-128.

Posted at 2014-09-06 08:10

Erratum

In my blog post yesterday concerning zeroing arrays without interference from compiler optimization, I incorrectly claimed that the following code was guaranteed to zero an array on any conforming C compiler:
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Function pointer, marked volatile, through which memset is called. */
static void * (* const volatile memset_ptr)(void *, int, size_t) = memset;

static void
secure_memzero(void * p, size_t len)
{

        (memset_ptr)(p, 0, len);
}

void
dosomethingsensitive(void)
{
        uint8_t key[32];

        ...

        /* Zero sensitive information. */
        secure_memzero(key, sizeof(key));
}

While I was correct in stating that the compiler is required to access memset_ptr and is forbidden from assuming that it will not change to point at some other function, I was wrong to conclude that these meant that the compiler could not avoid zeroing the buffer: The requirement to access the memset_ptr function pointer does not equate to a requirement to make a call via that pointer. As "davidtgoldblatt" pointed out on Hacker News, a compiler could opt to load memset_ptr into a register, compare it to memset, and only make the function call if they are unequal, since a call to memset in that place is known to have no observable effect.
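
In other words, reusing the memset_ptr declaration above, a conforming compiler could legally transform secure_memzero into something like the following sketch, eliding the only call which might have mattered:
static void
secure_memzero(void * p, size_t len)
{
        void * (* fp)(void *, int, size_t);

        /* The volatile pointer must still be read... */
        fp = memset_ptr;

        /*
         * ... but since a call to memset here has no observable effect,
         * the call need only be made if the pointer has been changed to
         * point at some other function.
         */
        if (fp != memset)
                (fp)(p, 0, len);
}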

In light of this and other observations, I do not believe that there is any way to force a C99 compiler (i.e., one which conforms to the standard but is otherwise free to act as perversely as it wishes) to generate code to zero a specified non-volatile object.

Posted at 2014-09-05 23:05
