Amazon S3 data corruption

Amazon S3 recently experienced data corruption due to a failing load balancer. While the tarsnap server currently uses S3 for back-end storage, tarsnap was not affected by this.

In reviewing how the tarsnap server uses S3 after this problem was uncovered, I realized that tarsnap could theoretically have been affected by this problem; but fortunately Amazon was able to confirm for me that none of the relevant requests went through the afflicted load balancer. Nevertheless, I will be making changes to the tarsnap server in order to detect any data corruption which occurs within S3, in the unlikely event that such a problem occurs in the future.

It's also worth noting that if any data corruption had affected tarsnap, it would not have gone completely unnoticed -- the tarsnap client cryptographically signs archives, so I couldn't give someone a corrupted archive even if I tried. Such corruption would, however, result in an archive restore failing; so there's a clear benefit to making sure that any corruption is discovered before that point (especially in a case of intermittent corruption like this, where retrying a request would probably be sufficient to have the data transmitted without error).
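As a rough illustration of this kind of check (a minimal sketch, not the tarsnap server's actual code), the idea is to store a digest alongside each object, verify it on every read, and retry when verification fails. The names `wrap`, `unwrap`, `fetch_verified`, and `CorruptObjectError` are hypothetical, and `fetch` stands in for whatever function actually retrieves an object from the storage back end.

```python
import hashlib

DIGEST_LEN = 32  # length of a SHA-256 digest in bytes


class CorruptObjectError(Exception):
    """Raised when an object fails verification on every attempt."""


def wrap(data: bytes) -> bytes:
    """Prepend a SHA-256 digest so corruption can be detected on read."""
    return hashlib.sha256(data).digest() + data


def unwrap(blob: bytes) -> bytes:
    """Return the payload if its digest matches; otherwise raise ValueError."""
    digest, data = blob[:DIGEST_LEN], blob[DIGEST_LEN:]
    if len(blob) < DIGEST_LEN or hashlib.sha256(data).digest() != digest:
        raise ValueError("digest mismatch")
    return data


def fetch_verified(fetch, key: str, attempts: int = 3) -> bytes:
    """Call fetch(key) until the result verifies, at most `attempts` times.

    `fetch` is a hypothetical callable returning the stored bytes for `key`.
    """
    for _ in range(attempts):
        try:
            return unwrap(fetch(key))
        except ValueError:
            continue  # possibly intermittent corruption: retry the request
    raise CorruptObjectError(key)
```

With intermittent corruption, the first fetch might fail verification and a second one succeed; only if every attempt fails does the caller see an error, and it sees it at read time rather than during an archive restore.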

Overall, Amazon does a very good job with its web services; but backups, more than nearly anything else, really need to work -- so the more potential errors I can check for, the better, even if I doubt the checks will ever find anything.

Posted at 2008-06-24 22:02