Amazon’s S3 Outage: Usage spike or DDoS attack?

By iddav at 10:50 pm on February 17, 2008

Amazon’s Simple Storage Service (S3) suffered an outage on the morning of February 15th, leaving content inaccessible on the thousands of websites that rely on S3 for data storage. According to Amazon’s official explanation, the outage was caused by a significantly increased volume of authenticated calls from multiple users. From a security perspective, this raises more questions than it answers.

Used by websites like Twitter and SmugMug, S3 aims to provide a robust, pay-as-you-go interface for storing virtually unlimited amounts of data on the web. It is powered by Amazon’s global storage infrastructure, and its design requirements include the goal of storing data at “99.99% availability” with “no single points of failure.”
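To put the “99.99% availability” figure in perspective, here is a rough sketch of how much downtime such a target allows. The measurement windows (a 365-day year and a 30-day month) are assumptions made for illustration; the quoted design goal does not say over what period availability is measured.

```python
def allowed_downtime_minutes(availability_pct: float, period_hours: float) -> float:
    """Minutes of downtime permitted over a period at the given availability."""
    return (1 - availability_pct / 100) * period_hours * 60


# Assumed measurement windows: a 365-day year and a 30-day month.
per_year = allowed_downtime_minutes(99.99, 365 * 24)
per_month = allowed_downtime_minutes(99.99, 30 * 24)

print(f"{per_year:.1f} minutes per year")           # 52.6 minutes per year
print(f"{per_month:.1f} minutes per 30-day month")  # 4.3 minutes per 30-day month
```

Under these assumptions, a single outage lasting much longer than about 52 minutes would exceed a full year’s worth of permitted downtime.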

Despite the high stakes, the outage revealed a weak link in S3: authenticated requests, because they involve cryptography, consume more resources than typical requests. The description of the cause posted in Amazon’s forums attributes it to “elevated levels of authenticated requests from multiple users in one of our locations” at 3:30am, after which, in Amazon’s words, “several other users significantly increase their volume of authenticated calls,” causing the authentication service to reach capacity at around 4:00am.

So was this a coincidence or a planned attack? Amazon does not say, but it does indicate that the issue was resolved by bringing more capacity online, implicitly suggesting that the cause was an increase in usage. However, even if the cause was natural, this incident has exposed a large vulnerability in the authentication system: a competing service could deliberately send large numbers of authenticated calls to S3 in an attempt to overload it. Fortunately, Amazon plans to address this, stating that it will add “additional defensive measures around the authenticated calls.”
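Amazon does not describe what those defensive measures will be. One common approach to this kind of problem is per-client rate limiting, so that no single user can consume the whole authentication capacity. The sketch below is a minimal token-bucket limiter written purely for illustration; the class names, the rate and the burst size are all hypothetical and are not drawn from anything Amazon has published.

```python
class TokenBucket:
    """Permits short bursts while capping the sustained request rate."""

    def __init__(self, rate: float, capacity: float, now: float):
        self.rate = rate          # tokens refilled per second (hypothetical value)
        self.capacity = capacity  # maximum burst size (hypothetical value)
        self.tokens = capacity    # start with a full bucket
        self.last = now           # time of the most recent refill

    def allow(self, now: float) -> bool:
        # Refill in proportion to elapsed time, never beyond capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class AuthRateLimiter:
    """Keeps one bucket per client so a heavy user cannot starve the others."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.buckets = {}

    def allow(self, client_id: str, now: float) -> bool:
        bucket = self.buckets.get(client_id)
        if bucket is None:
            bucket = TokenBucket(self.rate, self.capacity, now)
            self.buckets[client_id] = bucket
        return bucket.allow(now)


# One client floods the service with 100 authenticated calls at the same instant.
limiter = AuthRateLimiter(rate=5, capacity=10)
accepted = sum(limiter.allow("heavy-user", now=0.0) for _ in range(100))
print(accepted)                                # 10
print(limiter.allow("light-user", now=0.0))    # True
```

Timestamps are passed in explicitly to keep the example deterministic; a real service would read a monotonic clock instead. Rejecting a call before doing any cryptographic work is what makes a limiter like this cheap relative to the requests it turns away.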

The outage calls Amazon’s claimed benefit of “no single point of failure” into question. Had the companies hosted on S3 been on separate hosts, the overall impact on consumers would have been far smaller. As the IT sector continues to see a significant number of mergers, this incident may prove to be a valuable warning about the robustness of our computing infrastructure. Even for a company as large and experienced as Amazon, the infallible system remains elusive. As the small IT firms of the dot-com era are merged and consolidated into larger conglomerates, it would behoove the industry to keep in mind that even the largest companies cannot guarantee 100% uptime. When a large content host goes down, the effect falls not only on its own business but on the businesses of many others, which is far from the ideal of distributed, redundant systems. While an outage of Twitter might not be particularly significant to the lives of its users, had critical infrastructure or banking information been involved, the impact would have been far more pronounced. Next time, it might be.

Filed under: Availability, Current Events
