Amazon S3 Outage Hits WordPress Businesses, Disrupting Services and Support

Amazon is currently experiencing “high error rates with S3 in US-EAST-1,” causing a massive outage for sites, apps, and services across the web. The AWS service health dashboard was also temporarily affected by the outage. Amazon says it is working on repairing S3 and believes it has identified the root cause.

The outage is affecting many popular sites, such as Quora, Netflix, Splitwise, Business Insider, Giphy, Trello, IFTTT, many publishers’ image hosting, filesharing in Slack, and the Docker Registry Hub.

WordPress businesses are also affected, especially those that host customer downloads. WooCommerce customers are currently unable to access downloads they purchased. Similarly, Envato customers are having difficulty accessing downloads and content.

Joost de Valk, CEO and founder of Yoast, said the company experienced minor effects from the outage but had already been planning to switch from S3 to a new storage provider.

“The outage doesn’t seem to have affected our revenue much,” de Valk said. “It was slightly annoying and led to some images not working and people not being able to download their plugins for a while, which is always a shame. However, not directly related to this, we’re already looking at ditching S3. That’s because our new hosting setup at SiteGround combined with CDN from MaxCDN actually negates the need for S3 entirely.”

Other companies that have AWS integrated into their support services experienced more disruption due to customers not being able to receive help.

“Obviously our website is hosted using AWS technology through Pagely,” WP Ninjas co-founder James Laws said. “I’m not sure how they’ve been affected directly, but we have noticed intermittent downtime. Perhaps the biggest impact is that our support service is built on AWS and with it down we are completely unable to provide any support to our users.”

Laws said the company has had fairly decent uptime with AWS in the past and that switching services because of an outage would not be worth the effort.

“The truth is that 100% uptime is more a fantasy than anything,” Laws said. “The idea of having to move a website or change a support system temporarily or even permanently for a short period of downtime would be pretty daunting. You probably could create contingency plans for something like this, but the technical and administrative costs are not generally worth it in my opinion.”

The outage serves as a painful reminder of how dependent the web is on cloud storage providers and how few services have a backup plan for instances like these.

At 12:52 PM PST Amazon released an update, promising improvements for customers within the hour: “We are seeing recovery for S3 object retrievals, listing and deletions. We continue to work on recovery for adding new objects to S3 and expect to start seeing improved error rates within the hour.” The ability to retrieve, list, and delete was fully recovered within half an hour and Amazon continues to work on fixing the ability to add new objects to S3.

6 Comments


  1. I like it when huge cloud providers go down; hilarity ensues across the interwebs, with memes aplenty. Though it sucks if your business relies on it.



    1. It was so much fun to read trending tweets about AWS yesterday.



  2. Is it me, or are there more and more attacks or dropouts on cloud-based sites? There was the debacle with Fitbit and all the other day, and now this one for Amazon. Are these really security issues in the cloud rather than technical downtime issues? 123-Reg was hit the other day and the messages coming from them smacked of attack rather than database issues.



    1. 123-reg hosting sucks, that’s why. They corrupted and lost a huge portion of client sites/data not long ago. They didn’t have a backup and had no way of really fixing it. They suggested that anyone who keeps their own backups restore them instead, and they were calling in data recovery specialists.

      They get hit by DDoS quite often.



  3. We (@pagely) did get a run on tickets today about problems related to this event. In every case it was site-specific: a third-party .js script (served from S3) loaded as ‘blocking’ in the header, which, when it failed to load, failed the page. Themes, plugins, and customer code were not following the best practice of loading JS non-blocking.

    We do not utilize S3 in such a way that this event materially affected our operational capacity. We store backups on S3, which we were unable to do for a period of time; however, we still created those backups locally and the system simply ‘retried’ the push to S3 until it succeeded.

    I’ll be the first to say – hosting ain’t easy. You make your choices and have to live with them. We are 100% AWS and today’s event did not affect us operationally. That said, the next AWS event may: an EC2 outage would be a bad day for us.

    What we always say, though, and what was proven today: if AWS fails, most of the internet fails. In that context, well, ‘the Internet’ is ‘downforeveryonenotjustme’.

    I’d rather align our services with someone (AWS) that has the most pressure to perform than a random provider that no one else relies on. This has proven out over the last decade. Yes, AWS issues are massive and widespread, but in the context of the sheer scale of the operation they are very rare in occurrence. In other words, reliability is pretty damn amazing for its size, when smaller, much smaller, systems cannot boast the same level of reliability.

    James at WP Ninjas should be getting an email shortly from our staff – first thanking him for his continuing business, love you man, and secondly outlining the JS blocking issue I described above.

    Viva WordPress!

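The retry behavior described in the Pagely comment above (create the backup locally, then keep retrying the push to remote storage until it succeeds) can be illustrated with a minimal sketch. This is not Pagely's implementation, which the comment does not show. The names `push_with_retry` and `FlakyStore`, the attempt limit, and the exponential delay schedule are all assumptions made for illustration, and no real S3 API is called.

```python
import time
from typing import Callable


def push_with_retry(
    upload: Callable[[], None],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call upload() until it succeeds and return the number of attempts used.

    Hypothetical helper: waits base_delay, 2 * base_delay, 4 * base_delay, ...
    (capped at max_delay) between failed attempts, and re-raises the last
    error if every attempt fails.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            upload()
            return attempt
        except Exception:
            if attempt == max_attempts:
                raise
            sleep(min(max_delay, base_delay * 2 ** (attempt - 1)))
    raise ValueError("max_attempts must be at least 1")


class FlakyStore:
    """Hypothetical stand-in for remote storage that fails a fixed number of times."""

    def __init__(self, failures: int) -> None:
        self.failures = failures
        self.stored = False

    def upload(self) -> None:
        # Simulate an outage: refuse the first `failures` upload attempts.
        if self.failures > 0:
            self.failures -= 1
            raise ConnectionError("storage unavailable")
        self.stored = True


if __name__ == "__main__":
    store = FlakyStore(failures=2)
    # Use a short base delay so the demonstration finishes quickly.
    attempts = push_with_retry(store.upload, base_delay=0.01)
    print(f"stored={store.stored} after {attempts} attempts")
```

Because the local copy already exists before the push is attempted, a storage outage in this arrangement delays the off-site copy rather than losing the backup.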
