It's difficult to place a lot of trust in the cloud, especially when two of the largest cloud providers experience global outages within months of each other. The most recent outage, in February 2015, left Google Compute Engine (GCE) customers down for approximately two hours. Just three months earlier, Microsoft Azure suffered an outage that left customers down for more than five hours. Throughout this article, we look at the reasons for these recent cloud outages. We examine how to evaluate public and private cloud providers and solutions based on uptime and service level agreements (SLAs). Lastly, we provide specific strategies to help avoid or minimize downtime.
GOOGLE COMPUTE ENGINE GLOBAL OUTAGE (TWO HOURS OF DOWNTIME)
Google Compute Engine (GCE) allows users to run large-scale workloads on virtual machines hosted on Google's cloud infrastructure and network. Customers use the service for compute, storage, networking, big data, and other cloud-based services. According to Network World, Technology Business Research (TBR) estimates Google's annual revenue from public cloud services at around $66 million, compared with $156 million for Microsoft and $4.7 billion for Amazon Web Services. When these providers experience outages, you can bet there are business impacts and a few upset customers.
What was the cause? Google Compute Engine experienced an outage on February 18th, 2015, just before 11 p.m. According to Google, the outage was widespread and affected all zones. In Google's terminology, zones are isolated locations within its cloud regions, so an outage affecting all zones reached every region. The outage lasted approximately two hours and was later attributed to network connectivity issues. Jason Read, founder of CloudHarmony, a Laguna Beach, California-based company that conducts independent, third-party monitoring of cloud vendor uptime, called it a "significant outage."
Google Compute Engine - 99.95% Uptime
MICROSOFT AZURE GLOBAL OUTAGE (FIVE HOURS OF DOWNTIME)
Just as Microsoft Azure was gaining momentum against competitors, it experienced a global outage lasting approximately five hours, beginning shortly after midnight on November 19th, 2014. The outage took many businesses offline and directly impacted Azure-based services such as Microsoft Office 365, Visual Studio Online, OneDrive for Business, the Windows Store, and Xbox Live.
The morning after the outage, Microsoft released a statement that the company was "investigating an issue affecting access to some Microsoft services" and was "working to restore full access to these services as quickly as possible." According to an update on the Azure Status website, the issue had been mitigated for North Europe, but Microsoft was still addressing problems for some virtual machine customers in West Europe whose VMs remained in a continual "Start state" or could not be connected to. In this case, the outage was most likely related to software, authentication, or hardware issues; however, the company has not provided much detail about its cause.
Microsoft Azure SLA - 99.9% Uptime
Customers were extremely frustrated with the lack of communication and response from Microsoft. Some even posted comments that they would be reconsidering their cloud strategy.
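To put the SLA figures above in perspective, the short Python sketch below converts an uptime percentage into a monthly downtime budget and compares it with the two outages. It assumes a 30-day month (43,200 minutes); actual SLAs define their own measurement periods, exclusions, and service-credit terms, so the results are approximations rather than contractual limits. The function names are illustrative only.

```python
# Minimal sketch: convert an uptime percentage into an allowed-downtime budget.
# Assumes a 30-day month; real SLAs define their own measurement windows and
# exclusions, so treat these figures as approximations.

MINUTES_PER_30_DAY_MONTH = 30 * 24 * 60  # 43,200 minutes


def allowed_downtime_minutes(uptime_percent: float,
                             period_minutes: int = MINUTES_PER_30_DAY_MONTH) -> float:
    """Return how many minutes of downtime an uptime target permits per period."""
    if not 0.0 <= uptime_percent <= 100.0:
        raise ValueError("uptime_percent must be between 0 and 100")
    return period_minutes * (1.0 - uptime_percent / 100.0)


def budget_multiple(outage_minutes: float, uptime_percent: float) -> float:
    """Return the outage length as a multiple of the monthly downtime budget."""
    return outage_minutes / allowed_downtime_minutes(uptime_percent)


if __name__ == "__main__":
    # Outage lengths and SLA targets taken from the article.
    cases = [
        ("Google Compute Engine", 99.95, 2 * 60),  # ~2-hour outage
        ("Microsoft Azure", 99.9, 5 * 60),         # ~5-hour outage
    ]
    for name, sla, outage in cases:
        budget = allowed_downtime_minutes(sla)
        print(f"{name}: {sla}% allows {budget:.1f} min/month; "
              f"a {outage}-minute outage is {budget_multiple(outage, sla):.1f}x that budget")
```

Under these assumptions, a 99.95% target allows about 21.6 minutes of downtime per month and a 99.9% target about 43.2 minutes, so the two-hour GCE outage was roughly 5.6 times its monthly budget and the five-hour Azure outage roughly 6.9 times.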
PREVENT CLOUD DOWNTIME CAUSED BY OUTAGES
Is the cloud keeping you up at night? Do you have a Plan B in case of an outage? If not, you may want to reconsider your cloud strategy and proactively plan for unexpected outages. The only way to truly reduce the risk of downtime from outages is a multi-vendor deployment approach: when an outage occurs, your cloud infrastructure automatically fails over to another cloud vendor. The key to success, however, is to select a backup cloud vendor whose regions and network are independent of your primary cloud vendor's.
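The failover decision described above can be sketched in Python. This is a minimal illustration, not a production design: the provider names and health-check URLs are hypothetical, real deployments usually perform failover at the DNS or load-balancer layer, and `combined_availability` assumes the two vendors fail independently and that failover is instantaneous. That independence assumption is exactly why the backup vendor's regions and network should not overlap with the primary's.

```python
# Minimal multi-vendor failover sketch. Provider names and health-check URLs
# are hypothetical; the availability math assumes independent failures.
import urllib.request
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Provider:
    name: str
    health_url: str  # hypothetical health-check endpoint


def http_health_check(url: str, timeout: float = 3.0) -> bool:
    """Treat any 2xx response within the timeout as healthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (OSError, ValueError):
        # HTTPError and URLError are OSError subclasses (4xx/5xx, network
        # failures, timeouts); ValueError covers malformed URLs.
        return False


def select_provider(providers: Sequence[Provider],
                    is_healthy: Callable[[str], bool] = http_health_check) -> Optional[Provider]:
    """Return the first healthy provider in priority order, or None if all are down."""
    for provider in providers:
        if is_healthy(provider.health_url):
            return provider
    return None


def combined_availability(*availabilities: float) -> float:
    """Availability of a failover set, assuming independent outages and instant failover."""
    all_down = 1.0
    for a in availabilities:
        all_down *= 1.0 - a  # probability that every vendor is down at once
    return 1.0 - all_down


if __name__ == "__main__":
    providers = [
        Provider("primary-cloud", "https://primary.example.com/health"),
        Provider("backup-cloud", "https://backup.example.com/health"),
    ]
    # Simulate an outage at the primary vendor instead of making real requests.
    simulated = {providers[0].health_url: False, providers[1].health_url: True}
    active = select_provider(providers, is_healthy=simulated.__getitem__)
    print("Routing traffic to:", active.name if active else "no healthy provider")
    # SLA targets quoted in this article, with independent failures assumed.
    print(f"Combined availability: {combined_availability(0.9995, 0.999):.7f}")
```

With the 99.95% and 99.9% SLA targets and independent failures, the combined availability works out to about 99.99995%; correlated failures, such as a shared network dependency, would erase much of that gain.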