Biggest Data Center Disasters - And How Similar Situations Can Be Avoided
By Mike Allen
Posted On September 02, 2014
"For all data centers out there, the need to maintain constant uptime is of paramount importance. These days, a massive part of daily enterprise functions revolves around saving data. Whether it's the massive email provider ensuring its clients' servers are always up and running or the phone company that needs to save its users' records, data and the retention of information is at play in organizations transcending industrial sectors. Thus, data centers play a pivotal role in business continuity. Companies often look to data centers as beacons of reliability: large fortresses that can withstand any problem, and will run all the time. But that is not always the case. In this post we will look at some of the largest-scale data center disasters of all time. What an examination of these incidents reveals is the need for data centers to institute certain standards to prevent perilous situations that can jeopardize client information.
Vodafone Data Center (Istanbul, Turkey) - September 2009
The video of the incident is like something out of a disaster movie: Employees frantically scatter about the center's main room as water flows freely through the halls. Gradually, more and more water accumulates, until within the span of about four minutes half the room is submerged. Desks, chairs and computer equipment float freely as everyone evacuates the building. Eventually the entire room is underwater. This situation occurred as the result of a heavy rainstorm in Istanbul on September 9, 2009.
The damage done to the data center was hardly the worst of the impact caused by the storm: According to a Turkish newspaper, the torrents of rain caused the deaths of 23 people. Amid the peril and tragedy of that day, the Vodafone data center was left in a dire situation. Not only did the center have to deal with the destruction of its equipment, but it also had to reach out to the customers affected by the resulting outage. Fortunately, Vodafone had a strong disaster recovery strategy in place, and was able to restore service to its customers within a day, according to Nokia. But it's possible that the entire situation could have been stopped in its tracks.
How the situation could have been avoided: The video illustrates just how quickly the entire center space is flooded - in fact, the whole thing transpires in less than eight minutes. What this suggests is that the center's drainage system was far from optimal. Sure, the torrents of rain were bound to pose an issue, but the extent of the damage could likely have been mitigated with a better system for dealing with outside water. According to Plumbing Engineer, there are several steps data centers can take to control potential water leakage as robustly as possible. These include the simple installation of floor drains "wherever pressurized water exists." In areas that are prone to massive rainfall, this likely means that floor drains should be installed everywhere, since heavy rainfall poses a threat to every square inch of data center space. But Plumbing Engineer also pointed out that floor drains alone likely won't be enough, particularly in the event of a major water leakage incident. Therefore, centers should look to add backwater valves, which help deal with any residual water and keep water that is being drained from backing up.
Samsung SDS (Gwacheon, South Korea) - April 2014
In South Korea, a Samsung data center was hit by a natural element just as damaging as water: fire. In mid-April 2014, many Samsung customers attempted to use their phones and other mobile devices only to see an error message displayed, according to Engadget. But the error that customers dealt with paled in comparison to its root cause: a major fire at Samsung's Gwacheon facility. The video from the fire shows the facility engulfed toward the top of the building, with only the left side affected. What surprised many media sources about the blaze was that a fire at a single Samsung building had such far-reaching consequences. In fact, reports indicated that Samsung's entire website went down temporarily in the wake of the fire, not to mention the many customers who discovered their phones weren't working. How was it that a single fire had such broad consequences?
How the situation could have been avoided: Samsung customers had to deal with the outage for several hours before having their service restored. While this may seem like a relatively brief amount of time, for mobile users it can feel like an eternity. As TechNewsWorld pointed out, the downtime understandably led customers to become skeptical of the company. Particularly concerning was the unfortunate and truly inexcusable news that Samsung had not backed up particular servers. Industry expert Jim McGregor told TechNewsWorld that such a lack of preparation on Samsung's part was particularly shocking given how well established an enterprise it is.
""I have seen situations where the service was new or the company was growing so fast that new customers and services were brought online without the necessary level of redundancy just because of time and resource limitations,"" McGregor said. ""However, there really is no excuse for this to occur.""
Therefore, the first step to minimizing this type of situation would have been for Samsung to ensure comprehensive backups of all the data on its servers. Because retaining data is one of the main purposes of a data center, a lack of across-the-board backups is truly something that should never happen. However, data center fires can also be kept from happening in the first place. Here are some preventive measures for ensuring a fire doesn't hit your data center:
Having a fire safety policy in place: All employees at a data center must know exactly what to do in case of a blaze, according to Schneider Electric. To that end, data center administrators need to institute an across-the-board center preparedness plan for fires. In addition, center leaders should outline to their employees the best practices when it comes to preventing fires. These include bringing new equipment into special breakdown rooms. With this step, communication is key. From top to bottom, every employee should know the fire safety code by heart, said data center expert Robert Glavan.
""Every procedure should have a fire safety factor,"" he said. ""If there is no procedure to do something in the data center, then the procedure is not to proceed.""
Conducting regular inspections: Equipment naturally degrades over time, and if it's not attended to, that can become a much bigger problem. By having a data center policy in place that guarantees consistent evaluations of the equipment in the building, you can significantly minimize the risk of a fire starting because a piece of machinery malfunctioned. Some data centers shirk their responsibility to inspect equipment because of the time and cost involved, but you can guarantee that significantly more man-hours and expenses will have to be used if such evaluations are not conducted. These inspections should also be accompanied by mobile monitoring equipment placed on the machinery in your center.
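An inspection policy is easier to keep when overdue equipment is flagged automatically. The following Python snippet is a minimal sketch of that idea, not a description of any particular data center's tooling: the Equipment record, the overdue function, the asset names and the inspection intervals are all hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Equipment:
    name: str             # hypothetical asset label
    last_inspected: date  # date of the most recent inspection
    interval_days: int    # assumed maximum gap allowed between inspections


def overdue(items, today):
    """Return the names of items whose inspection interval has lapsed."""
    return [
        item.name
        for item in items
        if today - item.last_inspected > timedelta(days=item.interval_days)
    ]


# Illustrative inventory; every value here is made up for the example.
inventory = [
    Equipment("ups-battery-1", date(2014, 1, 10), 90),
    Equipment("crac-unit-3", date(2014, 7, 1), 90),
    Equipment("pdu-rack-12", date(2014, 3, 15), 180),
]

print(overdue(inventory, today=date(2014, 9, 2)))  # ['ups-battery-1']
```

Run on a schedule, a check like this turns "consistent evaluations" from a good intention into a list of specific machines that need attention today.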
"For all data centers out there, the need to maintain constant uptime is of paramount importance. These days, a massive part of daily enterprise functions revolves around saving data. Whether it's the massive email provider ensuring its clients' servers are always up and running or the phone company that needs ...