Strategies to Prevent Data Center Mishaps

Mike Allen
August 29, 2014

Many data centers are literally fortresses: huge, seemingly impregnable buildings that don't look like they'd admit any foreign matter, let alone a little rainwater. Unfortunately, that's not the case. Even the sturdiest-seeming center can be susceptible to problems that arise from small, easily overlooked issues. For example, BDO Unibank Inc. recently announced that its services had been temporarily disrupted because rainwater trickled into one of its data centers, according to GMA News. The problem arose after a prolonged period of heavy rainfall, which evidently wore down part of the data center's ceiling and caused water to start trickling onto the floor. BDO had no recourse except to temporarily suspend the center's operations to prevent a potentially calamitous meeting of water and electricity. For what was just a little water, the rain certainly had far-reaching repercussions for BDO.

"We apologize to our valued clients and the general public for the service interruption they may have been experiencing in our Automated Teller Machines, retail and corporate banking, mobile and phone banking, and ATM debit cards and cash cards," the company said in a statement.

What this service interruption shows is that something seemingly innocuous, like an afternoon of heavy rain, can have an extremely detrimental impact on a center's functionality. Unfortunately for data center administrators, situations like this are by no means rare. For every major disaster that causes data center downtime, there are many more situations that, while smaller in scale, can lead to both long-term degradation and sudden outages. A small crack in the floor, a lack of monitoring on a particular HVAC unit, an overworked server: any of these problems can metastasize in an instant and lead to major downtime. Here are some tactics to prevent such issues from arising:

Install remote monitoring equipment on machines. However scrupulous a manual inspection of a data center may be, it can't match a system whose sole function is to observe and record the activity of data center servers. As a Logic Monitor guide to mobile monitoring equipment points out, such technology can go a long way toward driving down overall operating expenses, since it all but eliminates the chance that an outage will take data center operators by surprise. The great virtue of remote monitoring equipment is that it gives administrators a continuous, up-to-date picture of the data center's activity. Monitoring lets a data center operate proactively, heeding advance warning about things that could become an issue down the road.

Appreciate the value of regular maintenance. For any data center, the time to repair a machine is not the moment it comes sputtering to an untimely death. Instead, center leaders need to ensure that maintenance is carried out on a regular schedule to stave off the kinds of maintenance-related issues that can lead to downtime, according to Titan Power. Regular maintenance may seem costly and time-consuming, but those expenses pale in comparison to what happens when proper maintenance isn't carried out. In that case, a center is often forced to shut down, which leaves customers incensed and damages the data center's reputation.

Limit human error. If your data center experiences an outage, odds are that it was caused by a human. As Forbes points out, 2010 results found that 51 percent of that year's network outages were the result of human error. This means data center leaders need to do much more not only to monitor their workers, but also to train them more comprehensively. As the Forbes article notes, not enough is being done to make data center employees as capable as possible; IT departments instead choose to cope with human mistakes only after they happen.

"The real problem is that the current mindset surrounding human error seems to be one of retroactive fire drills rather than proactive prevention," the article stated. "Instead of reacting to the disastrous results of human error too late, IT departments should instead be pushing for the minimization, if not complete elimination, of end user-facing issues resulting from human error."

With better oversight of employees and closer attention to where human error can creep in, many of these human-caused mistakes can be avoided.

These three suggestions are far from the only steps data centers should take to minimize the chance of downtime. By following these and other best practices, centers can ensure near-constant uptime and retain the trust and respect of clients.
