Redundancy is all about delivering the highest levels of reliability. The previous article gave simple definitions and descriptions of data center redundancy levels and design. We examined the differences between N, N+1, 2N, and 2N+1, and briefly discussed Tier levels and redundancy standards. In this article, we will look at how reliability and failure rates differ across redundancy standards. Is 2N better than N+1 or 2(N+1) in terms of failure rates for UPS configuration? We will also look at critical power design, perhaps the most important aspect of redundancy.
What is Data Center Redundancy?
Let’s start with a recap of definitions and terms. What is redundancy? One definition reads: “In engineering, the duplication of critical components or functions of a system with the intention of increasing reliability of the system, usually in the case of a backup or fail-safe.” Read more about this definition of redundancy. What are the different forms of redundancy? Several forms exist, including power, cooling, network, hardware, software, storage, and information. For this article, we will focus on redundant systems in data centers specifically.
What are Redundancy Levels? N, N+1, N+2, 2N, 2N+1, 2N+2, 3N/2
Redundancy can be broken down into several different levels. A quick recap of redundancy levels includes key terminology such as N, N+1, N+2, 2N, 2N+1, 2N+2, 3N/2. What do these terms mean?
What is N Redundancy?
N is simply the amount required for operation. It represents the capacity that you need to operate. There is no backup system so if the system fails, it fails and downtime is the ultimate result. If you have a flat tire and no spare, you have N.
What is N+1, N+2, N+X Redundancy?
N+1 represents the amount required for operation plus one backup. It ensures system availability even in the event of a single component failure. It is similar to the concept of a spare tire on your car. When you get a flat, you can swap it out for the spare. This simply means that you could survive one flat tire.
N+2 is the amount required for operation plus two backups. In this case, you would need two spare tires. You have the four tires for operation. You get two flats and luckily you have two spare tires.
What about N+X? What does that mean? N+X means the amount required for operation plus X backups of whatever you need to ensure resiliency. X could be 1, 2, 3, or more, but N+1 and N+2 are the most common.
Do the backups (N+X) run during normal operation? The answer is yes and no. The level of resilience is referred to as active, passive, or standby. That means that it could be running, running in the background, or only running when a failure occurs.
Do the backups run at the same level as N? Good question. In the industry, we typically run below full capacity. The answer is that the backups must be able to sustain a failure without degradation in capacity or performance. A backup must match normal running performance but does not have to match the full capacity of N.
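To make the "running below capacity" point concrete, here is a minimal sketch. It assumes an active configuration in which all N+X units are identical, running, and sharing the load equally; that is only one of the modes mentioned above, and the function name and figures are hypothetical.

```python
def utilization_per_unit(n_required: int, spares: int, load_fraction: float = 1.0) -> float:
    """Fraction of its rated capacity that each unit carries.

    Assumes all (n_required + spares) identical units are active and
    share the load equally. load_fraction is the actual load expressed
    as a fraction of the capacity of N units.
    """
    if n_required < 1 or spares < 0:
        raise ValueError("n_required must be >= 1 and spares must be >= 0")
    total_units = n_required + spares
    return load_fraction * n_required / total_units


# Four units are needed to carry the load (N = 4).
normal = utilization_per_unit(4, 1)       # N+1 with all five units healthy
after_fault = utilization_per_unit(4, 0)  # one unit lost, four remain

print(f"N+1, all units healthy: {normal:.0%} per unit")
print(f"N+1, one unit failed:   {after_fault:.0%} per unit")
```

In this sketch each unit idles along at a partial load until a failure occurs, at which point the survivors pick up the difference without exceeding their rating.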
What is 2N Redundancy? How is 2N vs. N+1 Different?
2N means that you have two times the amount required for operation. You have two units of equal size, capabilities, and capacity. In the case of the flat tire, 2N would mean having a full-size spare. It would have the same road ratings for performance as your other tires. You could incur a flat tire and replace it with the full-size spare, and then drive at the same speed and for the same distance as before the failure.
2N+1 means that you have two times the amount required for operation plus a backup. This means that you have a full-size spare tire plus a temporary spare tire just in case. That means that you could incur two flat tires and still operate. However, once you are running on the temporary +1 spare, you would not be able to operate at full speed or travel the same distance.
Furthermore, 2N+2 means that you have two times the amount required for operation plus two backups. Is 2N+2 the highest level of redundancy? Maybe, although I’m sure that there are facilities and systems with additional redundancy safeguards, perhaps 2N+3 or even a 3N variation through multiple systems. However, that is not common practice in the industry. It may even bring diminishing returns in terms of failure rates and cost.
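As a quick reference, the notation above can be turned into unit counts. This is only an illustrative sketch: the function name and the choice of N = 4 are assumptions, and the count of spare units says nothing about how those units are wired together or which failure combinations a real design survives.

```python
def installed_units(n: int, scheme: str) -> int:
    """Number of units installed for a given redundancy scheme.

    n is the number of units required for operation (the "N").
    """
    schemes = {
        "N": n,
        "N+1": n + 1,
        "N+2": n + 2,
        "2N": 2 * n,
        "2N+1": 2 * n + 1,
        "2N+2": 2 * n + 2,
    }
    if scheme not in schemes:
        raise ValueError(f"unknown scheme: {scheme}")
    return schemes[scheme]


n = 4  # hypothetical: four UPS modules are needed to carry the load
for scheme in ("N", "N+1", "N+2", "2N", "2N+1", "2N+2"):
    total = installed_units(n, scheme)
    print(f"{scheme:>5}: {total:2d} installed, {total - n} beyond what operation requires")
```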
What is 3N/2, 4N/3 Redundancy?
This is a new topic that we did not cover in the last article on redundancy. There is the concept of 3N/2, 4N/3, or even higher. How does this work?
In the case of 3N/2, you could have three different UPS systems, each backing up more than one group of servers. Sound confusing? It is. For example, UPS A could be backing up Server Group B and Server Group C. UPS B could be backing up Server Group A and Server Group B. UPS C could be backing up Server Group A and Server Group C. This means that each of the three UPSs backs up two Server Groups, and each Server Group is backed up by two UPSs. This type of redundancy design can be immensely chaotic. It requires a lot of attention to detail and special configuration when balancing and managing load. Read this article for more information on 2(N+1) and 3N/2 Redundancy: High Reliability Options.
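The mapping in the example above can be sketched in a few lines to show why load balancing matters. The sketch assumes that every Server Group draws the same load (a hypothetical 100 kW) and splits it equally between whichever of its two UPSs are still running; the names `FEEDS` and `ups_loads` are illustrative only.

```python
# Which two UPSs feed each server group, taken from the example above.
FEEDS = {
    "Server Group A": ("UPS B", "UPS C"),
    "Server Group B": ("UPS A", "UPS B"),
    "Server Group C": ("UPS A", "UPS C"),
}


def ups_loads(feeds, group_load_kw, failed=frozenset()):
    """Load carried by each surviving UPS, in kW.

    Assumes each group splits its load equally across its surviving feeds.
    """
    loads = {ups: 0.0 for pair in feeds.values() for ups in pair if ups not in failed}
    for group, pair in feeds.items():
        alive = [ups for ups in pair if ups not in failed]
        if not alive:
            raise RuntimeError(f"{group} has lost all of its feeds")
        for ups in alive:
            loads[ups] += group_load_kw / len(alive)
    return loads


print("Normal operation:", ups_loads(FEEDS, 100.0))
print("UPS A failed:    ", ups_loads(FEEDS, 100.0, failed={"UPS A"}))
```

Under these assumptions each UPS carries 100 kW in normal operation and 150 kW after one UPS is lost, so each must be rated for 150 kW. Three such units give 450 kW of capacity for a 300 kW load, which is where the 3N/2 name comes from.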
What’s the Difference? Redundancy vs. Reliability?
It’s difficult to say which configuration offers the best reliability versus cost. At the end of the day, redundancy is all about delivering the highest levels of reliability, but you can have all of the backups in the world and still experience failure. For example, backup generators have a failure rate of 15% after eight hours of operation. In fact, a study by the Idaho National Engineering Laboratory found that 2 percent of emergency diesel generators failed to start, 5 percent failed after half an hour, 15 percent failed after eight hours of continuous operation, and 1 percent failed after 24 hours. For more information on failure rates, read the article Six Facts in High-Availability Data Center Design.
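To get a rough feel for how spare units change the odds, the 2 percent failure-to-start figure can be plugged into a simple "k out of n" model. This is a sketch under strong assumptions: units fail independently, only failure to start is considered, and the installed units form one shared pool. Real systems have common-cause failures (fuel, controls, human error), and a 2N design with two separate systems would need its own model, so treat the numbers as illustrative rather than predictive.

```python
from math import comb


def p_system_failure(units_installed: int, units_required: int, p_unit_fail: float) -> float:
    """Probability that fewer than units_required units work.

    Assumes each unit fails independently with probability p_unit_fail.
    """
    p_ok = 1.0 - p_unit_fail
    p_enough = sum(
        comb(units_installed, k) * p_ok**k * p_unit_fail ** (units_installed - k)
        for k in range(units_required, units_installed + 1)
    )
    return 1.0 - p_enough


P_FAIL_TO_START = 0.02  # failure-to-start rate quoted above
N = 2                   # hypothetical: two generators are needed to carry the load

for label, installed in (("N", N), ("N+1", N + 1), ("N+2", N + 2)):
    p = p_system_failure(installed, N, P_FAIL_TO_START)
    print(f"{label:>3}: probability of too few generators starting = {p:.6f}")
```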
Lessons Learned: Ask the Right Questions
It’s key to have knowledge and facts about redundancy. However, redundant systems are only as good as the people testing and maintaining them. What good are redundant systems if they fail during failover? Make sure to ask the right questions upfront about redundancy and testing.
Download the Data Center Checklist for more information on comparing data center facilities and providers. Compare up to three data centers at once.