Causes of Data Center Outages, Costs, and How To Prevent Downtime
Categories: IT Infrastructure, Data Center
The thought of unplanned downtime strikes fear in the heart of every data center operator. According to the Uptime Institute’s Annual Outages Analysis 2023, 70 percent of data center outage incidents cost $100,000 or more, with 25 percent costing more than $1 million. The proportion of outages costing more than $100,000 has increased steadily, largely because businesses depend more and more on data center availability, and data center industry trends show nothing but growth for the foreseeable future.
Clearly, minimizing the risk of downtime is a high priority for data center operators. By understanding the main causes of data center outages and implementing highly manageable data center infrastructure, operators can minimize downtime and the associated costs.
3 Main Causes of Data Center Outages
Data center outages typically have multiple causes, and identifying the primary factors often depends on perspective. However, the Uptime Institute has been tracking major outages since 2016, and the data are remarkably consistent. Three main causes of data center outages stand out:
- Human error. Human beings are generally the weakest link in data center operations. The Uptime Institute estimates that human error is a factor in up to 80 percent of all outages. IDC estimates that human error costs organizations more than $62.4 million every year, mostly attributed to mistakes in performing tedious and manual tasks.
- UPS and power failure. Almost half (44 percent) of data center outages are caused by onsite power system failure, with 40 percent of those caused by UPS failure. UPSs are indispensable to data center operations, but they’re often forgotten once installed. Battery failure is the chief cause of UPS problems, and rising data center heat loads can reduce battery life substantially.
- Cooling system failure. Just 13 percent of outages are attributable to cooling system failure, and the number has stayed roughly the same over the past three years. Nevertheless, increasing data center heat loads have made cooling system failure a more significant, and potentially more costly, threat.
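To put the power figures in the list above on the same footing as the cooling figure, the two percentages can be multiplied together. The short Python sketch below does only that arithmetic; the variable names are illustrative, and the inputs are the 44 percent and 40 percent figures quoted above. Because human error is described as a contributing factor rather than a sole cause, it overlaps with the other categories and is left out of the calculation.

```python
# Shares quoted in the list above, expressed as fractions.
power_share = 0.44         # outages caused by onsite power system failure
ups_share_of_power = 0.40  # portion of those power failures caused by the UPS
cooling_share = 0.13       # outages attributable to cooling system failure

# UPS failure as a share of ALL outages, not just power-related ones.
ups_share_of_all = power_share * ups_share_of_power

print(f"UPS failure: about {ups_share_of_all:.1%} of all outages")
print(f"Cooling failure: about {cooling_share:.0%} of all outages")
```

Read this way, UPS failure alone accounts for a larger share of outages than cooling system failure, which supports the point that UPSs should not be forgotten once installed.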
Cost of Data Center Outages
Data center outages can be devastating to a business’s bottom line. As mentioned above, Uptime Institute found that a majority of outages cost businesses more than $100,000, with many above $1 million. The bigger the organization and operation, the higher the cost. For example, in 2021, it’s estimated that Amazon lost $34 million in revenue, Facebook lost $100 million in revenue, and Alibaba lost $1 billion in revenue due to data center outages. These were not multi-day events, either.
Breaking the costs of data center outages into their components can be difficult. Given that human error plays a role in most outages, preventing accidents and mistakes can help reduce these costs. However, finding qualified staff is the No. 1 operational and management need for 54 percent of data center operators, according to the Uptime Institute’s 2022 Management and Operations Survey.
Technical debt is another major challenge. Data centers accrue technical debt when they rely on outdated technology. The debt must ultimately be “repaid,” often with compounded “interest” in the form of worsening problems. Data center infrastructure accrues technical debt just like IT equipment.
Preventing Data Center Outages
The right policies and procedures help reduce data center outage risk. Uptime Institute’s 2022 Global Data Center Survey found that 78 percent of data center managers believe downtime is preventable with better management, processes, and configuration. Procedures should focus on the most critical workloads and the most likely causes of an outage, and they should be reviewed and updated regularly. Emergency procedures should also be developed, tested, and practiced. If staff can respond quickly and appropriately to an incident, they can often prevent it from becoming a full-scale outage.
Data center infrastructure management (DCIM) tools can help IT teams detect issues that could lead to downtime by monitoring the health of various systems and presenting the data in easy-to-read dashboards. Automation and artificial intelligence can help ease operational challenges by eliminating many repetitive tasks, but there’s no substitute for training and accountability.
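As a rough illustration of the kind of health check a monitoring tool performs, the Python sketch below compares a set of readings against allowed ranges and reports anything out of bounds. It is a minimal sketch and not the API of any DCIM product: the metric names, the `Limit` class, the `find_alerts` function, and the threshold values are all hypothetical placeholders that a real deployment would replace with its own sensors and limits.

```python
from dataclasses import dataclass


@dataclass
class Limit:
    """Allowed range for one metric (inclusive on both ends)."""
    low: float
    high: float


# Hypothetical thresholds, chosen only for illustration.
LIMITS = {
    "inlet_temp_c": Limit(18.0, 27.0),      # rack inlet air temperature
    "ups_battery_pct": Limit(80.0, 100.0),  # UPS battery charge level
    "ups_load_pct": Limit(0.0, 80.0),       # UPS load as percent of capacity
}


def find_alerts(readings, limits=LIMITS):
    """Return a message for every reading that falls outside its range."""
    alerts = []
    for metric, value in readings.items():
        limit = limits.get(metric)
        if limit is None:
            continue  # no threshold defined for this metric
        if not (limit.low <= value <= limit.high):
            alerts.append(
                f"{metric}={value} outside allowed range "
                f"{limit.low} to {limit.high}"
            )
    return alerts


# Made-up sample readings: hot inlet air and an overloaded UPS.
sample = {"inlet_temp_c": 29.5, "ups_battery_pct": 92.0, "ups_load_pct": 85.0}
alerts = find_alerts(sample)
for message in alerts:
    print(message)
```

A check like this only flags a condition. Deciding what to do about it still depends on the trained staff and practiced procedures described above.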
Organizations should also regularly refresh data center infrastructure components to reduce risk and increase efficiency. The right racks and cabinets, cooling systems, and power systems can reduce operational overhead and better protect IT equipment.
How Enconnex Can Help
Enconnex offers an array of data center infrastructure products to help data centers operate more reliably and efficiently. Our new InfiniRack data center cabinet features excellent load ratings and a structural design that maximizes usable internal space and airflow. Easy to customize, move, and maintain, the InfiniRack can adapt to the needs of nearly any data center design and handle ever-growing power densities and cabling requirements.
Data center outages take a financial toll due to business disruption, lost revenue, and reduced productivity. The trickle-down effects of brand damage and missed opportunities can haunt organizations for years. Contact Enconnex for help in optimizing your data center infrastructure to maximize availability.
Posted by Thane Moore on August 2, 2023
Thane Moore is the Senior Director of Sales Operations & Logistics for Enconnex and has 20 years of experience in the IT infrastructure manufacturing space working for companies such as Emerson and Vertiv.