AI Data Centers: Optimizing for the Future

Posted by Stephan Lam on October 29, 2024

Artificial intelligence has rapidly become the top buzzword in the IT industry. The AI frenzy began in earnest with the release of ChatGPT on Nov. 30, 2022. Within days, the chatbot had more than a million users. Major search engines soon added chatbot features to their platforms, and many applications and services began claiming they were AI-enabled.

Since then, the AI industry has steadily evolved, with new applications emerging daily. AI is revolutionizing many aspects of life and business. It is also profoundly impacting data center design, operations, management, and performance.

Organizations have been consolidating their data centers for years and moving many workloads to the cloud. Now, they’re looking to host AI workloads on-premises, and those workloads have different requirements than traditional applications. Organizations must ensure their data center infrastructure can support AI and machine learning.

How AI Workloads Differ from Traditional Workloads

The primary difference between AI and traditional workloads is the power required. The workloads used to train AI must process vast amounts of data. Graphics processing units (GPUs) and other accelerators are used to handle the computational load. A single rack of equipment can consume 30 kW to 100 kW, depending on the type and number of GPUs. With global data center energy consumption already under the spotlight, this is one of the biggest challenges to AI evolution.

The inference workloads that put the trained model into production have more modest power requirements. However, organizations may require large numbers of racks for inference workloads depending on the volume of transactions.

Where there’s power consumption, there’s heat. Given the amount of power required, AI workloads generate more heat than traditional applications. Efficient cooling systems must be considered. Additionally, AI and machine learning require a high-speed, low-latency network to handle communication between the servers.

AI Design Considerations

Organizations must factor AI workload demands into the design of their data center infrastructure. Here are some of the key considerations.

Power Requirements

The power distribution infrastructure must be able to deliver enough power to support AI workloads. In some cases, North American data centers may need to upgrade their power distribution to 240/415V. Power distribution units (PDUs) may also require upgrades to meet the output power requirements of the IT equipment.
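
To illustrate why higher-voltage distribution helps, the usable capacity of a three-phase circuit can be estimated as the square root of 3 times line-to-line voltage times current, derated for continuous load. The sketch below assumes an 80% continuous-load derating and a power factor of 1.0. The 60 A breaker size and the 60 kW rack are hypothetical values chosen for illustration, not figures from this article or from any specific product.

```python
import math


def three_phase_capacity_kw(line_voltage, breaker_amps, derating=0.8, power_factor=1.0):
    """Estimate usable three-phase circuit capacity in kW.

    derating=0.8 assumes the common 80% continuous-load rule;
    power_factor=1.0 is a simplifying assumption.
    """
    watts = math.sqrt(3) * line_voltage * breaker_amps * derating * power_factor
    return watts / 1000.0


def circuits_needed(rack_kw, circuit_kw):
    """Number of circuits required to carry the rack load (no redundancy)."""
    return math.ceil(rack_kw / circuit_kw)


# Hypothetical comparison: 208 V vs. 415 V three-phase, both on 60 A breakers.
rack_kw = 60  # hypothetical AI rack load
cap_208 = three_phase_capacity_kw(208, 60)
cap_415 = three_phase_capacity_kw(415, 60)

for label, cap in (("208 V", cap_208), ("415 V", cap_415)):
    print(f"{label}: {cap:.1f} kW per circuit, "
          f"{circuits_needed(rack_kw, cap)} circuits for a {rack_kw} kW rack")
```

Under these assumptions, the 415 V circuit carries roughly twice the load of the 208 V circuit at the same amperage, which is why fewer circuits are needed per rack. A real design would also account for redundant feeds and the actual power factor of the equipment.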

Thermal Management

Experts say air cooling is inadequate for power densities greater than 20 kW per rack. Organizations should implement liquid cooling technologies to reduce the risk of overheating. Placing AI workloads side by side within one area of the data center can simplify thermal management and help keep cooling costs in check.
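
As a rough planning aid, the 20 kW guideline above can be turned into a simple per-rack check. The sketch below assumes that essentially all electrical power drawn by the rack ends up as heat and uses the standard conversion of about 3.412 BTU/hr per watt. The function names and sample rack loads are hypothetical.

```python
AIR_COOLING_LIMIT_KW = 20.0  # per-rack guideline cited in the article


def heat_load_btu_per_hr(rack_kw):
    """Convert rack power to heat load, assuming all power becomes heat."""
    return rack_kw * 1000 * 3.412  # 1 W is about 3.412 BTU/hr


def suggested_cooling(rack_kw):
    """Return 'air' at or below the guideline, 'liquid' above it."""
    return "air" if rack_kw <= AIR_COOLING_LIMIT_KW else "liquid"


# Hypothetical rack loads in kW.
for kw in (8, 20, 45):
    print(f"{kw} kW rack: {heat_load_btu_per_hr(kw):,.0f} BTU/hr, "
          f"suggested cooling: {suggested_cooling(kw)}")
```

This is a first-pass filter only. The right cooling choice also depends on airflow, containment, and the facility's existing chilled-water capacity.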

Physical Infrastructure

AI servers tend to be deeper than standard servers, leaving less room in the back of the cabinet for PDUs, cable management, and liquid cooling manifolds. Standard cabinets may not have adequate load ratings to support the weight of the equipment. Organizations should assess their physical infrastructure and upgrade as needed.

Data Center Design Recommendations

PDU Upgrades

High-quality PDUs are a must for AI workloads, and it’s important to choose the right type. When selecting PDUs, organizations should ensure that the voltage, amperage, temperature rating, and the number and type of outlets are adequate to meet current and anticipated demand. Monitored PDUs allow data center personnel to monitor various metrics and receive alerts remotely.

In-Rack Cooling

Most data centers will likely use a combination of air and liquid cooling for AI workloads. In-rack cooling focuses chilled air on IT equipment for maximum efficiency. The cooling unit is mounted in the server cabinet, minimizing the amount of space to be cooled. In addition, in-rack cooling systems are much simpler to maintain than liquid cooling systems.

Stronger, Deeper Cabinets

Many data centers will need stronger, deeper server cabinets to accommodate AI equipment. Organizations should calculate the total weight of the equipment and ensure the cabinet’s dynamic load rating is adequate. Deeper cabinets provide ample room for AI servers plus PDUs, cable management, and other equipment.
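
The weight calculation described above can be sketched in a few lines. All equipment names, weights, and cabinet ratings below are hypothetical placeholders, and the 10% safety margin is an assumption rather than an industry rule. Substitute the figures from your own equipment and cabinet data sheets.

```python
from dataclasses import dataclass


@dataclass
class Item:
    name: str
    weight_kg: float
    quantity: int = 1


def total_weight_kg(items):
    """Sum the weight of every item in the cabinet."""
    return sum(item.weight_kg * item.quantity for item in items)


def within_rating(items, dynamic_rating_kg, safety_margin=0.1):
    """True if the load leaves at least `safety_margin` headroom under the rating."""
    return total_weight_kg(items) <= dynamic_rating_kg * (1 - safety_margin)


# Hypothetical cabinet contents.
equipment = [
    Item("GPU server", 130.0, 4),
    Item("Network switch", 12.0, 2),
    Item("PDU", 9.0, 2),
    Item("Cooling manifold", 25.0),
]

print(f"Total load: {total_weight_kg(equipment):.0f} kg")
print("Fits 1000 kg dynamic rating:", within_rating(equipment, 1000))
print("Fits 600 kg dynamic rating:", within_rating(equipment, 600))
```

The check uses the dynamic rating because it is normally lower than the static rating and applies whenever a loaded cabinet is moved.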

Monitoring and Management

Clusters of AI workloads will likely coexist alongside traditional applications. Organizations will need tools such as data center infrastructure management (DCIM) to provide real-time insight into power loads and environmental conditions. These tools will be increasingly important as AI workloads become mission-critical. 
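
At its simplest, the kind of real-time insight described above comes down to comparing sensor readings against limits. The sketch below is a minimal illustration and does not use the API of any DCIM product. The metric names, threshold values, and rack ID are all hypothetical.

```python
# Hypothetical per-rack limits; real values depend on the facility and equipment.
THRESHOLDS = {"power_kw": 40.0, "inlet_temp_c": 27.0, "humidity_pct": 60.0}


def check_reading(rack_id, reading, thresholds=THRESHOLDS):
    """Return an alert message for each metric that exceeds its limit."""
    alerts = []
    for metric, limit in thresholds.items():
        value = reading.get(metric)
        if value is not None and value > limit:
            alerts.append(f"{rack_id}: {metric}={value} exceeds {limit}")
    return alerts


# Hypothetical reading from one AI rack.
sample = {"power_kw": 42.5, "inlet_temp_c": 24.0}
for alert in check_reading("rack-a1", sample):
    print(alert)
```

Metrics missing from a reading are skipped rather than treated as failures, so racks with fewer sensors can share the same check.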

Agility and Flexibility

AI technology is evolving rapidly. Today’s design decisions won’t necessarily support solutions that become available in a few years. Organizations should build flexibility into the data center design and ensure the agility to make frequent moves, adds, and changes as workload requirements change.

AI in the Data Center

AI isn’t just a burden on the data center. It offers the ability to transform data center operations and management with unprecedented efficiency and responsiveness. AI tools can automate many tasks, freeing up data center staff for critical projects. 

AI in Data Center Management

By rapidly analyzing massive data sets with deep learning systems, AI tools can provide insight into energy management and capacity planning and anticipate potential risks. These capabilities will prove valuable in data center management.

Capacity Management

AI can analyze the many factors involved in determining data center capacity requirements at any given time. As such, AI can automate this manual process and help data center operators make more informed decisions.

Energy Management

Similarly, deep learning tools can predict energy challenges based on usage and external factors. These tools can also help humans analyze alternative energy sources and ways to lower costs and meet sustainability objectives.

AI in Data Center Operations

AI brings the benefit of speed to data center operations. AI-enabled tools can synthesize and analyze information far faster than humans. They can then act on that information by alerting humans or performing specific tasks. This is particularly valuable in situations where time is of the essence.

Incident Response

Avoiding unplanned downtime is a primary responsibility of data center staff. When outages occur, operational teams need to identify and resolve the problems as quickly as possible. Traditionally, they have used “playbooks” that outline the process for troubleshooting, investigating, and responding to problems.

AI takes incident response to the next level. AI-enabled tools can quickly assess problems, determine the root cause, and develop a response plan. They can even perform these tasks in situations that were never anticipated and, therefore, not documented in the playbook.

Risk Prediction

If risks are understood, incidents can often be avoided. AI can continuously monitor a wide range of factors and predict potential problems. For example, AI-enabled tools can detect subtle changes indicating that a cooling unit is about to fail. 
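
To make the cooling-unit example concrete, the sketch below flags readings that deviate sharply from a trailing window. It uses a simple rolling z-score as a statistical stand-in, not the machine learning models that commercial AI tools would use. The window size, z-score limit, and temperature values are hypothetical.

```python
from statistics import mean, stdev


def drift_alerts(readings, window=5, z_limit=3.0):
    """Return indexes of readings that deviate from the trailing window."""
    flagged = []
    for i in range(window, len(readings)):
        baseline = readings[i - window:i]
        mu, sigma = mean(baseline), stdev(baseline)
        # Skip flat baselines to avoid dividing by zero.
        if sigma > 0 and abs(readings[i] - mu) / sigma > z_limit:
            flagged.append(i)
    return flagged


# Hypothetical supply-air temperatures (deg C) from one cooling unit.
supply_temps_c = [18.0, 18.1, 17.9, 18.0, 18.2, 18.1, 18.0, 19.4, 18.1]
print("Anomalous reading indexes:", drift_alerts(supply_temps_c))
```

A production system would combine many signals, such as fan speed, compressor current, and vibration, and would need tuning to avoid false alarms.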

The natural language processing capabilities of chatbots also allow for the analysis of human emotions. In the data center context, chatbots could help spot employee performance issues, behavior problems, or job dissatisfaction.

Physical Security

Data centers rely on onsite security personnel to monitor closed-circuit cameras and respond to unauthorized physical access. AI can take over video monitoring and more effectively identify individuals who may pose a risk to the facility.

Conclusion

As organizations bring AI workloads into the data center, they must ensure they have the infrastructure to support them. Enconnex offers an array of data center infrastructure products to meet the most demanding requirements. Let us help you assess your needs and select flexible and scalable solutions to take your data center into the future. Get in touch today.

Stephan has over 15 years of IT experience, including all aspects of data center operations, project management, service delivery, and sales engineering.
