So…You Want to Build a Data Center

Thomas Shipp | Head of Equity Research

Last Updated:

Additional content provided by Tucker Beale, Sr. Analyst, Research.

As the capital expenditure (capex) race for compute continues, we thought it would be worthwhile to briefly outline the current state of play for the well-publicized data center buildout. To understand why so much capex is needed to support artificial intelligence (AI), we must first understand how data centers are built and operated.

Let’s start with the basics. What are data centers and why has the AI race sparked such demand for them? A data center is essentially a warehouse-sized computer. Just like your home computer, data centers need chips to carry out the computations that power our lives. These chips require sophisticated software, data storage, fast connections, reliable power, and no small amount of cooling to operate properly. The difference between our devices and these data centers is the scale and, in the case of the data centers powering AI, chip specialization.

The Almighty Chips (Semiconductors)

The main chip powering most of your devices is a central processing unit, or CPU. CPUs handle a wide range of computations and can be thought of as generalists powering most of the work that our computers do. While CPUs are great at carrying out a wide range of computations, occasionally your computer needs to do a large number of very simple, very similar computations as quickly as possible. The traditional use case for these types of computations was graphics rendering for programs such as video games. The equation to change a pixel on a screen from one color to another may be very simple, but there are a lot of pixels that need to be updated, and at a high frequency. This is where graphics processing units, or GPUs, come in.
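To make the pixel example concrete, here is a minimal Python sketch. The image, the brighten rule, and the worker count are hypothetical illustrations, not anything from a real graphics library. The point is structural: because no pixel's new color depends on any other pixel, the same simple rule can be handed out to many workers at once. A thread pool stands in for a GPU's many cores here; in standard Python it will not actually run faster, since it only shows how the work divides.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical image: a flat list of (red, green, blue) pixels, each channel 0-255.
Pixel = tuple[int, int, int]


def brighten(pixel: Pixel, amount: int = 40) -> Pixel:
    """A very simple per-pixel rule: raise each channel, capped at 255."""
    r, g, b = pixel
    return (min(255, r + amount), min(255, g + amount), min(255, b + amount))


def update_one_at_a_time(image: list[Pixel]) -> list[Pixel]:
    """CPU-style 'paintbrush': visit the pixels in order, one after another."""
    result = []
    for pixel in image:
        result.append(brighten(pixel))
    return result


def update_all_at_once(image: list[Pixel], workers: int = 8) -> list[Pixel]:
    """GPU-style 'paintball guns': apply the same rule to every pixel concurrently.

    Each pixel is independent, so the work can be split across any number of
    workers; the thread pool here only stands in for a GPU's many cores.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the input pixels.
        return list(pool.map(brighten, image))


if __name__ == "__main__":
    image = [(0, 0, 0), (250, 10, 100), (128, 128, 128)]
    print(update_one_at_a_time(image))
    print(update_all_at_once(image))
```

Both functions return the same pixels in the same order; only the way the work is scheduled differs, which is the essence of the CPU versus GPU analogy.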

Think of CPUs as painting a picture with a paintbrush, whereas GPUs create the same image using 1,000 paintball guns all firing at the same time. As luck would have it, parallel processing of simple computations at lightning speed was the unlock needed to power AI. Tensor processing units, or TPUs, are custom application-specific integrated circuits (ASICs) developed by Google that can be more efficient than GPUs for some AI applications but are tailored to a narrower set of use cases. Efficiency gains are incredibly important when delivering compute at scale, but GPUs remain the primary driver of AI.

This is in part due to the proliferation of Nvidia’s CUDA parallel computing platform and software used in the development of AI models. The tight integration of the software development layer (CUDA) and hardware (GPUs) has created an AI platform with obvious switching costs as well as network economies: as the community of AI developers all “speaking the same language” grows larger, more data center customers prefer the platform with the largest, most productive developer ecosystem. Tradeoffs between efficiency and flexibility have led data center operators to consider a mix of general-purpose GPUs and task-specific ASICs, such as TPUs, in the compute stack. When building data centers, the choice of chips matters, because continuous semiconductor innovation can render them obsolete sooner than expected.
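The cost of faster-than-expected obsolescence can be illustrated with simple arithmetic. The sketch below assumes chip spending is expensed evenly over its expected useful life (straight-line depreciation); the $100 million figure and the three- and six-year lives are hypothetical inputs, not estimates from this report. If chips planned for six years of service become uncompetitive after three, half of their original cost is still on the books.

```python
def annual_depreciation(capex: float, useful_life_years: float, salvage: float = 0.0) -> float:
    """Straight-line depreciation: spread (capex - salvage) evenly over the useful life."""
    if useful_life_years <= 0:
        raise ValueError("useful_life_years must be positive")
    return (capex - salvage) / useful_life_years


def remaining_book_value(
    capex: float, planned_life_years: float, years_in_service: float, salvage: float = 0.0
) -> float:
    """Book value (capex less accumulated depreciation) after `years_in_service` years."""
    annual = annual_depreciation(capex, planned_life_years, salvage)
    elapsed = min(years_in_service, planned_life_years)  # depreciation stops at end of life
    return capex - annual * elapsed


if __name__ == "__main__":
    capex = 100_000_000  # hypothetical $100 million of chips
    for life in (6, 4, 3):
        print(f"{life}-year life: ${annual_depreciation(capex, life):,.0f} per year")
    # Hypothetical case: planned for 6 years, but uncompetitive after 3.
    print(f"Undepreciated at obsolescence: ${remaining_book_value(capex, 6, 3):,.0f}")
```

Halving the useful life doubles the annual cost of the same hardware, which is why chip selection and refresh timing weigh so heavily on data center economics.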

Power

All this computational might comes with very high power requirements, and outages can cause expensive downtime. Data center power consumption was traditionally measured in megawatts (MW), which is a rate of power draw; a megawatt-hour (MWh) is the energy used by drawing one megawatt for a full hour. One MWh is enough to power the average American home for over a month. The largest projects are now being measured in gigawatts (GW), or 1,000 megawatts. Running at full capacity around the clock, a 1 GW data center could consume as much energy as roughly 840,000 American homes.
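The unit conversions above can be checked with a few lines of Python. Power (MW) is a rate, so annual energy (MWh) is power multiplied by hours of operation. The household figure of about 10.4 MWh per year is our assumption, chosen as a round number consistent with the roughly 840,000-home comparison; it is not stated in the article. With it, both the "over a month" and "roughly 840,000 homes" figures follow, assuming the facility draws its full rated power all year.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours

# Assumption for illustration (not from the article): an average US home
# uses about 10.4 MWh (10,400 kWh) of electricity per year.
HOME_MWH_PER_YEAR = 10.4


def annual_energy_mwh(power_mw: float, utilization: float = 1.0) -> float:
    """Energy used in a year (MWh) by a load drawing `power_mw` at the given
    average utilization (1.0 = full rated power every hour of the year)."""
    return power_mw * HOURS_PER_YEAR * utilization


def homes_equivalent(power_mw: float, utilization: float = 1.0) -> float:
    """Number of average homes that would use the same energy over a year."""
    return annual_energy_mwh(power_mw, utilization) / HOME_MWH_PER_YEAR


def months_one_mwh_powers_a_home() -> float:
    """Months of average household use covered by a single MWh."""
    monthly_home_mwh = HOME_MWH_PER_YEAR / 12
    return 1.0 / monthly_home_mwh


if __name__ == "__main__":
    print(f"1 MWh powers an average home for {months_one_mwh_powers_a_home():.2f} months")
    print(f"1 GW at full load ~ {homes_equivalent(1_000):,.0f} homes")
    # Hypothetical 70% average utilization, to show how the comparison scales.
    print(f"1 GW at 70% utilization ~ {homes_equivalent(1_000, 0.7):,.0f} homes")
```

Lower average utilization scales the household comparison down proportionally, so full-load equivalences like the one above are best read as upper-end figures.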

As AI demand increases, so do power requirements. Barring significant efficiency gains, power needs are therefore likely to keep rising as AI adoption progresses. This has put AI initiatives at odds with environmental initiatives and has caught the attention of regulators.
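One way to see this tension in numbers: power needed equals compute demanded divided by compute delivered per unit of power, so if demand for compute grows faster than chip efficiency improves, total power needs rise. The sketch below compounds both rates; the 1 GW starting point, 30% annual demand growth, and the efficiency gains tested are hypothetical inputs for illustration, not forecasts.

```python
def projected_power(
    power_today_mw: float, compute_growth: float, efficiency_gain: float, years: int
) -> float:
    """Power needed after `years` if compute demand grows at `compute_growth`
    per year and compute-per-watt improves at `efficiency_gain` per year.

    Power = compute demanded / compute per unit of power, so the two annual
    rates compound against each other.
    """
    ratio = (1 + compute_growth) / (1 + efficiency_gain)
    return power_today_mw * ratio ** years


if __name__ == "__main__":
    base_mw = 1_000  # hypothetical starting point: 1 GW
    for gain in (0.15, 0.30, 0.40):  # hypothetical annual efficiency gains
        mw = projected_power(base_mw, compute_growth=0.30, efficiency_gain=gain, years=5)
        print(f"efficiency +{gain:.0%}/yr -> {mw:,.0f} MW after 5 years")
```

Power needs only hold flat when efficiency gains keep pace with demand growth, and only fall when they outpace it; that is the "barring significant efficiency gains" caveat expressed numerically.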

Cooling

In the process of converting power into AI insights, the chips involved create a lot of heat. If the chips are not properly cooled, this can lead to critical failures and the loss of expensive equipment in very little time. Traditionally, air cooling was sufficient, but liquid cooling appears to be required to efficiently utilize the latest and future generations of chips. This requires new expertise and added expense for new data centers, or expensive retrofits for existing data centers that want to use the best chips available. The cost of retrofitting a traditional data center could make doing so uneconomical, increasing the risk of data center operators owning unattractive assets that cannot serve the market's high-end AI compute demand.

Conclusion

Strong growth in demand for compute in the coming years is a reasonable base case. That said, the operational complexity, infrastructure reliance, risk of obsolescence, and resource-intensive nature of data center buildouts mean that there will likely be snags along the way. We would expect capacity constraints to remain a persistent issue without either a step-change in chip efficiency or a massive buildout of new energy capacity (likely nuclear, which we expect will take a minimum of seven years to materially bring online). These issues are compounded by increasing regulatory pressure in response to rising data center energy and water usage. All of this is to say that even if we hold AI demand growth constant, investors should be wary of the risk that meeting that demand could take longer and/or cost more than expected.


Thomas Shipp

Thomas Shipp leads the Equity Research team at LPL Financial, which provides insights derived from quantitative and fundamental equity research.