Work-Bench Snapshot: Data Centers and Next-Gen Networking
The Work-Bench Snapshot Series explores the top people, blogs, videos, and more shaping the enterprise around a particular topic we're examining from an investment standpoint.
This post was originally published in More Intelligent, my monthly newsletter that shares thematic research and the technology trends I’m digging into. I’m here to give you an investor’s view on the market and what’s next in enterprise software. Subscribe here!
Last week, I hosted a Data Center Roundtable with Zac Smith, co-founder of Packet, a bare-metal server provider that was acquired in 2020 by Equinix, the world’s largest global data center provider.
Recently, Zac and I have been exploring areas where venture-scale opportunities could emerge in next-gen networking and infrastructure. So, we brought together a group of founders, data center developers, energy brokers, server providers, and Fortune 500 customers to discuss the ideas and trends shaping the industry.
Below are themes from the discussion and the investment areas we're actively exploring at Work-Bench. If you're building in these areas and want to chat, please feel free to reach out!
Data centers have recently come into vogue as AI research labs and hyperscalers look to invest in new-site construction and broader AI infrastructure to harness the power of LLMs and GPUs.
According to the WSJ, total spend on the global infrastructure market (including the data centers, networking, and other hardware that support AI applications) is expected to reach $423B by 2029, a ~44% CAGR over the next six years. On top of that, Citi forecasts that AI workloads will contribute over 50% of data center IT load by 2030.
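As a quick sanity check on those figures (a back-of-the-envelope sketch, not from the WSJ piece itself), a $423B market in 2029 after six years of ~44% compounding implies a base of roughly $47B today:

```typescript
// Back-of-the-envelope check on the projection (illustrative only):
// a market reaching $423B in 2029 after six years at ~44% CAGR
// implies a starting size of target / (1 + rate)^years.
const target = 423; // $B, projected 2029 market size
const cagr = 0.44;  // ~44% compound annual growth rate
const years = 6;

const impliedBase = target / Math.pow(1 + cagr, years);
console.log(`Implied base-year market: ~$${impliedBase.toFixed(0)}B`); // ~$47B
```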
Ironically, despite this newfound demand, insiders know the data center industry is actually built around consistent, steady returns, which is why large private equity firms like Blackstone have spent the last 20+ years acquiring, investing in, and inking joint ventures across the landscape. As a result, the risk that comes with innovating at scale has taken a backseat to consistency.
Even with macro tailwinds propelling data centers into the limelight, many are skeptical that big tech's investment in AI infrastructure will pay off. Hyperscalers have spent an incremental $91B in capex, while legacy enterprises like Oracle expect to grow their data center footprint from 162 facilities to 1,000+ over the coming years. Still, the ROI question remains: how sustainable is this over-allocation of compute and infrastructure?
This exploding demand for data center construction is causing shortages of parts, property, equipment, and power, all integral components of a data center. The key hurdle to overcome as the AI supercycle ramps up is securing efficient power and cooling solutions.
The AI boom of 2023 raised concerns over the amount of energy LLMs require to train on massive datasets and perform computations, because they run on power-hungry GPUs. By 2030, data centers are projected to consume a staggering 10% of the global power supply, more electricity than the entire nation of Brazil.
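To put those two claims side by side, here's a rough consistency check; the global and Brazilian consumption figures below are ballpark numbers I'm assuming for illustration, not from the projection itself:

```typescript
// Rough consistency check on the 2030 projection (assumed ballpark figures):
// global electricity consumption is on the order of ~27,000 TWh/year,
// and Brazil's national consumption is roughly ~550 TWh/year.
const globalTWh = 27_000;     // assumed global annual electricity use, TWh
const brazilTWh = 550;        // assumed Brazilian annual electricity use, TWh
const dataCenterShare = 0.10; // projected 10% share by 2030

const dataCenterTWh = globalTWh * dataCenterShare; // 2,700 TWh
console.log(`Projected data center demand: ~${dataCenterTWh} TWh/year`);
console.log(`That's ~${(dataCenterTWh / brazilTWh).toFixed(1)}x Brazil's total`); // ~4.9x
```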
Given our nation's energy constraints, the specific requirements of new GPU-driven AI data centers are pushing builders to look for places where they can get abundant, reliable electricity. Savvy operators are seeking alternatives to traditional natural gas, such as dependable low-carbon sources like hydrogen and nuclear energy.
Some examples:
Beyond harnessing energy, supply chains are strained: lead times for custom cooling systems are 5x longer than a few years ago, and delivery times for backup generators have stretched from as little as one month to as long as two years. This supply and demand imbalance has propelled the stock of publicly traded companies like Vertiv, but it could also prove to be an opportunity for startups seeking to capitalize on the shift.
Today's data centers are largely built to one-size-fits-all standards with general-purpose technology and can take multiple years to construct. Moving forward, we need to build interoperable data centers with modern software and hardware, so we can serve new use cases unique to each enterprise and quickly refresh facilities with best-in-class technology.
These two examples point to a future where enterprises will require bespoke data center systems to power their own unique use cases:
To make this a reality, we'll need real collaboration across utility companies, real estate developers, and the OEMs that procure parts and construct the data centers. Historically, these entities have acted as bottlenecks to one another, yet each needs the others to get the job done. We could see a future where these players come together under one roof to create a new generation of infrastructure, or perhaps even a world where hyperscalers and research labs bypass them altogether.
Lastly, the data center industry is unregulated and siloed: there's no public forum or central community to discuss the industry's problems or spur innovation. In software, we have organizations like the Linux Foundation and the Apache Software Foundation that help developers manage and scale open technology projects and ecosystems.
If we are to move forward with new technology and new thinking about how to power and build data centers, more collaboration is certainly top of mind. Who will be the OEMs that emerge ready to build next-generation infrastructure?
Managing a data center is no easy task and obviously requires a strong understanding of hardware, infrastructure, networking, and software.
However, modern software development suggests a shift: future DevOps teams likely won't manage their own infrastructure through the likes of AWS, GCP, or Azure.
Hyperscale clouds demand tremendous infrastructure knowledge, which is increasingly being abstracted away as developers turn to vertically integrated cloud vendors to solve issues like latency, inflexible resource scaling, and a lack of vertical-specific tools, all of which can impact developer velocity.
In fact, different workloads will require different vertically integrated stacks to serve them. Engineers today are getting more and more accustomed to serverless offerings like Cloudflare Workers, all of which abstract away the underlying knowledge of these platforms. As a result, there will be a new class of infrastructure providers positioned to capture the opportunity.
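To make that abstraction concrete, here's a minimal Cloudflare Worker sketch (module syntax, following Workers' documented fetch-handler shape): the developer ships only a request handler, while provisioning, scaling, and global distribution are the platform's problem.

```typescript
// Minimal Cloudflare Worker (module syntax): the entire deployable unit
// is a request handler. No servers, regions, or capacity planning involved.
export default {
  async fetch(request: Request): Promise<Response> {
    const url = new URL(request.url);
    // Responds from whichever edge location the platform routed this to.
    return new Response(`Hello from the edge! You requested ${url.pathname}`);
  },
};
```

Shipping this is a single `wrangler deploy`; the same handler then runs across Cloudflare's global points of presence with zero infrastructure work from the developer.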
Right now, it feels like Netlify, Vercel, and Fly have the best chance of becoming the next true cloud platforms. These companies let developers focus on shipping code without thinking about the underlying infrastructure, and they deliver greater efficiencies across a variety of customers. Bringing cloud experiences to where developers work unlocks tremendous time and cost savings, something traditional cloud platforms overcomplicate and make challenging.
Similarly, a new breed of AI cloud platforms has emerged, offering optimized compute resources, enhanced scalability, and developer-centric environments. Companies like Runpod and Together are making strides in building the AI cloud, where hardware, network, and software come together seamlessly to support next-gen applications. Their globally distributed GPU cloud platforms help engineers deploy AI when and where they want it. These specialized platforms are designed to handle the unique demands of AI workloads, from the high computational requirements of model training to the need for rapid scaling and efficient resource allocation, in contrast to traditional tools like AWS SageMaker, which can be costly and inflexible.
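As an illustration of the developer experience these platforms aim for, here's a hedged sketch of invoking a hosted model via Together's OpenAI-compatible HTTP API; the model id and response handling below are assumptions for illustration, so check the provider's docs before relying on them:

```typescript
// Illustrative sketch: calling a hosted open model on an AI cloud.
// Together exposes an OpenAI-compatible HTTP API; the model id below is
// an assumption for illustration. No GPUs to provision or schedule.
async function ask(prompt: string): Promise<string> {
  const res = await fetch("https://api.together.xyz/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.TOGETHER_API_KEY}`,
    },
    body: JSON.stringify({
      model: "meta-llama/Llama-3-8b-chat-hf", // assumed model id
      messages: [{ role: "user", content: prompt }],
    }),
  });
  const data = await res.json();
  return data.choices[0].message.content; // OpenAI-compatible response shape
}

ask("Summarize the AI data center supply crunch.").then(console.log);
```

The point isn't the specific provider: it's that consuming GPU capacity collapses into one HTTP call, much as serverless platforms collapsed web deployment.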
Finally, to vertically integrate and improve the margin profile of these new cloud platforms, new points of presence (PoPs) and data center infrastructure will need to be built. One final question: how and where should these PoPs be built? Right now, Virginia is home to almost half of all domestic data centers, roughly 35% of the worldwide data center population. As we look to the future, we're seeing a brave new world for connectivity: companies like Axiom Space are putting data centers in outer space to provide in-space cloud services without needing to connect back to terrestrial cloud infrastructure, while Armada is bringing connectivity to remote environments.
We’re exploring several areas in the data center and next-gen networking space where there could be venture-scale outcomes across the stack.
We believe the best founder profiles for next-gen networking will be operators who truly understand the ins and outs of the software, networking, and hardware trilogy. If that sounds like you and you're exploring these areas, reach out; I'd love to meet you!
It's clear that the world of data centers and cloud computing is rapidly evolving. As startups and established players alike navigate these shifts, the future of data centers holds tons of potential. The supply and demand imbalance for parts, property, and equipment makes it a particularly exciting time to be an OEM, cloud provider, or picks-and-shovels startup playing a game of David and Goliath.