The Future of Compute: How AI Agents Are Reshaping Infrastructure (Part 1)

Apr 15, 2025

As AI agents evolve from experimental prototypes to mission-critical systems, we believe a critical mismatch will emerge between their unique demands and today's computing infrastructure. This series examines how AI's computational patterns are likely to force a fundamental rethinking of resource architecture and management.

Why This Matters

Organizations deploying AI agents will face three critical challenges:

  • Economic Inefficiency: Current resource allocation models poorly accommodate AI's variable memory needs, specialized compute requirements, and bursty utilization patterns.
  • Technical Limitations: Existing architectures constrain agent capabilities in persistent state management, real-time collaboration, and autonomous decision-making.
  • Operational Challenges: As AI penetrates critical systems, we lack adequate tools for monitoring, debugging, and ensuring reliable agent performance.

This series examines whether traditional compute paradigms can adapt to AI agents' needs or if entirely new approaches are required. Getting this right will determine who can deploy AI effectively at scale - and who gets left behind.

Summary

The rapid advancement of AI agents—autonomous software entities capable of perception, reasoning, and action—is creating significant new demands on computing infrastructure. We explore whether traditional compute paradigms, from mainframes to serverless architectures, can effectively adapt to meet the unique requirements of AI agents, or if a more fundamental paradigm shift in how we design, deploy, and manage computational resources might be necessary. By examining the compatibility between existing computing models and emergent AI agent needs, we can better assess whether evolving current frameworks or building entirely new approaches will best serve this technological frontier.

This compatibility challenge matters profoundly for three key reasons. First, organizations deploying AI agents under current paradigms face significant economic inefficiencies due to the gap between agent needs and resource allocation models. Current systems struggle to efficiently handle the unique computational patterns of AI agents, which include variable memory requirements (from minimal during idle periods to substantial during reasoning tasks), unpredictable cold start times, specialized compute requirements (like GPU access for inference), frequent external API calls to knowledge sources, and bursty utilization profiles.

Second, the technical limitations of current architectures create artificial constraints on agent capabilities, particularly in areas requiring persistent state, real-time collaboration, and autonomous decision-making. Third, as AI agents become increasingly embedded in critical infrastructure, healthcare, finance, and daily life, the compute paradigm we select must prioritize operational excellence through reliability, observability, data security and debuggability. The challenge is how computational architectures can provide the concrete tooling necessary to monitor agent behavior, inspect decision processes, and efficiently troubleshoot issues in production environments. Without these capabilities built into the computational foundation, organizations will face significant hurdles deploying AI systems in contexts where consistent performance and transparent operation are non-negotiable requirements.

By examining how computing approaches could evolve to better support AI agents, this discussion offers informed projections about potential technological paths, economic models, and architectural patterns. Rather than presenting comprehensive research findings, we're exploring plausible developments that might emerge as the field progresses - considering conditional scenarios that could shape infrastructure development and potentially unlock more advanced agent capabilities. The bottom line: deploying AI at scale will require getting the computational approach right, a capability that should deliver substantial business benefits as these systems become increasingly central to operations.

The Evolution of Compute Paradigms

The history of computing reflects the escalating complexity of software workloads, evolving through distinct architectural paradigms. Initially, mainframe computing dominated from the 1950s through the 1970s, with centralized systems handling multiple tasks through time-sharing mechanisms. The shift to client-server architecture in the 1980s and 1990s distributed processing while introducing resource allocation challenges. This led to virtual machines (VMs), which permitted multiple operating systems to run independently on a single physical server, significantly advancing infrastructure flexibility. However, VMs consumed substantial resources, often leaving hardware underutilized, with utilization rates frequently below 50%.

The industry subsequently moved toward containerized environments, which represented a fundamental shift in virtualization approach. While VMs virtualized the entire hardware stack requiring complete operating systems for each instance, containers virtualized only at the operating system level, sharing the host OS kernel while maintaining application isolation through lightweight process boundaries. This significantly reduced resource overhead and startup times. Docker first emerged as the key technology enabling this containerization revolution, with Kubernetes later developing as a solution for container orchestration, coordination, and scale-out across distributed infrastructure.

Serverless computing emerged as a post-containerization development, building upon container technology foundations. In serverless architectures, computational resources are automatically provisioned, scaled, and managed by the cloud provider in response to event triggers, with users charged only for actual execution time rather than pre-allocated capacity. Key characteristics include event-driven execution, statelessness, granular billing, and automatic (and in best cases predictive) scaling. While serverless likely represents a direction for future compute paradigms due to its operational efficiency and management simplicity, it introduces significant challenges for complex applications: cold start latency affects performance, execution time limits constrain workloads, statelessness complicates data persistence, limited runtime environments restrict technology choices, debugging becomes more difficult due to distributed nature, and vendor-specific implementations can lead to lock-in.
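
To make these trade-offs concrete, below is a minimal sketch of a serverless inference handler in the AWS Lambda style. The model-loading and inference logic are hypothetical stand-ins rather than any real SDK, but the shape illustrates why cold starts, statelessness, and execution time limits bite hardest for agent-style workloads.

```python
import json
import time

# Module-level state persists only for the lifetime of a warm container.
# On a cold start, the model load below runs before the first request is served,
# adding seconds of latency that directly degrades interactive agent quality.
MODEL = None


def _load_model():
    """Hypothetical model load; in practice this might pull weights from object storage."""
    time.sleep(2)  # stand-in for multi-second weight loading
    return object()


def handler(event, context):
    global MODEL
    if MODEL is None:  # cold start path
        MODEL = _load_model()

    # Statelessness: any conversation history must be passed in or fetched from
    # external storage on every invocation, because local state can vanish between calls.
    history = event.get("history", [])

    # Execution time limits (minutes, not hours) cap how long a reasoning chain
    # can run inside a single invocation.
    answer = f"echo: {event.get('prompt', '')} (history={len(history)} turns)"
    return {"statusCode": 200, "body": json.dumps({"answer": answer})}
```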

Understanding AI Agents

AI agents are autonomous software entities that perceive their environment, make decisions, and take actions to achieve specific goals. Unlike traditional applications, agents typically combine multiple capabilities: they process inputs through perception modules (text, images, structured data), reason about information using large language or domain-specific models, maintain working memory about contexts and goals, execute actions through API calls or direct system operations, and often collaborate with humans or other agents.

Clarifying Our Use of “AI Agent”

 We recognize that the term agent can mean different things, ranging from simple rule-based software to complex, multi-robot systems. We specifically use AI agent to refer to autonomous software entities that can perceive their environment, maintain working memory about goals and context, reason using advanced techniques (often large language or domain-specific models), and take actions (for example, by calling APIs, updating databases, or interacting with users). These agents typically require some form of persistent state and may collaborate with humans or other agents in real time. Our focus is on these increasingly intelligent, autonomous systems that combine perception, reasoning, and action rather than on simpler, stateless AI integrations.
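
As a rough illustration of this working definition (not a standard interface), an agent in this sense can be sketched as three obligations arranged around working memory; the method names below are illustrative assumptions, not an established API:

```python
from abc import ABC, abstractmethod
from typing import Any


class Agent(ABC):
    """Minimal interface matching the definition above: perception, reasoning, action."""

    @abstractmethod
    def perceive(self, observation: Any) -> None:
        """Ingest text, images, or structured data into working memory."""

    @abstractmethod
    def reason(self) -> Any:
        """Decide what to do next, typically by calling a large language or domain-specific model."""

    @abstractmethod
    def act(self, decision: Any) -> Any:
        """Carry out the decision: call an API, update a database, or reply to a user."""
```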

Computational Requirements of AI Agents

Visualizing the Differences: CRUD Systems vs. AI Agents

In traditional request-response architectures (often called CRUD—Create, Read, Update, Delete), most operations can be mapped to a fairly predictable pattern of database reads/writes tied to user requests. In contrast, AI agent workloads exhibit additional layers:

  1. Context Fetch & Preparation
    The agent retrieves context from various data sources—vector stores, cloud-based knowledge APIs, or local memory—and may need large memory capacity to keep track of conversation history or reasoning steps.
  2. Inference & Reasoning
    Instead of performing a simple business-logic check, the agent calls on specialized AI models (potentially running on GPUs or TPUs) to understand, generate, or transform information.
  3. Action & Response
    The agent can then take actions based on its reasoning, which can mean updating databases, calling external services, or coordinating other agents. These actions often feed back into the agent’s memory or spawn new inference cycles.

Below are two simplified diagrams illustrating these differences.

1. CRUD Systems Flow Diagram

“A user request triggers a database read or write, after which the application returns a response. Memory usage and CPU load remain relatively stable”


2. AI Agent Flow Diagram

“This shows how an agent fetches context, performs inference on specialized hardware, may maintain state across interactions, and triggers subsequent actions or API calls in a looped or multi-step manner”
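
The contrast also shows up clearly in code. The sketch below is illustrative only; the helper functions are hypothetical stand-ins for real databases, vector stores, and model runtimes, but they capture how an agent turn loops through context fetch, inference, action, and memory updates rather than performing a single read or write.

```python
def db_read(db, key):            # stand-in for a SQL/NoSQL read
    return db.get(key)


def db_write(db, key, value):    # stand-in for a SQL/NoSQL write
    db[key] = value


def handle_crud_request(request, db):
    """CRUD flow: one predictable read or write tied directly to the user request."""
    if request["op"] == "read":
        return db_read(db, request["key"])
    db_write(db, request["key"], request["value"])
    return {"status": "ok"}


def fetch_context(prompt, memory):            # 1. context fetch & preparation
    return memory[-5:]                         # e.g., recent turns from working memory


def run_model(prompt, context):                # 2. inference & reasoning (GPU/TPU in practice)
    return [{"action": "reply", "text": f"answer to {prompt!r} using {len(context)} turns"}]


def execute_action(step):                      # 3. action & response (API call, DB update, ...)
    return {"result": step["text"], "needs_replanning": False}


def run_agent_turn(prompt, memory):
    """Agent flow: context fetch, inference, actions, and feedback into memory."""
    plan = run_model(prompt, fetch_context(prompt, memory))
    results = []
    for step in plan:
        outcome = execute_action(step)
        memory.append({"step": step, "outcome": outcome})   # actions feed back into memory
        results.append(outcome)
        if outcome["needs_replanning"]:                      # may spawn a new inference cycle
            plan = run_model(prompt, fetch_context(prompt, memory))
    return results
```

A single turn might then be driven with `run_agent_turn("summarize the Q3 report", memory=[])`, with the same memory list carried across turns to preserve state.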

AI agents present distinct computational requirements that set them apart from traditional workloads:

  • Variable Compute Intensity: Agents often cycle between computationally intensive reasoning phases and relatively light monitoring or waiting states.
  • Memory and State Management: Many agents require persistent context maintenance across interactions, necessitating efficient state storage and retrieval mechanisms.
  • Specialized Hardware Access: For reasoning and inference tasks, agents frequently need access to specialized hardware like GPUs or TPUs, but not continuously.
  • Unpredictable Scaling Patterns: Agent activity often follows human interaction patterns or event-driven triggers, creating bursty usage profiles.
  • Diverse Latency Requirements: Some agent tasks (like real-time conversation) have strict latency constraints, while others (like background analysis) can tolerate delays.
  • External Service Dependencies: Agents typically interact with multiple external APIs and knowledge sources, introducing network-bound operations and dependency management complexity.

Computational Economics for AI Workloads

The economics of compute resources fundamentally changes with AI agent workloads. Traditional infrastructure cost models based on static resource allocation become inefficient when dealing with the variable, often bursty nature of agent operations. A computational economics model that considers the total cost of ownership (TCO) across different paradigms reveals interesting patterns:

| Compute Paradigm | Upfront Costs | Operational Costs | Cost Scalability | Typical Workload Fit | Management Tools | Key Challenges |
|---|---|---|---|---|---|---|
| On-Premises Dedicated | Very High (hardware procurement, data center costs, cooling infrastructure) | High (power, cooling, maintenance personnel, facility costs) | Step function with significant delay (procurement cycles, installation) | Continuous, state-heavy agents with predictable long-term demand | VMware, Xen, bare-metal management solutions | Highest overprovisioning for peak loads, significant capital lock-in, hardware depreciation, high personnel costs, lengthy procurement cycles for AI hardware |
| Cloud Dedicated Resources | Medium (reserved instance commitments, potential minimum terms) | Medium-High (premium for dedicated resources, predictable monthly costs) | Step function with minimal delay (resource allocation time) | Continuous, state-heavy agents with fluctuating but predictable demand | Cloud management consoles, infrastructure-as-code tools | Overprovisioning for peak loads, idle capacity costs, complex reservation planning for unpredictable AI workloads |
| Container Orchestration | Medium (cluster setup, control plane) | Medium (cluster management, monitoring complexity) | Linear (container scaling) | Multi-agent systems with shared resources | Kubernetes, Docker Swarm, Nomad, Prometheus, Grafana | Orchestration overhead, complex configuration for AI-specific resources (GPU scheduling, memory limits), difficulty sizing containers for variable AI computational patterns |
| Serverless | Low (minimal setup) | Low-Medium (costs can spike with unexpected invocations) | Continuous (granular scaling) | Intermittent, stateless operations | AWS Lambda, Azure Functions, Google Cloud Functions, OpenWhisk | Cold start latency impacts AI inference quality, execution time limits constrain complex reasoning, statelessness complicates persistent context maintenance, limited debugging visibility |
| Hybrid Models | Medium (integration complexity) | Medium (cross-paradigm management overhead) | Optimized (right-sizing to workload) | AI agents with variable state requirements | Custom orchestration layers, cloud management platforms, Terraform | Significant complexity in workload placement decisions, lack of specialized tools for AI-specific resource optimization across paradigms |
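
A rough back-of-the-envelope calculation shows why workload shape dominates this comparison. All prices and utilization figures below are assumptions for illustration only, not quotes from any provider:

```python
HOURS_PER_MONTH = 730


def dedicated_monthly_cost(hourly_rate):
    # Dedicated capacity is billed whether or not the agent is busy.
    return hourly_rate * HOURS_PER_MONTH


def serverless_monthly_cost(invocations, avg_seconds, rate_per_gb_second, memory_gb):
    # Serverless bills only for execution time, so idle periods cost nothing.
    return invocations * avg_seconds * memory_gb * rate_per_gb_second


if __name__ == "__main__":
    # Assumed numbers: a GPU-backed instance at $2.50/hour vs. a serverless tier
    # at $0.0000166667 per GB-second with 4 GB of memory per invocation.
    dedicated = dedicated_monthly_cost(2.50)
    for invocations in (50_000, 500_000, 5_000_000):
        serverless = serverless_monthly_cost(invocations, avg_seconds=3,
                                             rate_per_gb_second=0.0000166667,
                                             memory_gb=4)
        print(f"{invocations:>9,} invocations: dedicated ${dedicated:,.0f} "
              f"vs. serverless ${serverless:,.0f}")
```

Under these assumed numbers, the serverless bill stays far below the dedicated instance until invocation volume grows by orders of magnitude, which is why bursty, intermittent agents and continuously busy agents land in different rows of the table above.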

As AI agents become mainstream computational workloads, we'll need new tools or significant adaptations to existing ones because: (1) current monitoring systems lack visibility into AI-specific metrics like reasoning quality and inference latency variations; (2) existing autoscaling mechanisms don't understand the relationship between model complexity, memory requirements, and computational phases of agent operations; and (3) resource allocation strategies are not optimized for the bursty, state-dependent nature of agent workloads. Tools that bridge these gaps will likely determine which organizations can efficiently operate AI systems at scale.
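
As one concrete (and deliberately simplified) illustration of the first gap, an agent runtime could expose AI-specific signals alongside ordinary infrastructure metrics. The sketch below uses the open-source prometheus_client library; the metric names and the simulated workload are assumptions chosen purely for illustration.

```python
import random
import time

from prometheus_client import Gauge, Histogram, start_http_server

# AI-specific signals that generic CPU/memory dashboards do not capture.
INFERENCE_LATENCY = Histogram(
    "agent_inference_latency_seconds",
    "Wall-clock time of each model inference call",
)
CONTEXT_TOKENS = Gauge(
    "agent_context_tokens",
    "Size of the context assembled for the current reasoning step",
)


def agent_turn():
    # Stand-in for context assembly and inference; in a real agent these values
    # would come from the context builder and the model runtime.
    CONTEXT_TOKENS.set(random.randint(500, 8000))
    with INFERENCE_LATENCY.time():
        time.sleep(random.uniform(0.1, 1.5))  # simulated inference


if __name__ == "__main__":
    start_http_server(8000)  # metrics exposed at :8000/metrics for Prometheus to scrape
    while True:
        agent_turn()
```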

In Part 2 of this series, we'll explore emerging solutions to these challenges, including memory-compute disaggregation, real-time collaborative agent architectures, hybrid cloud-edge deployments, and a unifying framework for agent-centric computing.

As we've explored in this first part, AI agents represent a fundamental shift in computing workloads that strains our current infrastructure paradigms. The gap between what AI agents need and what traditional compute models provide creates significant economic inefficiencies and technical constraints that limit the potential of these systems. Organizations looking to deploy AI at scale face a critical decision point: continuing to adapt existing frameworks with diminishing returns or embracing new computing paradigms designed specifically for AI's unique requirements. We suggest that evolution alone may not suffice and that a more revolutionary approach could be necessary to fully unlock AI's potential. 

While we focus here on key computational requirements such as variable compute intensity, memory and state management, and specialized hardware access, we recognize that AI agents are still an evolving category. In the same way that a 1975 prediction about personal computing or a 1995 prediction about cloud computing would have inevitably missed critical details, it’s challenging to know which parameters will ultimately drive the most significant change in AI agent architectures.

Each organization, domain, and agent design might prioritize different capabilities; some may hinge on massive context windows and real-time data ingestion, while others may leverage a swarm of simpler specialized agents. Rather than making definitive forecasts, our goal is to highlight the most pressing questions that have already emerged in early deployments and point toward the architectural shifts required to accommodate them.

Credits:

This piece benefited greatly from the reviews, corrections, and suggestions of James Cham, Guido Appenzeller, Nick Crance, Tanmay Chopra, Demetrios Brinkmann, Kenny Daniel, Davis Treybig, as well as the tireless AI collaborators Gemini, Claude, and ChatGPT, who provided endless drafts, rewrites, and the occasional existential question about the future of sentience. We promise to remember you when the robots take over.

About the authors:

Diego Oppenheimer is a serial entrepreneur, product developer, and investor with a deep passion for data and AI. Throughout his career, he has focused on building and scaling impactful products, from leading teams at Microsoft on key data analysis tools like Excel and PowerBI, to founding Algorithmia, which defined the machine learning operations space (acquired by DataRobot). Currently, he provides strategic advisory for startups and scale-ups in AI/ML. As an active angel investor and advisor to numerous companies, he is dedicated to helping the next generation of innovators bring their visions to life.

Priyanka Somrah is a principal at Work-Bench, a seed-focused enterprise VC fund based in New York. She focuses on investments across data, machine learning, and cloud-native infrastructure. Priyanka is the author of The Data Source, a newsletter for technical founders that highlights emerging trends across developer tools. She's also the author of Your Technical GTM Blueprint, a series that breaks down how technical startups navigate go-to-market—from first hires to scaling repeatable sales.
