Work-Bench Snapshot: Augmenting Streaming and Batch Processing Workflows
The Work-Bench Snapshot Series explores the top people, blogs, videos, and more shaping the enterprise around a particular topic we're looking at from an investment standpoint.
This is the second part of our series examining how AI's computational patterns are forcing a fundamental rethinking of resource architecture and management. In Part 1, we explored the evolution of compute paradigms, the unique requirements of AI agents, and the economic challenges of current approaches.
The increasing sophistication of AI agents reveals critical limitations in current compute paradigms. Agents are evolving from simple rule-based systems into complex entities that leverage large language models, multi-modal capabilities, and expanding toolsets for interacting with the world. Traditional infrastructure approaches exist on a spectrum: at one extreme, dedicated machine rental offers complete control but suffers from significant cost inefficiency and poor utilization, as resources sit idle during processing lulls. At the opposite end, serverless computing optimizes resource allocation through ephemeral, on-demand execution, but it struggles to maintain persistent state, faces cold-start latency that can severely impact agent responsiveness, and, perhaps most importantly, is the hardest to debug (a difficulty amplified by the challenge of debugging probabilistic systems in general).
This tension extends beyond the classic stateful-versus-stateless architectural debate in distributed systems. AI agents introduce additional complexity because they typically require heterogeneous compute resources (CPU for orchestration, GPU for inference), often with different scaling patterns and utilization curves. While some organizations abstract away GPU compute via API-based model providers, doing so shifts rather than removes the architectural considerations: trade-offs around latency requirements, data security constraints, model customization needs, resilience planning for service dependencies, debugging, and management.
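To make that heterogeneity concrete, here is a minimal sketch (in Python, with hypothetical class names and routing rules of our own choosing, not any particular framework) of an agent orchestration loop that runs planning logic on CPU while dispatching each inference step either to a locally hosted GPU model or to an API-based provider, depending on latency and data-sensitivity constraints.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class InferenceRequest:
    prompt: str
    max_latency_ms: int           # responsiveness budget for this step
    contains_sensitive_data: bool


class InferenceBackend(Protocol):
    def generate(self, request: InferenceRequest) -> str: ...


class LocalGPUBackend:
    """Model weights hosted on GPUs we operate: low latency, fixed capacity."""
    def generate(self, request: InferenceRequest) -> str:
        return f"[local-gpu] completion for: {request.prompt[:40]}"


class HostedAPIBackend:
    """Third-party model API: elastic capacity, but extra network hops and data egress."""
    def generate(self, request: InferenceRequest) -> str:
        return f"[hosted-api] completion for: {request.prompt[:40]}"


class AgentOrchestrator:
    """CPU-side orchestration: planning, tool calls, and routing of inference steps."""
    def __init__(self) -> None:
        self.local = LocalGPUBackend()
        self.remote = HostedAPIBackend()

    def route(self, request: InferenceRequest) -> InferenceBackend:
        # Keep sensitive or latency-critical steps on infrastructure we control;
        # everything else can ride the elastic hosted API.
        if request.contains_sensitive_data or request.max_latency_ms < 200:
            return self.local
        return self.remote

    def step(self, prompt: str, *, max_latency_ms: int = 1000, sensitive: bool = False) -> str:
        request = InferenceRequest(prompt, max_latency_ms, sensitive)
        return self.route(request).generate(request)


if __name__ == "__main__":
    agent = AgentOrchestrator()
    print(agent.step("Summarize the customer's contract", sensitive=True))
    print(agent.step("Draft a cheerful status update", max_latency_ms=5000))
```

Even in this toy form, the CPU-bound orchestrator and the GPU-bound (or API-bound) inference path scale on entirely different curves, which is precisely the coupling problem described above.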
Managing these diverse compute requirements presents fundamental challenges across infrastructure types. Dedicated approaches struggle with capacity planning for unpredictable reasoning patterns, containerized environments face complex GPU and memory configurations, and serverless models contend with cognitive disruptions from cold starts and execution limits. These challenges are compounded by AI agents' hybrid needs—requiring continuous availability and state retention alongside real-time responsiveness and multi-agent interactions. Even event-driven paradigms face reliability challenges when coordinating complex triggering conditions across distributed agent systems.
The complexity intensifies with agent swarms, where parent agents delegate to specialized child agents in dynamic fan-out/fan-in patterns that stress traditional scaling mechanisms. While containerization and serverless excel at distributed workload scaling, they struggle with persistent memory states and latency-sensitive interactions, forcing developers to implement external state management that adds overhead and diminishes their inherent advantages.
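As a rough illustration of that shape, the sketch below (Python asyncio, with hypothetical agent names) shows a parent agent fanning out to specialized child agents and fanning their results back in. In a serverless deployment each child would typically be a separate ephemeral invocation, which is exactly where per-child cold starts and externalized state begin to hurt.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class SubTaskResult:
    agent: str
    output: str


async def child_agent(name: str, subtask: str) -> SubTaskResult:
    # Stand-in for an ephemeral worker: a model call, tool use, etc.
    await asyncio.sleep(0.1)  # simulated inference latency (plus any cold start)
    return SubTaskResult(agent=name, output=f"{name} handled: {subtask}")


async def parent_agent(goal: str) -> str:
    # Fan-out: decompose the goal and launch specialized children in parallel.
    subtasks = [f"{goal} / part {i}" for i in range(4)]
    children = [child_agent(f"child-{i}", task) for i, task in enumerate(subtasks)]
    results = await asyncio.gather(*children)

    # Fan-in: the parent aggregates results, so its own state must survive
    # for at least as long as the slowest child.
    return " | ".join(result.output for result in results)


if __name__ == "__main__":
    print(asyncio.run(parent_agent("research the CXL ecosystem")))
```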
A promising architectural evolution to address the challenges of AI agent infrastructure involves the disaggregation of memory and compute resources specifically for inference workloads. Traditional architectures tightly couple memory with processing units, creating significant inefficiencies for agent operations where memory demands fluctuate dramatically based on context length, reasoning complexity, and simultaneous interaction volume.
Emerging technologies like Compute Express Link (CXL) enable a more flexible relationship between compute and memory resources, providing key advantages for agent inference:
Unlike AI training environments, where compute-to-memory ratios remain relatively constant, agent inference workloads exhibit highly variable memory patterns. Organizations like Meta and Google are already implementing disaggregated memory architectures in their production inference environments, and initial explorations in the CXL space suggest resource utilization improvements of roughly 40% for AI agent workloads.
A crucial nuance in memory-compute disaggregation is the difference between the memory allocated to store and serve model weights (often in GPU VRAM) and the memory required for each agent’s unique state. Model weights can often be shared across multiple agents when they are all invoking the same underlying model, creating opportunities for more efficient resource pooling. In contrast, each agent’s context and state (e.g., conversation histories, intermediate reasoning, specialized knowledge) is unique and must be allocated separately.
This distinction leads to new design patterns for memory management:
By treating model weights and agent states as separate categories of “memory demand,” a disaggregated infrastructure can handle each more intelligently: shared memory for weights to maximize utilization, and agent-specific allocations that spin up or down with agent workflows. In practice, achieving this balance demands real-time orchestration leveraging technologies like CXL to avoid bottlenecks for either model inference or per-agent state updates.
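One way to picture this split is the minimal sketch below (hypothetical names, not any specific serving framework): a shared, reference-counted pool for model weights alongside per-agent state allocations that are created and released with each agent's lifecycle.

```python
from dataclasses import dataclass, field


@dataclass
class ModelWeights:
    """Weights loaded once (e.g., into GPU VRAM) and shared across agents."""
    model_id: str
    size_gb: float
    ref_count: int = 0


@dataclass
class AgentState:
    """Per-agent memory: conversation history, intermediate reasoning, etc."""
    agent_id: str
    context: list[str] = field(default_factory=list)


class DisaggregatedMemoryManager:
    def __init__(self) -> None:
        self._weights: dict[str, ModelWeights] = {}
        self._states: dict[str, AgentState] = {}

    def attach_agent(self, agent_id: str, model_id: str, size_gb: float) -> AgentState:
        # Shared allocation: load the weights only if no other agent already did.
        weights = self._weights.get(model_id)
        if weights is None:
            weights = ModelWeights(model_id, size_gb)
            self._weights[model_id] = weights
        weights.ref_count += 1
        # Agent-specific allocation: always unique, sized by this agent's context.
        state = AgentState(agent_id)
        self._states[agent_id] = state
        return state

    def detach_agent(self, agent_id: str, model_id: str) -> None:
        # Release per-agent state immediately; release weights only when unused.
        self._states.pop(agent_id, None)
        weights = self._weights.get(model_id)
        if weights:
            weights.ref_count -= 1
            if weights.ref_count == 0:
                del self._weights[model_id]


if __name__ == "__main__":
    mem = DisaggregatedMemoryManager()
    a = mem.attach_agent("agent-a", "llm-70b", size_gb=140)
    b = mem.attach_agent("agent-b", "llm-70b", size_gb=140)  # reuses the same weights
    a.context.append("user: summarize the quarterly report")
    mem.detach_agent("agent-a", "llm-70b")  # weights stay resident for agent-b
```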
This approach represents a fundamental shift from the traditional server-centric compute model toward a more fluid, resource-pool oriented architecture that accommodates the dynamic nature of agent operations. By addressing the state management challenges that plague current serverless implementations, memory-compute disaggregation could become a cornerstone of efficient agent deployment at scale.
The emergence of multi-agent systems and agent swarms introduces unique computational requirements beyond those of single-agent deployments. These collaborative systems demand:
Current compute architectures do not have a clear answer for these requirements, particularly when agents operate across different execution environments. Emerging solutions include specialized agent communication fabrics that maintain consistent performance regardless of agent location (cloud, edge, or hybrid deployments). These solutions directly address the fan-out/fan-in patterns that stress traditional scaling mechanisms. Experimental implementations demonstrate latency reductions of up to 70% compared to traditional API-based agent communication methods, potentially transforming how agent swarms operate at scale.
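As a sketch of what "location-transparent" agent messaging might look like (hypothetical interfaces, not any specific fabric): agents address each other by logical name, and the fabric resolves whether delivery is an in-process handoff or a cross-node send.

```python
import queue
from typing import Callable, Protocol


class Transport(Protocol):
    def send(self, payload: dict) -> None: ...


class InProcessTransport:
    """Agents co-located in one runtime: delivery is a direct function call."""
    def __init__(self, handler: Callable[[dict], None]) -> None:
        self._handler = handler

    def send(self, payload: dict) -> None:
        self._handler(payload)


class QueueTransport:
    """Agents on different nodes: delivery goes through a stand-in network queue."""
    def __init__(self) -> None:
        self.outbox: queue.Queue = queue.Queue()

    def send(self, payload: dict) -> None:
        self.outbox.put(payload)  # a real fabric would serialize and ship this


class AgentFabric:
    """Routes by logical agent name so callers never care where an agent runs."""
    def __init__(self) -> None:
        self._routes: dict[str, Transport] = {}

    def register(self, agent_name: str, transport: Transport) -> None:
        self._routes[agent_name] = transport

    def tell(self, agent_name: str, payload: dict) -> None:
        self._routes[agent_name].send(payload)


if __name__ == "__main__":
    fabric = AgentFabric()
    fabric.register("planner", InProcessTransport(lambda m: print("planner got", m)))
    fabric.register("edge-vision", QueueTransport())
    fabric.tell("planner", {"task": "decompose goal"})         # same process
    fabric.tell("edge-vision", {"task": "classify frame 42"})  # remote, queued
```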
Based on the analysis of current paradigm limitations, a hybrid cloud-edge computing model shows particular promise for addressing agent compute requirements. This approach balances:
This hybrid approach directly addresses many of the challenges identified in dedicated, containerized, and serverless paradigms by leveraging each model's strengths while mitigating its weaknesses. Rather than forcing agents into a single compute paradigm, this approach allows computation to flow to the most appropriate location based on the specific requirements of each agent task.
Organizations implementing hybrid architectures for AI agents are beginning to see improvements in both performance and cost efficiency. Interactive agent tasks benefit from reduced latency when edge computing handles time-sensitive operations, while cost savings can be realized by optimizing workload placement across the compute spectrum.
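To illustrate that placement decision, here is a minimal, hypothetical policy sketch: latency-critical or data-resident agent tasks run at the edge, while heavy batch reasoning flows to cheaper elastic cloud capacity. A real implementation would fold in live telemetry (queue depth, spot pricing, GPU availability) rather than static thresholds.

```python
from dataclasses import dataclass
from enum import Enum


class Placement(Enum):
    EDGE = "edge"    # close to the user or device: low latency, limited capacity
    CLOUD = "cloud"  # elastic capacity, higher round-trip latency


@dataclass
class AgentTask:
    name: str
    max_latency_ms: int
    data_must_stay_local: bool
    estimated_gpu_seconds: float


def place(task: AgentTask, edge_gpu_seconds_available: float) -> Placement:
    # Hard constraints first: data residency and tight latency budgets pin work to the edge.
    if task.data_must_stay_local:
        return Placement.EDGE
    if task.max_latency_ms < 250 and task.estimated_gpu_seconds <= edge_gpu_seconds_available:
        return Placement.EDGE
    # Everything else (batch reasoning, background summarization) goes to cloud capacity.
    return Placement.CLOUD


if __name__ == "__main__":
    tasks = [
        AgentTask("voice-response", max_latency_ms=150,
                  data_must_stay_local=False, estimated_gpu_seconds=0.5),
        AgentTask("overnight-report", max_latency_ms=60_000,
                  data_must_stay_local=False, estimated_gpu_seconds=900),
    ]
    for task in tasks:
        print(task.name, "->", place(task, edge_gpu_seconds_available=10).value)
```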
However, exploring hybrid cloud-edge computing for AI agents faces several substantial challenges:
Despite these challenges, the potential benefits of intelligently distributing agent workloads across cloud and edge resources may make hybrid approaches increasingly attractive as agent capabilities and deployment scenarios grow more sophisticated.
These threads come together in what we propose as the "Agent-Centric Computing Model." This framework reimagines computing resources as flexible services that dynamically adapt to agent needs rather than forcing agents to conform to rigid infrastructure paradigms.
The Agent-Centric Computing Model consists of five key principles:
This framework could directly address the economic inefficiencies and technical limitations identified in the computational economics analysis earlier. By decoupling state from compute and enabling fluid resource allocation, the model provides a potential path forward that resolves many of the tensions in current paradigms.
Several innovation opportunities emerge for future agent computing paradigms:
Agent-Driven Development and Developer Workflows
Much of our discussion so far focuses on runtime infrastructure changes: disaggregated memory, hybrid compute models, and so on. However, a truly agent-centric future also challenges the traditional software development lifecycle itself. In current workflows, humans are the primary creators and reviewers of code or configuration changes. But as agents become first-class participants, we may see dozens or hundreds of automated "contributors" iterating on code in parallel. This shatters linear, pull request–based pipelines and introduces the need for:
In other words, an agent-first world implies not only a shift in how we provision and manage compute resources, but in how we build, deploy, and maintain software. Startups and established companies that tackle these developer workflow challenges alongside the runtime infrastructure may define the next decade of the AI-driven technology stack.
These opportunities align with the computational economics framework presented earlier, offering potential solutions to the identified inefficiencies in current paradigms and creating new business opportunities for startups and established technology providers alike.
Based on current technology trajectories and organizational readiness, we predict the following adoption timelines:
Organizations in sectors with immediate AI agent applications—healthcare, finance, and autonomous systems—are already leading adoption, driven by potential competitive advantages in cost efficiency and capability.
The emergence of AI agents creates unprecedented opportunities for computational infrastructure innovation. The Agent-Centric Computing Model represents a fundamental reimagining of how compute resources should serve AI needs.
Organizations pioneering Agent-Centric Computing will secure advantages through:
For startups, these infrastructure gaps present lucrative opportunities to build tools enabling Agent-Centric principles, particularly in state-resource decoupling and fluid execution boundaries. Established providers must either develop purpose-built agent-centric offerings or risk misalignment with AI-forward organizations.
The race to develop agent-centric infrastructure has begun. Those who recognize this shift early will shape computing's future and capture disproportionate value as AI transforms industries.
This piece benefited greatly from the reviews, corrections, and suggestions of James Cham, Guido Appenzeller, Nick Crance, Tanmay Chopra, Demetrios Brinkmann, Kenny Daniel, Davis Treybig, as well as the tireless AI collaborators Gemini, Claude, and ChatGPT, who provided endless drafts, rewrites, and the occasional existential question about the future of sentience; we promise to remember you when the robots take over.
Diego Oppenheimer is a serial entrepreneur, product developer, and investor with a deep passion for data and AI. Throughout his career, he has focused on building and scaling impactful products, from leading teams at Microsoft on key data analysis tools like Excel and PowerBI, to founding Algorithmia, which defined the machine learning operations space (acquired by DataRobot). Currently, he provides strategic advisory for startups and scale-ups in AI/ML. As an active angel investor and advisor to numerous companies, he is dedicated to helping the next generation of innovators bring their visions to life.
Priyanka Somrah is a principal at Work-Bench, a seed-focused enterprise VC fund based in New York. She focuses on investments across data, machine learning, and cloud-native infrastructure. Priyanka is the author of The Data Source, a newsletter for technical founders that highlights emerging trends across developer tools. She's also the author of Your Technical GTM Blueprint, a series that breaks down how technical startups navigate go-to-market—from first hires to scaling repeatable sales.