Work-Bench Snapshot: Serverless and the Developer Experience
The Work-Bench Snapshot Series explores the top people, blogs, videos, and more shaping the enterprise on a particular topic we’re looking at from an investment standpoint.
When considering the many ways to deploy code (Docker, Kubernetes, serverless platforms, etc.), the cloud ecosystem can be overwhelming and hard to keep up with. Ultimately, developers all want the same thing: reliable, safe, secure, and resilient deployments that are easy to build and iterate on at a predictable, reasonable cost.
However, there are trade-offs based on where you are in the software development life cycle (SDLC), your team’s expertise, and your budget. The modern, popularized Kubernetes/Docker deployment model covers several of these bases: it’s reliable, highly scalable, and well taught and documented, which makes it easy to find a (usually expensive) DevOps engineer to handle your containerized infrastructure. It can also be cost-stable, as the unit of the container has some traffic elasticity built in, but it needs plenty of engineering support, since most configuration and security overhead sits on the developer.
As developers, we hate solving the same problem twice. Teams spend countless hours building the same infrastructure, boilerplate, and cloud configuration/integrations that are so common they might as well be abstracted away. Serverless tries to tackle this issue head-on.
Serverless is a catch-all term for a suite of services you can use to run code without managing deployments and infrastructure. In this article, we’ll focus on AWS’ serverless platform, widely considered the market leader ahead of offerings from GCP, Azure, and Cloudflare. AWS offers a host of serverless services that can be plugged into each other: Lambda, DynamoDB, Step Functions, and AppSync, to name a few. Lambda is one of the best-understood examples of the power of serverless: you simply upload a function (e.g., a data processing trigger) and AWS handles the underlying infrastructure needed to get it deployed and running.
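To make the “just upload a function” idea concrete, here is a minimal sketch of a Lambda handler in Python. The S3-trigger event shape is real, but this particular handler and its field choices are illustrative, not from the article:

```python
import json

# A minimal AWS Lambda handler: AWS invokes this function with an event
# payload and handles provisioning, scaling, and teardown for you.
def handler(event, context):
    # Illustrative: an S3 upload trigger delivers one or more Records,
    # each carrying the bucket and object key that fired the event.
    records = event.get("Records", [])
    processed = [
        {
            "bucket": r["s3"]["bucket"]["name"],
            "key": r["s3"]["object"]["key"],
        }
        for r in records
        if "s3" in r
    ]
    return {"statusCode": 200, "body": json.dumps({"processed": processed})}
```

Zipping this file and uploading it is the entire deployment story for a small function; there is no cluster, image registry, or node pool to manage.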
Cost is a tricky beast with serverless. Deployments can be cheaper for bursty workloads, but they suffer from cost unpredictability. A standard EC2 instance has a fixed cost, leased for a fixed term or billed by time, and that predictability is a great business advantage. AWS Lambda, however, comes with several gotchas that can send costs skyrocketing.
Real-world anecdotes of surprise serverless bills abound, and cost matters to enterprises. Nobody wants to be the developer who racks up a $4k bill over a recursion bug, especially when it’s hard to actually test serverless code. More importantly, developers might steer clear of experimenting with Lambdas, shrinking the pool of devs who can build with them.
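To see why the bill moves so easily, here is a back-of-the-envelope cost model. The rates below are illustrative approximations of published Lambda list prices (they vary by region and change over time), so treat this as a sketch, not a pricing tool:

```python
# Illustrative rates, roughly in line with us-east-1 list prices at the
# time of writing -- check the AWS pricing page for real numbers.
PER_MILLION_REQUESTS = 0.20      # USD per 1M invocations
PER_GB_SECOND = 0.0000166667     # USD per GB-second of compute

def lambda_monthly_cost(invocations, avg_duration_ms, memory_mb):
    """Estimate a month of Lambda spend from three knobs a developer controls."""
    request_cost = invocations / 1_000_000 * PER_MILLION_REQUESTS
    gb_seconds = invocations * (avg_duration_ms / 1000) * (memory_mb / 1024)
    return request_cost + gb_seconds * PER_GB_SECOND

# 10M requests/month at 200ms and 512MB looks cheap...
print(f"normal month: ${lambda_monthly_cost(10_000_000, 200, 512):.2f}")
# ...but a recursion bug that multiplies invocations 100x multiplies the bill 100x.
print(f"recursion bug: ${lambda_monthly_cost(1_000_000_000, 200, 512):.2f}")
```

The point is that cost scales linearly with invocation count, duration, and memory at once, so a single bug or traffic spike compounds across all three.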
🔑 Takeaway: Cost unpredictability adds uncertainty to infrastructure, steering developers away from experimentation and enterprises away from large-scale adoption.
Given unpredictable costs, what makes serverless worth it? It’s pretty easy to get something small running. The code for a single Lambda just needs to be zipped and uploaded to AWS. No Kubernetes, no dependency management. The benefits of serverless are real, which is why according to Datadog’s State of Serverless Report over half of organizations operating in the major clouds have adopted serverless and over 60% of large organizations deploy Lambda functions in more than three languages.
However, just because serverless is easy to deploy does not mean it’s easy to use. Scaled up to the architecture of a full application, serverless can get pretty unruly.
Let’s zoom in on one piece of that architecture, one you have probably written or seen a hundred times in its non-serverless paradigm: a CRUD database API.
Notice that between the API Gateway and DynamoDB (both AWS services) sit two Lambdas: this makes the API subject to the foibles of Lambdas: latency (cold starts) and unpredictable costs. Using Lambdas as glue between services is a common pattern.
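To make the glue pattern concrete, here is a sketch of what one of those in-between Lambdas typically does. The table name and response shapes are hypothetical, and the DynamoDB client is passed in (rather than created with boto3 inside the handler, as real code would) so the translation logic can be exercised without an AWS account:

```python
import json

# A "glue" Lambda between API Gateway and DynamoDB: all it does is reshape
# the HTTP request into a DynamoDB GetItem call, and the result back into
# a JSON response. This is the hop that adds cold-start latency and cost.
def get_item_handler(event, dynamodb_client, table_name="Items"):
    # API Gateway proxy integration puts URL path parameters here.
    item_id = event["pathParameters"]["id"]
    resp = dynamodb_client.get_item(
        TableName=table_name,
        Key={"id": {"S": item_id}},
    )
    if "Item" not in resp:
        return {"statusCode": 404, "body": json.dumps({"error": "not found"})}
    return {"statusCode": 200, "body": json.dumps(resp["Item"])}
```

Notice there is no business logic here at all; the function exists purely to translate between two AWS services, which is exactly what direct service integrations eliminate.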
There is a better way: AWS lets you integrate services directly with each other, which comes with a host of benefits: lower latency, less code to maintain, and no operational maintenance (AWS is responsible for keeping the integration running), all at no extra cost, since you avoid the extra Lambda.
But of course, there is a catch: boilerplate. AWS’ serverless offerings suffer from serious boilerplate, especially when integrating one service with another (like Step Functions and DynamoDB). Even a simple Step Function doing a single DynamoDB lookup requires a verbose state-machine definition.
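As a rough sketch of that verbosity, here is the Amazon States Language for one DynamoDB GetItem via Step Functions’ direct service integration, expressed as a Python dict. The table name, key, and retry error name are illustrative:

```python
import json

# Amazon States Language for a single DynamoDB lookup using the direct
# service integration (no glue Lambda). Note how much ceremony one
# GetItem takes; table/key names and the retried error are illustrative.
definition = {
    "Comment": "Look up one item in DynamoDB",
    "StartAt": "GetItem",
    "States": {
        "GetItem": {
            "Type": "Task",
            "Resource": "arn:aws:states:::dynamodb:getItem",
            "Parameters": {
                "TableName": "Items",
                # ".$" suffix means the value is a JSONPath into the input.
                "Key": {"id": {"S.$": "$.itemId"}},
            },
            "ResultSelector": {"item.$": "$.Item"},
            "ResultPath": "$.lookup",
            "Retry": [
                {
                    "ErrorEquals": ["DynamoDB.ProvisionedThroughputExceededException"],
                    "IntervalSeconds": 1,
                    "MaxAttempts": 3,
                    "BackoffRate": 2.0,
                }
            ],
            "End": True,
        }
    },
}

# In practice this JSON is what you hand to Step Functions when creating
# the state machine.
print(json.dumps(definition, indent=2))
```

All of that for one key lookup, and every additional state, retry policy, and path mapping grows the document further.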
Repetition. Endless, exhausting, and anathema to the creativity we need to build elegant and fast things. This is an emerging issue: as more people take serverless seriously for deployments, the boilerplate demanded by simple things (creating a Lambda) and complex things (service-to-service integrations) will frustrate developers. Even experiments with AWS serverless can cost teams weeks of developer productivity as they sift through and build on verbose boilerplate.
Serverless is also subject to vendor lock-in. We’ve spent this article focusing on market leader AWS: as the tech matures, each vendor will expand its own suite of services and inevitably lock customers in as much as it can. Developers end up learning esoteric, per-platform scripting and configuration languages. The lack of standardization in serverless could relegate it to prototyping for some time to come.
🔑 Takeaway: Serverless architectures are a win because they make the deployment experience easier. However today, that might mean making the development experience harder. For serverless to be enterprise ready, the development experience needs to be clean enough for developers to want to use it.
Who is responsible if it breaks? Or, whose job is it to keep it running? One of the flagship benefits of serverless is that you manage almost nothing beyond the code you write. It’s AWS’ job to keep your code running and the underlying runtime patched; you only need to intervene if your own code breaks. Using service-to-service integrations minimizes the maintenance burden even further, as it optimizes away glue-code Lambdas.
Lastly, some security is also up to you: developers must configure minimally permissive IAM policies in AWS, even for their serverless services.
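As a sketch of what “minimally permissive” means in practice, here is an IAM policy (written as a Python dict) that lets a function read one DynamoDB table and nothing else. The account ID and table ARN are placeholders:

```python
import json

# A minimally permissive IAM policy: the function may call GetItem on one
# specific table, and nothing else. Account ID and table name are placeholders.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["dynamodb:GetItem"],
            "Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/Items",
        }
    ],
}
print(json.dumps(policy, indent=2))
```

Scoping the `Action` and `Resource` this tightly is tedious to write for every function and integration, but it is the developer’s responsibility even in a fully serverless stack.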
In the history of computing, we’ve added abstraction layers to help developers focus on what they’re building rather than the glue in between. You can see this pattern in the shift over the years from rented physical hardware to virtual machines to Docker/Kubernetes, and we’re betting the next step is managed deployments like serverless.
We’ve established that the serverless developer experience needs work. Tools in the ecosystem have taken different approaches to fixing it. Some focus on the IDE/compiler-side experience, for example building libraries that auto-generate boilerplate and deduce configurations. They let developers build mostly their own way, without being too opinionated about how code should be written. Others take a different approach: you develop according to their framework and engine requirements, and reap the benefits of predefined and generated infrastructure, which can sometimes be ported to any major cloud provider.
A shortlist of startups working in the space:
Cloud cost management is a saturated space. Most tools are reactive rather than proactive, forcing developers into damage mitigation rather than prevention. Some innovate with analysis features like price forecasting and anomaly detection.
A shortlist of startups working in the space:
If you found this interesting, agree, disagree, or have any comments, feel free to drop me a line at vc@work-bench.com! We’re always on the lookout for developed, fresh, and unique perspectives on emerging tech.