Work-Bench Snapshot: Augmenting Streaming and Batch Processing Workflows
The Work-Bench Snapshot Series explores the top people, blogs, videos, and more that are shaping the enterprise on a particular topic we’re looking at from an investment standpoint.
This post was originally published in Polyglot’s monthly newsletter on May 5th, 2023.
Despite the many databases available on the market, developers still confront a host of challenges when working with them, such as capacity planning and cost management. If a developer provisions more database capacity than their workloads need, they overspend; if they underestimate capacity needs, they risk failure. As a result, engineering teams are often forced to estimate demand for their applications and monitor their database infrastructure manually.
Serverless* databases solve this by autoscaling. Because serverless databases scale automatically and elastically, developers don’t need to monitor the database load or capacity, which saves them time. Serverless databases also abandon the model of paying for pre-provisioned clusters and instead use consumption-based pricing. With serverless, hobbyists and companies pay only for the resources they use.
Suddenly, it seems like every database offering has gone “serverless.” But these offerings are not identical, nor are they all positioned in the same way. AWS’ serverless version of Aurora was one of the first serverless Postgres offerings. Cockroach Labs’ (a Work-Bench Portco) serverless offering is positioned as a Postgres multi-region serverless platform with a write-scalable system that doesn't force users to shard manually.
Emerging players like Neon and Xata also offer serverless databases. Neon is open source and modeled on Aurora’s architecture, while Xata caters to a broader audience with its Airtable-like UI.
*Serverless = A misnomer. There are still servers. The server management is just abstracted away. The concept of serverless can apply to databases. Serverless databases are architected to autoscale and therefore leverage usage-based pricing. Learn more about serverless databases from MongoDB.
Trends in serverless databases:
Within Online Transaction Processing (OLTP) databases, select serverless relational databases include: Neon, PlanetScale, Cockroach Labs, and Xata. Serverless non-relational databases include: MongoDB, FaunaDB, and SurrealDB. The public cloud vendors have their own serverless database products like AWS Aurora, Google Cloud’s AlloyDB, and Azure CosmosDB.
The serverless pricing model lacks economies of scale. At the v2 stage, serverless may be more expensive than a traditional architecture. Serverless is great for v1 players like hobby projects or startups in the pre-scaling, pre-product-market-fit phases. But when a hobby project or business starts getting consistent traction, costs may soar. Lee Robinson from Vercel talks about the tradeoff between serverless’ usage-based model and the risk of failure without it (his Tweet might have also alluded to Vercel’s markup on Neon and Upstash). In short, a serverless database is a great fit for a business that may go viral but whose traffic is not yet consistent.
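To make the pricing tradeoff above concrete, here is a minimal Python sketch of the break-even point between a flat pre-provisioned bill and a usage-based one. Every number in it is a made-up placeholder rather than any vendor's actual rate, and the function names are my own, chosen only for illustration.

```python
# All prices are hypothetical placeholders, not real vendor rates.
PROVISIONED_MONTHLY_COST = 500.0   # assumed flat fee for a pre-provisioned cluster
SERVERLESS_COST_PER_MILLION = 2.0  # assumed usage-based price per million requests


def serverless_cost(requests_per_month: int) -> float:
    """Usage-based bill: pay only for the requests actually served."""
    return requests_per_month / 1_000_000 * SERVERLESS_COST_PER_MILLION


def cheaper_option(requests_per_month: int) -> str:
    """Return which pricing model costs less at a steady monthly volume."""
    if serverless_cost(requests_per_month) < PROVISIONED_MONTHLY_COST:
        return "serverless"
    return "provisioned"


def break_even_requests() -> int:
    """Monthly request volume at which the two bills are equal."""
    return int(PROVISIONED_MONTHLY_COST / SERVERLESS_COST_PER_MILLION * 1_000_000)


if __name__ == "__main__":
    # A hobby project with light traffic versus a business with steady, heavy traffic.
    for volume in (10_000_000, 1_000_000_000):
        print(volume, serverless_cost(volume), cheaper_option(volume))
    print("break-even:", break_even_requests())
```

Under these assumed prices, low or spiky volumes favor serverless, while steady traffic above the break-even volume favors the provisioned cluster. Real bills also depend on storage, compute, and connection charges, which this sketch leaves out.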
Here are several resources that informed these trends. I’d recommend checking them out.
2023 State of Databases for Serverless and Edge by Lee Robinson, Vercel
“More databases are embracing serverless, but what ‘serverless’ means to them varies. There are different vectors of autoscaling: connections, storage, compute, and more.”
Open Source Startup Podcast: Building Scalable Postgres with Serverless Database Platform Neon with Nikita Shamgunov, Amanda Robson, and Timothy Chen
“If you never run out of storage, then you most likely overprovisioned storage…then you’re paying for what you’re not using. Clouds are very expensive now…The only way to mitigate the expense of the cloud is to never pay for what you’re not using.”
Building a database in the 2020s by Ed Huang
“Serverless, as many people think of it, is a technical term, but I think it's not. Serverless is more about defining what a better product on the cloud from a user experience perspective is. Or maybe that's the way it should be: why should user[s] care about how many nodes you have? Why do I need to care about your database's internal configuration? Why do I have to wait for another half hour after I click launch?”
If you’re building in serverless or generally thinking about this space, I’d love to chat with you. You can find me on Twitter @ranikubersky. Many thanks to Kelley Mak and Priyanka Somrah for their contributions to this piece.