Work-Bench Snapshot: Augmenting Streaming and Batch Processing Workflows
The Work-Bench Snapshot Series explores the top people, blogs, videos, and more shaping the enterprise around a particular topic we’re examining from an investment standpoint.
This post was originally published on The Data Source on October 16th, 2023, my monthly newsletter covering the top innovations in data infrastructure, engineering, and developer-first tooling. Subscribe here and never miss an issue!
Like cloud computing before it, edge computing is poised to be the next major platform shift. The change stems from a fundamental rethinking of how data is processed and managed: unlike the traditional cloud model, which relies on centralized data centers, edge computing processes data close to where it is generated.
Two critical factors are driving the widespread adoption of edge computing: speed and cost-effectiveness. Businesses today increasingly rely on near-instantaneous data processing. This is particularly true for applications that require real-time responses, such as IoT devices, autonomous systems, and immersive technologies like virtual and augmented reality. Edge computing addresses this need by moving data processing closer to the source, significantly reducing the time it takes for information to travel back and forth between devices and centralized cloud servers.
This past month, I spent time digging into the world of edge computing to understand the top trends shaping the broader ecosystem, the challenges surrounding data management at the edge, and the areas poised for innovation. Some observations:
What stood out most from my initial research is that caching is a crucial technique for edge computing platforms. It stores frequently accessed data close to users, reducing the need for round trips to distant servers and thus minimizing delays. This matters for applications that require real-time or low-latency interactions, such as voice assistants and NLP, live broadcasting and streaming, and video conferencing. Companies like Akamai, Cloudflare, and Fastly have demonstrated the value of caching by baking it into their solutions to improve website performance and responsiveness.
However, caching comes with challenges. Implementing it is technically demanding: it requires specialized software and hardware configurations to handle storing, retrieving, and expiring cached content effectively. It is also a niche topic that demands domain expertise in distributed systems and hardware. Despite this complexity, caching is a basic requirement for running data and compute at the edge and is essential for data-intensive applications. This is why today we see so many startups, such as Readyset, Materialize, Polyscale, and others, building in the “caching” market. What’s interesting is that while caching might work as a product wedge for many, the key question remains: how and where will these caching startups evolve over the next couple of years? This is what intrigues me most and where I’m focusing my research.
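To make the store/retrieve/expire mechanics concrete, here is a minimal sketch of an in-memory TTL cache in Python. This is purely illustrative (the class and parameter names are my own, not from any of the products mentioned), and it glosses over the hard parts real edge caches face: distribution, invalidation across regions, and memory eviction.

```python
import time


class TTLCache:
    """Minimal in-memory cache with per-entry time-to-live (TTL)."""

    def __init__(self, default_ttl=60.0):
        self._store = {}              # key -> (value, expiry timestamp)
        self._default_ttl = default_ttl

    def set(self, key, value, ttl=None):
        """Store a value; it expires `ttl` seconds from now."""
        ttl = self._default_ttl if ttl is None else ttl
        self._store[key] = (value, time.monotonic() + ttl)

    def get(self, key):
        """Retrieve a value, or None if missing or expired."""
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[key]      # lazy expiry on read
            return None
        return value


cache = TTLCache(default_ttl=0.05)
cache.set("user:42", {"name": "Ada"})
print(cache.get("user:42"))           # fresh entry: cache hit
time.sleep(0.1)
print(cache.get("user:42"))           # past TTL: expired, returns None
```

Even this toy version shows why the problem gets hard quickly: expiry here happens lazily on read, so stale entries linger in memory until touched, and a production system needs background eviction, size limits, and coordination across nodes.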
As I get up to speed on this topic, here’s a compilation of resources that I’ve enjoyed:
So, you want to deploy on the edge? by Zak Knill
“If a user makes a request from Europe, and the apps run in US East, that adds an extra 100-150 ms of latency just by round-tripping across the Atlantic… Edge computing tries to solve this problem, by letting app developers deploy their applications across the globe, so that apps serve the user requests closer to the user. This removes a lot of the round-trip latency because the request has to travel less far before getting to a data center that hosts the app. …Edge computing sounds great for reducing response times for users, but the main thing stopping developers from adopting edge computing is data consistency.”
Making Shopify’s Flagship App 20% Faster in 6 Weeks Using a Novel Caching Solution by Ryan Ehrlich
“At Shopify, we use two different technologies for caching: Memcached and Redis. Redis is more powerful than Memcached, supporting more complex operations and storing more complex objects. Memcached is simpler, has less overhead, and is more widely used for caching inside Shop. While we use Redis for managing queues and some caches, we didn’t need Redis’ complexity, so we chose a distributed Memcached.”
Cloudflare on the Edge by Ben Thompson
“Most computing resources that run on cloud computing platforms, including serverless platforms, are created by developers who work at companies where compliance is a foundational requirement. And, up until now, that’s meant ensuring that platforms follow government regulations like GDPR (European privacy guidelines) or have certifications proving that they follow industry regulations such as PCI DSS (required if you accept credit cards), FedRAMP (US government procurement requirements), ISO27001 (security risk management), SOC 1/2/3 (Security, Confidentiality, and Availability controls), and many more… But there’s a looming new risk of regulatory requirements that legacy cloud computing solutions are ill-equipped to satisfy. Increasingly, countries are pursuing regulations that ensure that their laws apply to their citizens’ personal data. One way to ensure you’re in compliance with these laws is to store and process data of a country’s citizens entirely within the country’s borders.”
Kurt Mackey
Kurt is the co-founder and CEO of Fly.io. Fly is an application delivery cloud that enables developers to run their apps closer to the end user.
Pekka Enberg
Pekka is the co-founder and CTO of Turso. Turso is a database that enables developers to write, deploy and maintain highly distributed and performant apps.
Sahn Lam
Sahn is a software engineer at Discord and co-author of the System Design Interview Book Series.
Practitioners and startup builders, if this is an area of interest to you, please reach out to chat!