Work-Bench Snapshot: Augmenting Streaming and Batch Processing Workflows
The Work-Bench Snapshot Series explores the top people, blogs, videos, and more shaping the enterprise around a particular topic we’re looking at from an investment standpoint.
This post was originally published on December 1st, 2023, in The Data Source, my monthly newsletter covering the top innovations in data infrastructure, engineering, and developer-first tooling. Subscribe here!
In this post: GitHub embraces Copilot, a move that acknowledges AI's transformative impact on developer workflows. AWS introduces S3 Express One Zone, a storage class that trades higher cost for lower latency and may appeal most to startups. And catch a throwback post from Brendan Gregg exploring the intricate challenges of adapting eBPF observability tools for security monitoring. Here’s a running list of articles that have caught my attention on these topics this week:
By Matt Rickard
GitHub recently said it was “re-founding” itself on Copilot instead of git. GitHub has always been about the workflow — there are plenty of other hosted git providers, but GitHub was the first to put together pull requests, issues, and collaboration into a single workflow. Re-founding on Copilot is a way to acknowledge that AI will drastically change the developer workflow.
GitHub is undergoing a significant transformation by shifting its focus from Git to Copilot, marking a pivotal moment that underscores the far-reaching impact of AI on developer workflows.
Today, developers execute familiar tasks faster and more efficiently thanks to features such as code autocompletion, AI-assisted code reviews, and AI-generated commit messages. These advancements not only streamline software development but also reimagine traditional developer workflows. AI can help developers identify low-risk changes that can be merged without manual review, and it can automate conflict resolution and style checks, sparing developers these routine tasks so they can focus on more strategic work.
But that's not all. In this post, Matt envisions a future where AI makes enterprise platforms adaptable to diverse workflows. Acting as a copilot, AI could enable non-technical users to generate code on their own, whether in specialized languages or general-purpose ones, and tailor platforms to their unique requirements. GitHub re-founding itself on Copilot introduces a fresh paradigm in software development, and it’ll be interesting to see the innovation that comes out of this shift.
By Jack Vanlightly
S3 Standard can be cheap and most definitely is highly durable. Its Achilles’ heel is the high, unpredictable latency. Cheap, durable storage makes it the best place to store large volumes of data and many systems today already do that. However, the high latency is a problem and, depending on the workload, data system builders must jump through many hoops to integrate S3 into the architecture, benefiting from the economical storage while dodging the latency bullet.
Fresh from re:Invent, AWS unveiled S3 Express One Zone, a new storage class that promises lower latency but comes with a hefty price tag.
The right cloud storage solution has to balance cost-effectiveness, durability, and low latency. While S3 Standard offers affordable, durable storage and is ideal for large volumes of data, its high and unpredictable latency has made it challenging to serve low-latency workloads. Data infrastructure teams typically resort to implementing replicated, fault-tolerant caches atop S3 to meet their systems' low-latency demands. Does the advent of S3 Express One Zone signal the end of these replication layers? Not quite.
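To make the caching pattern concrete, here is a minimal, single-process Python sketch of a read-through LRU cache sitting in front of a slow object store. It assumes a hypothetical `SlowObjectStore` class standing in for S3 Standard (it is not the S3 or boto3 API), and it deliberately omits the replication and fault tolerance that production cache layers add on top. The point is only to show how repeated reads avoid the high-latency tier.

```python
from collections import OrderedDict


class SlowObjectStore:
    """Hypothetical stand-in for a durable but high-latency object store (not the S3 API)."""

    def __init__(self, objects):
        self._objects = dict(objects)
        self.remote_reads = 0  # counts round trips to the slow tier

    def get(self, key):
        self.remote_reads += 1
        return self._objects[key]


class ReadThroughCache:
    """Single-node LRU cache in front of the store; real systems would also replicate it."""

    def __init__(self, store, capacity):
        self._store = store
        self._capacity = capacity
        self._entries = OrderedDict()

    def get(self, key):
        if key in self._entries:
            # Cache hit: mark as most recently used and skip the slow tier.
            self._entries.move_to_end(key)
            return self._entries[key]
        # Cache miss: fetch from the slow store and remember the result.
        value = self._store.get(key)
        self._entries[key] = value
        if len(self._entries) > self._capacity:
            # Evict the least recently used entry.
            self._entries.popitem(last=False)
        return value


store = SlowObjectStore({"a": b"1", "b": b"2", "c": b"3"})
cache = ReadThroughCache(store, capacity=2)
for key in ["a", "a", "b", "a", "c", "b"]:
    cache.get(key)
print(store.remote_reads)  # → 4
```

Of the six reads above, only four reach the slow tier. The hit rate depends entirely on the access pattern and cache size, which is part of why teams end up tuning and operating these layers themselves.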
While the tech itself is revolutionary and is a leap toward the perfect cloud storage solution, the cost of deploying it will make it challenging for teams to adopt it. But as Jack points out, Express One Zone might find a niche among startups prioritizing time over expenses. They may be more inclined to embrace Express One Zone rather than layering S3 with replication mechanisms and building it all themselves. That said, I do think we are not far from a storage primitive that is durable, supports low latency and is cost-effective. It’s just a matter of time.
By Brendan Gregg
Observability tools are designed to have the lowest overhead possible so that they are safe to run in production while analyzing an active performance issue. Keeping overhead low can require tradeoffs in other areas: tcpdump(8), for example, will drop packets if the system is overloaded, resulting in incomplete visibility. This creates an obvious security risk for tcpdump(8)-based security monitoring: An attacker could overwhelm the system with mostly innocent packets, hoping that a few malicious packets get dropped and are left undetected. Long ago I encountered systems which met strict security auditing requirements with the following behavior: If the kernel could not log an event, it would immediately halt! While this was vulnerable to DoS attacks, it met the system's security auditing non-repudiation requirements, and logs were 100% complete.
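The tradeoff Brendan describes can be illustrated with a toy model: a bounded event buffer that either drops events on overflow (the low-overhead observability behavior) or halts when an event cannot be logged (the strict auditing behavior). This Python sketch is illustrative only; `EventBuffer`, `OverflowPolicy`, and `AuditHalt` are hypothetical names, and the code does not reflect how tcpdump(8) or eBPF ring buffers are actually implemented.

```python
from collections import deque
from enum import Enum


class OverflowPolicy(Enum):
    DROP = "drop"         # observability style: keep overhead low, lose events under load
    FAIL_CLOSED = "halt"  # strict auditing style: stop if an event cannot be logged


class AuditHalt(Exception):
    """Raised to model the 'halt if the event cannot be logged' behavior."""


class EventBuffer:
    """Hypothetical bounded buffer; nothing drains it here, modeling a logger that can't keep up."""

    def __init__(self, capacity, policy):
        self.capacity = capacity
        self.policy = policy
        self.events = deque()
        self.dropped = 0

    def record(self, event):
        if len(self.events) < self.capacity:
            self.events.append(event)
            return
        if self.policy is OverflowPolicy.DROP:
            # Lossy: the event silently disappears, which an attacker could exploit.
            self.dropped += 1
        else:
            # Fail closed: refuse to continue rather than lose an audit record.
            raise AuditHalt(f"could not log event {event!r}")


# An attacker floods the system with benign events, then sends one malicious event.
flood = ["benign"] * 5 + ["malicious"]

lossy = EventBuffer(capacity=4, policy=OverflowPolicy.DROP)
for e in flood:
    lossy.record(e)
print(lossy.dropped, "malicious" in lossy.events)  # → 2 False

strict = EventBuffer(capacity=4, policy=OverflowPolicy.FAIL_CLOSED)
halted = False
try:
    for e in flood:
        strict.record(e)
except AuditHalt as exc:
    halted = True
    print("halted:", exc)  # → halted: could not log event 'benign'
```

With the lossy policy, the malicious event hidden behind the flood is never recorded. With the fail-closed policy, the logs stay complete, but a flood of benign events is enough to stop the system, which is the DoS exposure the quote mentions.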
This is an excellent post from Brendan Gregg exploring the intricacies of using eBPF observability tools for security monitoring. His point: observability tools should not be plugged into security monitoring frameworks without proper adaptation. eBPF observability tools excel at minimizing overhead so they are safe to use in production during active performance analysis. The downside of that low overhead is that, like tcpdump(8), such tools may drop events under heavy load, and an attacker could exploit those gaps to slip malicious activity past a security monitor.
The distinction between security monitoring and operational monitoring has to be made clear. Teams must recognize the need for dedicated tools tailored to the unique demands of security, rather than repurposing existing operational tools. Brendan's post also highlights an exciting opportunity shaping up around eBPF-powered security products.
Practitioners and startup builders, if this is an area of interest to you, please reach out to chat!