Work-Bench Snapshot: Augmenting Streaming and Batch Processing Workflows
The Work-Bench Snapshot Series explores the top people, blogs, videos, and more, shaping the enterprise on a particular topic we’re looking at from an investment standpoint.
This post was originally published on February 29th, 2024, in The Data Source, my monthly newsletter covering the top innovation in data infrastructure, engineering, and developer-first tooling. Subscribe here!
In today's evolving IT infrastructure landscape, cloud networking is crucial for managing and operating hybrid workloads by seamlessly integrating diverse cloud environments with on-premises infrastructure. Despite its significance, challenges persist in effectively monitoring the performance, security, and availability of cloud networking resources, as well as in setting up and configuring VMs, networks, and storage, which can hinder the implementation of cloud networking solutions.
In response to these challenges, the industry has introduced new concepts such as service mesh and eBPF to enhance the orchestration of computing resources across diverse environments. This orchestration is fundamental to cloud networking: it lets organizations optimize resource utilization, keep operations efficient, adapt to varying workloads, and allocate computing resources effectively across different platforms. Moreover, the adoption of microservices architectures, facilitated by the prevalence of Kubernetes, underscores the importance of efficient cloud networking in enabling modular and scalable application development.
Through my exploration of this category, I've uncovered some key pain points that make it hard for developers to implement and maximize the potential of their cloud networking solutions:
The concept of service mesh emerged around 2010, gaining momentum as microservices architectures became more prevalent. While the specific term "service mesh" may not have been widely recognized until around 2016-2017, the underlying technologies that form the basis of service mesh had been evolving for some time prior to that. As organizations increasingly adopted microservices architectures, they encountered challenges related to managing and securing communication between microservices. This led to the development of service mesh solutions, which aim to address these pain points by providing a dedicated infrastructure layer for managing service-to-service communication within distributed systems.
One of the early pioneers in the service mesh space is Google, particularly through its development and use of the Istio service mesh framework. Launched in 2017 as an open-source project by Google and IBM, Istio provides a powerful set of tools for managing and securing microservices communication within Kubernetes clusters. Istio was developed in partnership with the team behind Envoy, an open-source project created by the infrastructure team at Lyft.
For context, Envoy is an open-source proxy designed for cloud-native applications that serves as the data plane component in the Istio service mesh. It is responsible for routing, load balancing, and communication between microservices. Deployed as a sidecar alongside each service instance, Envoy intercepts inbound and outbound traffic, which is what lets it manage traffic and mediate communication between microservices. It also offers features like service discovery and observability that are integral to service mesh architectures. Projects like Istio (Google) and Consul (HashiCorp) use Envoy as the underlying proxy for their service mesh solutions, while Linkerd (Buoyant) relies on its own purpose-built proxy rather than Envoy. Over time, the developer community has come to recognize Envoy proxy as a fundamental building block for implementing advanced networking capabilities within cloud-native architectures.
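To make the sidecar idea concrete, here is a minimal sketch of the routing decision such a proxy makes: look up a service's endpoints, pick one with round-robin load balancing, and count requests for observability. This is a toy model under my own assumptions, not Envoy's API or implementation; the `SidecarProxy` class, the `registry` contents, and the addresses are all hypothetical.

```python
from collections import Counter
from itertools import cycle


class SidecarProxy:
    """Toy model of a sidecar's routing decision (hypothetical, not Envoy's API)."""

    def __init__(self, registry):
        # registry maps a service name to its known endpoints (made-up data).
        self._rotations = {name: cycle(endpoints) for name, endpoints in registry.items()}
        # Minimal observability: number of requests sent to each endpoint.
        self.request_counts = Counter()

    def route(self, service):
        """Return the next endpoint for `service` using round-robin selection."""
        if service not in self._rotations:
            raise LookupError(f"unknown service: {service}")
        endpoint = next(self._rotations[service])
        self.request_counts[endpoint] += 1
        return endpoint


# Hypothetical service discovery data for a single service.
registry = {"payments": ["10.0.0.1:8080", "10.0.0.2:8080"]}
proxy = SidecarProxy(registry)
picked = [proxy.route("payments") for _ in range(4)]
```

In this sketch the four calls alternate between the two endpoints. A real proxy layers health checks, retries, TLS, and richer balancing policies on top of this basic loop.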
Although service meshes offer improved traffic control and observability, among other benefits, developers often perceive the tech to be overly complex. This complexity arises from deploying a proxy alongside each service to manage communication, which amplifies the management burden, particularly in large-scale deployments. Setting up and maintaining service mesh solutions requires meticulous configuration and coordination of various components such as proxies, control planes, and service discovery mechanisms. This is an operational hurdle for many, necessitating expertise in networking, security, and distributed systems. Moreover, it is widely acknowledged that running a proxy per service instance across distributed environments increases resource usage, which can impact application performance.
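A quick back-of-the-envelope calculation shows why per-instance proxies add up. The sketch below multiplies a per-proxy footprint by the number of pods. The figures are placeholders I picked for illustration, not benchmarks of any particular mesh, and `sidecar_overhead` is a hypothetical helper.

```python
def sidecar_overhead(pods, mem_mib_per_proxy, cpu_millicores_per_proxy):
    """Estimate the cluster-wide footprint of running one proxy per pod."""
    total_mem_gib = pods * mem_mib_per_proxy / 1024
    total_cpu_cores = pods * cpu_millicores_per_proxy / 1000
    return total_mem_gib, total_cpu_cores


# Placeholder inputs (assumptions, not measurements): 2,000 pods,
# each sidecar reserving 64 MiB of memory and 50 millicores of CPU.
mem_gib, cpu_cores = sidecar_overhead(
    pods=2000, mem_mib_per_proxy=64, cpu_millicores_per_proxy=50
)
print(f"{mem_gib:.0f} GiB of memory, {cpu_cores:.0f} CPU cores")
```

With these assumed inputs, the proxies alone would reserve 125 GiB of memory and 100 CPU cores, before any application work is done.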
The emergence of eBPF (extended Berkeley Packet Filter) enhances service mesh architectures by providing advanced networking capabilities for modern distributed systems. eBPF programs run inside the kernel, allowing network packets to be intercepted and analyzed in real time. This granular visibility enables a range of functionality, including advanced monitoring that tracks key performance metrics like latency, throughput, and error rates at a fine level of detail. Moreover, eBPF facilitates dynamic traffic management, empowering organizations to optimize traffic flow, prioritize critical services, and allocate resources based on predefined policies. It also strengthens security enforcement by enabling fine-grained access controls, intrusion detection mechanisms, and threat mitigation strategies directly within the network stack.
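Real eBPF programs are typically written in restricted C, loaded into the kernel, and share state with user space through maps. As a rough illustration of the bookkeeping involved, here is a plain Python simulation of a per-flow table that tracks the metrics mentioned above. Nothing here touches the kernel; `FlowStats`, `record`, and the sample events are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class FlowStats:
    """Counters for one (source, destination) flow, similar in spirit to a map entry."""
    requests: int = 0
    errors: int = 0
    bytes_seen: int = 0
    total_latency_ms: float = 0.0

    @property
    def error_rate(self):
        return self.errors / self.requests if self.requests else 0.0

    @property
    def avg_latency_ms(self):
        return self.total_latency_ms / self.requests if self.requests else 0.0


# User-space stand-in for a kernel map keyed by flow.
flows = defaultdict(FlowStats)


def record(src, dst, size_bytes, latency_ms, ok):
    """Update the counters for one observed request."""
    stats = flows[(src, dst)]
    stats.requests += 1
    stats.bytes_seen += size_bytes
    stats.total_latency_ms += latency_ms
    if not ok:
        stats.errors += 1


# Made-up events: (source, destination, bytes, latency in ms, success flag).
events = [
    ("web", "api", 512, 12.0, True),
    ("web", "api", 256, 20.0, False),
    ("api", "db", 1024, 4.0, True),
]
for event in events:
    record(*event)

web_api = flows[("web", "api")]
```

The appeal of eBPF is that counters like these can be updated inside the kernel as packets flow, rather than inside each application or a per-service proxy.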
The adoption of eBPF has gained momentum among industry giants such as Google, Netflix, and Facebook, which leverage its capabilities to overcome various networking challenges prevalent in cloud environments. For instance, Google uses eBPF to enhance the reliability and scalability of its cloud networking infrastructure across its diverse range of services and customers. Netflix uses eBPF for advanced traffic monitoring within its cloud networking environment, helping optimize the streaming platform's performance and reliability for millions of users worldwide. Facebook, too, leverages eBPF for real-time network analysis and optimization, enabling it to maintain seamless user experiences across its platforms while upholding stringent security standards.
Cisco's recent acquisition of Isovalent (creator of Cilium) underscores the growing significance of eBPF in addressing the complexities of modern distributed systems. With Isovalent's expertise in eBPF-based networking solutions, Cisco gains access to cutting-edge capabilities for real-time monitoring, efficient traffic management, and robust security enforcement within cloud-native environments. This strategic acquisition enhances Cisco's position in the networking market, enabling it to deliver comprehensive solutions tailored to the evolving needs of enterprises embracing cloud-centric architectures.
Several public companies have integrated eBPF technology into their products. For example, Datadog uses eBPF in its Cloud Workload Security and Network Performance Monitoring products, enabling deep kernel visibility for file integrity monitoring and network flow visualizations. New Relic acquired Pixie and integrated it into its platform for Kubernetes-native observability without manual instrumentation. Splunk entered the eBPF space through its acquisition of Flowmill, enhancing network performance monitoring across cloud deployments.
Among late-stage companies, Tigera, the creator of Calico, leverages eBPF for efficient packet processing and enforcement of network policies, providing scalability and security for containerized workloads. Circonus uses eBPF for network monitoring and troubleshooting, allowing users to gain deep visibility into network activity without overhead. Kentik's network observability, on the other hand, is facilitated by an eBPF agent tailored for Kubernetes functionality.
Given the continued adoption of eBPF in the market, it's evident that the tech is becoming increasingly indispensable for modern cloud-native infrastructure management use cases, and that more startups will emerge to leverage its capabilities for solutions in areas such as network visibility, performance optimization, and others. And as cloud-native architectures continue to evolve, the flexibility and efficiency provided by eBPF-powered solutions make them valuable tools for addressing the complex networking challenges inherent in distributed systems.
If you’re a practitioner focused on the cloud networking space and/or building in service mesh and eBPF, please reach out as I’d love to swap notes!