Presentation date: August 20, 2015, 13:30 - 15:30.
Presenters: Andre Pech, Anshul Sadana, Hugh Holbrook, Jeff Raymond, Kamal Bakshi, Ken Duda, Ryan Madsen
Arista Networks presented at Networking Field Day 10, highlighting their innovative approach to data center networking. Anshul Sadana discussed Arista’s scalable solutions, including the “Spline” design, and emphasized network reliability and operational flexibility through automation and software programmability. Ken Duda focused on the evolution and quality of Arista’s EOS software platform, emphasizing culture, architecture, and testing. Ryan Madsen showcased the Arista EOS SDK, enabling customers to build applications that interact with the system database for real-time network management. Jeff Raymond presented CloudVision, Arista’s network-wide automation and orchestration solution, designed to bridge the gap between automated and manual IT infrastructures.
Arista Networks’ CloudVision is a centralized network management platform that leverages EOS and SysDB for real-time visibility and automation. It simplifies change management, enables zero-touch provisioning, and integrates with external systems for proactive operations. CloudVision’s architecture, built on a clustered system, ensures high availability and allows for both automated and manual control, making it suitable for agile and manageable network environments.
Arista Networks Overview
Anshul Sadana, Senior Vice President of Customer Engineering, introduces Arista Networks and talks about their product lines and applications in modern networks.
In this presentation at Networking Field Day 10, Anshul Sadana delivers an in-depth overview of Arista Networks’ innovations and evolving approach to data center networking. He begins by describing the transition from traditional multi-tiered architectures to more streamlined one- and two-tier designs, such as Arista’s “Spline” design, which allows up to 2,000 non-blocking host connections. These scalable solutions are intended to meet the growing demands of modern enterprises and cloud data centers. Sadana also discusses their extensive support for Layer 2 and Layer 3 topologies, including large deployments of up to 220,000 servers using Layer 3 ECMP, and highlights VXLAN overlay capabilities for extending Layer 2 networks across data centers.
A key emphasis of Sadana’s talk is network reliability and operational flexibility. He explains how Arista’s Extensible Operating System (EOS) and architectural philosophy enable high redundancy and efficient upgrades, avoiding the outages traditionally associated with legacy systems. Arista’s pioneering Smart System Upgrade (SSU) allows for controller and operating system upgrades without downtime, even on single top-of-rack switches. Sadana describes a shift in focus from maximizing uptime to minimizing downtime, with automation playing a central role in provisioning, configuration management, monitoring, and decommissioning. He outlines the benefits of such automation in large-scale environments and showcases Arista’s integrations with tools like Splunk and custom telemetry platforms.
Finally, Sadana delves into Arista’s software programmability and the different strategies available for integrating with network infrastructure, culminating in Arista’s unique SysDB-based architecture. This allows for a high-performance, publish-subscribe model where multiple agents manage different features and states efficiently. He emphasizes Arista’s openness through robust APIs and SDKs, facilitating deep integrations and empowering customers to exert granular control over their networks. He also touches on Arista’s growth, now holding the number two market share in high-speed data center switching, and highlights their global footprint with expanded support centers and RMA depots. Overall, the session showcases Arista’s commitment to scalable, programmable, and customer-driven network innovation.
Personnel: Anshul Sadana
Arista Networks EOS Evolution and Quality with Ken Duda
Ken Duda, Founder, CTO, and Senior VP of Software Engineering at Arista Networks, delivered a presentation on the evolution and quality of Arista’s EOS software platform. In this session, recorded on August 20, 2015, Duda emphasized that quality is the paramount concern for Arista, not just one of many priorities. He explained that the reliability of network software is crucial because when the network fails, everything else does too. Since its inception, Arista has aimed to deliver a higher quality network experience through three main pillars: culture, architecture, and testing.
Duda elaborated on the cultural aspect of quality, highlighting that at Arista, quality is ingrained in the company’s ethos. Unlike some competitors who might incentivize hitting ship dates with bonuses, Arista does not reward employees for meeting deadlines if it compromises quality. This approach ensures that employees are not pressured to release products prematurely. Duda shared an anecdote about how the CEO, Jayshree Ullal, consistently supports delaying shipments to ensure the product is ready, thereby shifting the risk from the customer’s balance sheet to Arista’s income statement. This cultural commitment to quality extends to how the company handles customer issues. Technical engineers are empowered to directly contact technical leads to resolve problems without bureaucratic escalation, ensuring swift and effective responses to any issues in the field.
On the architectural front, Duda discussed how Arista’s use of Pure Linux and a system database approach contributes to the robustness of their software. By adhering to the Unix philosophy and avoiding kernel modifications, Arista maintains compatibility with the broader Linux community and ensures easier upgrades. The system database approach replaces traditional message passing, which can lead to synchronization and update rate mismatch problems. Instead, Arista’s architecture allows for state-oriented updates, coalescing changes to maintain system stability even under high-stress conditions. Finally, Duda addressed the importance of testing, advocating for the elimination of traditional QA teams in favor of automated testing. Each development team at Arista is responsible for the quality of their code, providing both the new code and the automated tests to prove its functionality. This rigorous testing framework has significantly reduced regressions and improved the overall quality of Arista’s software, ensuring that customers experience fewer issues and more reliable network performance.
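The state-oriented update model described above can be illustrated with a small sketch. This is not Arista's SysDB code: `StateStore` and `Subscriber` are hypothetical names, and the sketch only assumes that a reader is told which keys changed and fetches the latest value when it gets around to processing them. A slow reader therefore sees one coalesced update per key instead of a backlog of messages.

```python
class StateStore:
    """Toy state database (hypothetical): writers set values, readers learn which keys changed."""

    def __init__(self):
        self._state = {}
        self._subscribers = []

    def subscribe(self, subscriber):
        self._subscribers.append(subscriber)

    def set(self, key, value):
        self._state[key] = value
        # Notify by key only; the value is read later, so stale values are never queued.
        for subscriber in self._subscribers:
            subscriber.mark_dirty(key)

    def get(self, key):
        return self._state[key]


class Subscriber:
    """A reader that may process changes more slowly than they are written."""

    def __init__(self, store):
        self._store = store
        self._dirty = set()  # keys changed since the last drain
        self.seen = []       # (key, value) pairs actually processed
        store.subscribe(self)

    def mark_dirty(self, key):
        self._dirty.add(key)

    def drain(self):
        # Each dirty key is processed once, with its current value.
        for key in sorted(self._dirty):
            self.seen.append((key, self._store.get(key)))
        self._dirty.clear()


store = StateStore()
slow_agent = Subscriber(store)

# An interface flaps four times before the slow agent runs.
for status in ["up", "down", "up", "down"]:
    store.set("Ethernet1", status)
store.set("Ethernet2", "up")

slow_agent.drain()
print(slow_agent.seen)  # [('Ethernet1', 'down'), ('Ethernet2', 'up')]
```

With message passing, the same agent would have had to work through five queued messages. Here the amount of pending work is bounded by the number of distinct keys, not by the update rate.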
Personnel: Ken Duda
Arista Networks EOS SDK Demo
Ryan Madsen, Software Engineer, discusses the software development kit (SDK) for Arista Networks EOS platform and showcases examples of how to use the SDK to create software for EOS.
In his presentation for Networking Field Day 10, Ryan Madsen introduces Arista’s EOS SDK, a development toolkit that allows users to write applications that can natively interact with Arista’s system database. He explains that the SDK consists of a collection of modules and APIs — such as those for static routes, ARP entries, and port channels — which make it possible to both program and respond to network changes directly on the switch. Madsen points out that while the SDK is only about a year old, many customers have already begun deploying real applications, including video streaming services that manage real-time multicast traffic and network engineers experimenting with protocol implementation based on MAC learning events.
Madsen presents a demonstration highlighting how large customers use the SDK to perform advanced traffic engineering across multiple data centers. These customers build controllers that monitor expected bandwidth flows and assign path priorities. They use SDK-based agents installed on Arista switches to receive routing instructions from the controller, converting them into MPLS label stacks to direct traffic through specific paths. A key example demonstrates how the agent and controller react dynamically when a link goes down: the controller recalculates paths and pushes new routes to the agent, which promptly updates the hardware configuration on the switch. The SDK enables real-time decision-making and adjustments with native-level access, essentially allowing users to build their own custom MPLS implementations on Arista gear.
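The controller-side reaction to a link failure can be sketched roughly as follows. This is an illustration under stated assumptions, not the customers' controller: it picks paths by hop count with a breadth-first search rather than by expected bandwidth and path priority, and the site names and the `NODE_LABELS` table are hypothetical.

```python
from collections import deque

# Hypothetical label assigned to each site; a real deployment would allocate these.
NODE_LABELS = {"dc1": 100, "dc2": 200, "dc3": 300, "dc4": 400}


def shortest_path(links, src, dst):
    """Breadth-first search over undirected links; returns a list of nodes or None."""
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    queue = deque([[src]])
    visited = {src}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == dst:
            return path
        for nxt in sorted(neighbors.get(node, ())):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None


def label_stack(path):
    """One label per hop after the source, outermost label first."""
    return [NODE_LABELS[node] for node in path[1:]]


links = {("dc1", "dc2"), ("dc2", "dc4"), ("dc1", "dc3"), ("dc3", "dc4")}

before = label_stack(shortest_path(links, "dc1", "dc4"))
links.discard(("dc1", "dc2"))  # the dc1-dc2 link goes down
after = label_stack(shortest_path(links, "dc1", "dc4"))

print(before)  # [200, 400]
print(after)   # [300, 400]
```

In the demonstrated setup, the new label stack would then be pushed to the on-switch agent, which programs it into hardware.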
The SDK makes application development relatively simple and portable. It supports both C++ and Python, with consistent APIs that abstract hardware differences. Developers receive event-driven callbacks for network changes, such as interface status updates, and use manager methods to retrieve or set state data. This abstraction shields them from low-level hardware complications, allowing the same binary application to run on various Arista platforms, from smaller switches to 1,000+ port systems. Madsen concludes that this level of integration empowers users to respond immediately to network events and to build robust, state-aware applications. With ongoing SDK enhancements and community contributions via GitHub, Arista aims to foster further innovation from its user base.
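The callback-plus-manager pattern can be sketched in plain Python. The classes below are a self-contained mock written for illustration; `InterfaceManager`, `LinkWatcher`, and their methods are hypothetical names and should not be read as the actual EOS SDK API.

```python
class InterfaceManager:
    """Hypothetical manager: holds interface state and dispatches change events."""

    def __init__(self):
        self._status = {}
        self._handlers = []

    def register(self, handler):
        self._handlers.append(handler)

    def oper_status(self, intf):
        """Getter used by applications to read current state."""
        return self._status.get(intf, "unknown")

    def set_oper_status(self, intf, status):
        """Simulates the system reporting a status change."""
        if self._status.get(intf) == status:
            return  # no state change, so no callback
        self._status[intf] = status
        for handler in self._handlers:
            handler.on_oper_status(intf, status)


class LinkWatcher:
    """Hypothetical application: reacts to interface events via a callback."""

    def __init__(self, manager):
        self.manager = manager
        self.down = set()
        self.events = 0
        manager.register(self)

    def on_oper_status(self, intf, status):
        self.events += 1
        if status == "down":
            self.down.add(intf)
        else:
            self.down.discard(intf)


manager = InterfaceManager()
watcher = LinkWatcher(manager)
manager.set_oper_status("Ethernet1", "up")
manager.set_oper_status("Ethernet2", "down")
manager.set_oper_status("Ethernet2", "down")  # duplicate report, ignored

print(sorted(watcher.down))              # ['Ethernet2']
print(manager.oper_status("Ethernet1"))  # up
```

The application never polls: it is told when state changes and uses the manager only when it needs to read or set state.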
Personnel: Ryan Madsen
Arista Networks CloudVision Overview
Jeff Raymond, VP of EOS Software and Services at Arista, delivers an overview of the CloudVision offering with a focus on network-wide automation and orchestration, and its role within the broader SDN landscape. In the presentation at Networking Field Day 10, Raymond frames CloudVision as a pivotal product aimed at bridging the gap between highly automated, DevOps-style environments and more traditional, manual IT infrastructures. Arista recognizes a spectrum of network automation readiness in its customer base, ranging from DIY cloud giants with in-house tooling, to DevOps-driven organizations using scripts and tools like Ansible and Puppet, down to mainstream enterprises that still rely heavily on manual CLI configurations. CloudVision is designed to meet the needs of this last and largest group by offering turnkey automation solutions that simplify complex infrastructure management tasks.
CloudVision extends Arista’s EOS and leverages its SysDB database architecture, which provides real-time and historical state visibility across the network. By centralizing this data in a virtualized instance of EOS, CloudVision offers a holistic, network-wide platform that reduces the complexity involved in managing individual switches. This centralization also provides a consistent northbound interface for integration with third-party orchestration platforms and controllers. For example, rather than establishing individual OVSDB connections to each switch, a single connection via CloudVision can abstract and manage the entire infrastructure. This decouples the orchestration layer from hardware-specific configurations and improves performance and scalability. Raymond cited a 10x improvement in MAC move detection speed as a result of this abstraction.
Moreover, CloudVision enhances operational efficiency by supporting automated network provisioning, simplified software upgrades, and robust change management processes. By capturing snapshots of the network at various points in time, the system allows operators to roll back to a known good state quickly, a feature particularly useful during maintenance. Change windows are streamlined with pre- and post-check comparison capabilities, reducing downtime and human error. CloudVision’s integration with external systems like ServiceNow and security platforms also enables proactive operations, such as notifying users of known bugs or vulnerabilities. Overall, Arista positions CloudVision not just as a controller or visibility tool, but as a comprehensive platform for evolving traditional network environments into agile, automated, and highly manageable infrastructures.
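The pre- and post-check comparison amounts to diffing two snapshots of network state. A minimal sketch, assuming a snapshot can be flattened into a dictionary, is shown below; the keys and values are invented for illustration and do not reflect CloudVision's actual snapshot format.

```python
def compare_snapshots(pre, post):
    """Return {key: (old, new)} for every key whose value differs between snapshots."""
    changes = {}
    for key in sorted(set(pre) | set(post)):
        old, new = pre.get(key), post.get(key)
        if old != new:
            changes[key] = (old, new)
    return changes


# Hypothetical state captured before and after a change window.
pre_check = {"Ethernet1": "up", "Ethernet2": "up", "bgp_peers_established": 4}
post_check = {"Ethernet1": "up", "Ethernet2": "down", "bgp_peers_established": 3}

changes = compare_snapshots(pre_check, post_check)
for key, (old, new) in changes.items():
    print(f"{key}: {old} -> {new}")
```

An empty result means the network returned to its pre-change state; anything else is a candidate reason to roll back.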
Personnel: Jeff Raymond
Arista Networks CloudVision Demo
Andre Pech, Director of Software Engineering, demonstrates Arista Networks CloudVision automation and orchestration platform.
In this presentation, Andre Pech provides an in-depth demonstration of Arista’s CloudVision portal, showcasing its capabilities in network-wide automation and provisioning. CloudVision standardizes device configurations through a centralized network database, which defines the desired state of the entire network rather than relying on individual device-specific configs. This centralization allows for simplified change management and compliance through the use of logical containers and configlets, which allow configuration inheritance down a strict hierarchy. Pech emphasizes the system’s ability to enforce standard configurations at different logical levels (e.g., tenant, data center, availability zone) without duplicating effort or treating devices as unique snowflakes.
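Configuration inheritance down a container hierarchy can be sketched as a tree walk. This is an illustrative model only: the `Container` class, the container names, and the configlet contents are hypothetical, and the sketch assumes a device's effective configuration is simply the configlets of its ancestors followed by its own.

```python
class Container:
    """Hypothetical logical container holding configlets and a link to its parent."""

    def __init__(self, name, configlets=(), parent=None):
        self.name = name
        self.configlets = list(configlets)
        self.parent = parent

    def effective_config(self):
        """Configlets inherited from the root down, followed by this container's own."""
        inherited = self.parent.effective_config() if self.parent else []
        return inherited + self.configlets


tenant = Container("tenant", ["ntp server 10.0.0.1"])
datacenter = Container("dc1", ["logging host 10.1.0.5"], parent=tenant)
leaf1 = Container("leaf1", ["hostname leaf1"], parent=datacenter)
leaf2 = Container("leaf2", ["hostname leaf2"], parent=datacenter)

print(leaf1.effective_config())
# ['ntp server 10.0.0.1', 'logging host 10.1.0.5', 'hostname leaf1']
```

Changing the NTP server once at the tenant level changes it for every device below, which is the "no snowflakes" property the demo emphasizes.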
CloudVision also enables zero-touch provisioning by managing new devices out-of-the-box as they register themselves with the portal. The system allows users to define network changes in a staged, controlled manner, including approval workflows via a task management system. By leveraging EOS APIs such as eAPI and config sessions, CloudVision can efficiently apply full configurations in a transactional, all-or-nothing manner. This avoids error-prone configuration diffs and provides consistency, rollback capabilities, and high reliability throughout the change process. Additionally, integration with external API-driven systems such as OpenStack, NSX, or ServiceNow extends CloudVision beyond network management into broader data center orchestration.
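The all-or-nothing behavior can be modeled in a few lines. The sketch below does not call eAPI or use EOS config sessions; `validate` and `apply_full_config` are hypothetical helpers that only show the idea of building a complete candidate configuration and swapping it in if, and only if, every line is accepted.

```python
class ConfigError(Exception):
    """Raised when a configuration line is rejected."""


def validate(line):
    # Hypothetical check standing in for the device's own command parser.
    if line.startswith("bogus"):
        raise ConfigError(f"invalid command: {line}")


def apply_full_config(running, desired):
    """Return (config, committed). The running config is replaced only if all lines validate."""
    candidate = []
    try:
        for line in desired:
            validate(line)
            candidate.append(line)
    except ConfigError:
        return running, False  # abort: running config is left untouched
    return candidate, True     # commit: the full candidate replaces the running config


running = ["hostname leaf1", "ntp server 10.0.0.1"]
good = ["hostname leaf1", "ntp server 10.0.0.2"]
bad = ["hostname leaf1", "bogus command"]

print(apply_full_config(running, good))  # (['hostname leaf1', 'ntp server 10.0.0.2'], True)
print(apply_full_config(running, bad))   # (['hostname leaf1', 'ntp server 10.0.0.1'], False)
```

Because the whole desired configuration is applied, there is no need to compute which individual lines to add or remove.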
A key benefit of CloudVision is that it enhances management through automation without removing traditional manual control. Network engineers can still interact with individual devices via the CLI for urgent fixes or deep debugging, and any local rogue changes are detected through compliance checks and reconciled through the portal. The platform is built on a highly available clustered architecture, ensuring operational continuity even during controller downtime, with no impact on the functionality of the underlying network. As highlighted in the presentation, the CloudVision model focuses on managing the provisioning and orchestration of the network, rather than direct data plane control, making it highly practical and non-disruptive for real-world deployment.
Personnel: Andre Pech
Arista Networks 7500 Series Architecture
Hugh Holbrook, VP of Software Engineering, gives an overview of the Arista Networks 7500E series spine switch with a focus on the system buffering architecture. In his presentation at Networking Field Day 10, Holbrook explains how the Arista 7500E’s virtual output queue (VOQ) architecture is designed to minimize network congestion and improve application performance in data center environments. He highlights the importance of buffers in managing traffic efficiently by comparing switches with large and small buffers. The 7500E uses a VOQ approach to distribute and balance traffic over the switch fabric by breaking packets into small cells that are evenly spread across multiple fabric chips and reassembled at the egress. This results in even utilization of the internal fabric and helps avoid the bottlenecks and packet drops that are common in less advanced switch architectures.
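The cell-spraying idea can be shown with a short sketch. The cell size, the number of fabric chips, and the round-robin placement below are assumptions chosen for illustration; the presentation summary does not give the 7500E's actual values or scheduling policy.

```python
def spray(packet: bytes, cell_size: int, num_fabric_chips: int):
    """Split a packet into sequence-numbered cells and spread them round-robin over fabric chips."""
    lanes = [[] for _ in range(num_fabric_chips)]
    for seq, offset in enumerate(range(0, len(packet), cell_size)):
        cell = packet[offset:offset + cell_size]
        lanes[seq % num_fabric_chips].append((seq, cell))
    return lanes


def reassemble(lanes):
    """Egress side: merge cells from all fabric chips back into sequence order."""
    cells = sorted(cell for lane in lanes for cell in lane)
    return b"".join(payload for _, payload in cells)


packet = bytes(range(200))  # a 200-byte packet
lanes = spray(packet, cell_size=16, num_fabric_chips=4)

print([len(lane) for lane in lanes])  # [4, 3, 3, 3]
print(reassemble(lanes) == packet)    # True
```

Because every packet is spread over all fabric chips, no single chip becomes a hot spot regardless of which flows happen to be large.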
Holbrook further explains the technical merits and the practical necessity of large ingress buffers in such switches. Each ingress chip is equipped with several gigabytes of external memory, allowing for half a million queues or more in a fully loaded system. This fine-grained buffering supports every output port and traffic class combination, which is essential in preventing head-of-line blocking. Through a series of graphs and simulations, he demonstrates how insufficient buffering leads to unequal bandwidth distribution among flows and significantly higher query completion times, particularly under high traffic loads typical of modern data center applications. The use of large buffers helps mitigate “bandwidth capture,” a state where a few flows monopolize network resources, resulting in unfair throughput distribution among competing streams.
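Head-of-line blocking, and how per-output queues avoid it, can be demonstrated with a toy simulation. The model below is an assumption-laden sketch rather than a model of the 7500E: one ingress sends at most one packet per tick, output `A` is congested and accepts a packet only every `a_period` ticks, and output `B` is always ready.

```python
from collections import deque


def run_fifo(packets, ticks, a_period):
    """Single FIFO: only the head packet may be sent, so a blocked head stalls everything."""
    queue = deque(packets)
    delivered = {"A": 0, "B": 0}
    for tick in range(ticks):
        if not queue:
            break
        head = queue[0]
        if head == "B" or tick % a_period == 0:
            delivered[queue.popleft()] += 1
    return delivered


def run_voq(packets, ticks, a_period):
    """One queue per output: traffic for B is never stuck behind traffic for A."""
    queues = {"A": deque(), "B": deque()}
    for packet in packets:
        queues[packet].append(packet)
    delivered = {"A": 0, "B": 0}
    for tick in range(ticks):
        if queues["A"] and tick % a_period == 0:
            queues["A"].popleft()
            delivered["A"] += 1
        elif queues["B"]:
            queues["B"].popleft()
            delivered["B"] += 1
    return delivered


packets = ["A", "B"] * 4  # alternating destinations
print(run_fifo(packets, ticks=8, a_period=4))  # {'A': 2, 'B': 2}
print(run_voq(packets, ticks=8, a_period=4))   # {'A': 2, 'B': 4}
```

Both designs deliver the same amount to the congested output, but only the per-output queues let the uncongested output receive all of its traffic.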
The presentation concludes with real-world examples and simulation data that underscore the efficiency of the 7500E’s architecture. Notably, customers have reported up to 6x improvements in application response time after deploying Arista’s large-buffer switches. Holbrook addresses common concerns about bufferbloat, which is typically debated in the context of consumer-grade networks with slower links. He clarifies that in high-speed data center environments operating at 10, 40, or 100 Gbps, buffer sizes that seemed excessively large in the past are appropriately scaled in terms of latency. The Arista 7500E thus balances high throughput and low latency, making it a valuable asset in handling bursty traffic patterns and in supporting demanding enterprise and cloud applications.
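The point about buffer size scaling with link speed comes down to simple arithmetic: the worst-case queuing delay of a full buffer is its size divided by the drain rate. The sketch below uses a hypothetical 50 MB buffer purely as an example; it is not a 7500E specification.

```python
def drain_time_ms(buffer_bytes, link_gbps):
    """Time in milliseconds to drain a full buffer at the given link rate."""
    bits = buffer_bytes * 8
    return bits / (link_gbps * 1e9) * 1e3


BUFFER_BYTES = 50_000_000  # hypothetical 50 MB of buffering for one port

# 0.01 Gbps stands in for a slow consumer link; the others are data center speeds.
for gbps in (0.01, 10, 40, 100):
    print(f"{gbps:>6} Gbps: {drain_time_ms(BUFFER_BYTES, gbps):,.1f} ms")
```

Under these assumptions the same buffer that would add about 40 seconds of delay on a 10 Mbps link adds about 40 ms at 10 Gbps and about 4 ms at 100 Gbps.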
Personnel: Hugh Holbrook
Arista Networks Leaf SSU Demo
Kamal Bakshi, Technical Marketing Engineer, demonstrates Arista Networks Leaf SSU, covering the latest resilience features of EOS for highly-available spine/leaf designs.
In this presentation at Networking Field Day 10, Kamal Bakshi showcased Arista Networks’ Leaf Smart System Upgrade (SSU) capabilities by performing a live software upgrade on an Arista switch without disrupting ongoing traffic. The demonstration involved a video stream and continuous pings passing through a single-homed switch that was about to be reloaded. Despite rebooting the switch, including wiping and reloading its software image, the data plane remained fully operational, with no interruption to the video and no packet loss, demonstrating the effectiveness of the hitless upgrade process. This level of availability is achieved through Arista’s architecture, which separates the control and data planes and allows the switch hardware to continue forwarding traffic even when the control plane is temporarily down.
Bakshi delved into the technical underpinnings of Arista’s Extensible Operating System (EOS), highlighting that it runs on top of an unmodified Linux kernel with a Fedora-based user space. Within this environment, all network protocols and hardware drivers are implemented as user-space agents that interact with EOS’s SysDB, which holds the entire system state. During the hitless upgrade, the control processes were gracefully stopped, and the EOS image was fully replaced while the forwarding ASICs and TCAM continued to maintain network operations. Once the upgrade was complete, the agents restarted and repopulated SysDB by learning the switch state anew, enabling a seamless return of control functionality without affecting data traffic.
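The separation that makes this possible can be sketched with two small classes. This is a conceptual model only: `ForwardingTable` stands in for the ASIC and TCAM, `ControlPlane` stands in for the agents and SysDB, and the assumption that the new software relearns state by reading the hardware table is a simplification made for the example.

```python
class ForwardingTable:
    """Stands in for the forwarding hardware; it keeps working with no control plane."""

    def __init__(self):
        self.routes = {}

    def forward(self, destination):
        return self.routes.get(destination)


class ControlPlane:
    """Stands in for the software agents and their state database."""

    def __init__(self, version, hardware):
        self.version = version
        self.hardware = hardware
        self.state = {}

    def program(self, destination, port):
        self.state[destination] = port
        self.hardware.routes[destination] = port

    def relearn(self):
        # Simplifying assumption: rebuild software state from what hardware already holds.
        self.state = dict(self.hardware.routes)


hardware = ForwardingTable()
old_software = ControlPlane("old-image", hardware)
old_software.program("10.0.0.0/24", "Ethernet1")

old_software = None                      # control plane stopped for the upgrade
print(hardware.forward("10.0.0.0/24"))   # Ethernet1 (traffic still forwarded)

new_software = ControlPlane("new-image", hardware)
new_software.relearn()
print(new_software.state)                # {'10.0.0.0/24': 'Ethernet1'}
```

The forwarding table is never cleared during the swap, which is why the pings and video stream in the demo are unaffected.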
The demonstration emphasized Arista’s modular and resilient switch architecture, which enables highly available networks that can undergo significant maintenance, such as OS reboots, without traffic disruption. The hitless upgrade process includes pre-upgrade validation, image verification, and a compatibility check before proceeding, making it a safe and efficient mechanism. Bakshi also noted the flexibility to upgrade individual processes, such as STP, independently if needed. Through capabilities like NSF for BGP and intelligent hardware/software separation, Arista’s switches support both Layer 2 and Layer 3 resiliency. This makes them well suited to the high-availability spine/leaf designs commonly used in data centers and large-scale network environments.
Personnel: Kamal Bakshi