Brendan Gibbs and other leaders from Arista outlined the company’s comprehensive strategy for AI infrastructure. The presentation highlighted Arista’s Etherlink portfolio, which is optimized for 800G connectivity and anchored by the EOS operating system. Gibbs emphasized that a well-optimized network is no longer just plumbing but a critical component that can improve AI job completion times by 44% and maximize the utilization of expensive GPUs. The core value proposition centers on providing open, Ethernet-based standards that offer customers flexibility and choice without the proprietary lock-in often found in competing solutions.
The technical framework of the discussion focused on the Four AI Fabrics that Arista supports: front-end, scale-out, scale-across, and scale-up. Front-end fabrics handle traditional data center workloads and inferencing, while scale-out focuses on back-end training for clusters reaching up to 100,000 GPUs. For larger deployments that exceed the physical limits of a single building, Arista’s scale-across technology utilizes deep buffering, encryption, and routing intelligence to link geographically dispersed data centers. Finally, the scale-up fabric represents the new frontier of Ethernet-based interconnects designed for memory coherency across XPU clusters.
Arista differentiates itself through a broad hardware portfolio and a commitment to industry-wide innovation. Gibbs detailed the concept of hierarchical hybrid buffering, which combines on-chip shallow buffers for low latency with on-package deep buffers to prevent packet loss during congestion. Beyond hardware, Arista remains a leader in defining open standards, having contributed significantly to the OSFP MSA, the Ultra Ethernet Consortium, and the OCP. By offering modular and fixed platforms that support various architectural designs, Arista aims to provide high-scale, low-power networking that remains fully interoperable and vendor-neutral.
Personnel: Brendan Gibbs
We go deep into the fabric of the AI cluster. We’ll discuss why Ethernet has become the definitive backplane for AI workloads. We’ll explore hardware innovations in power efficiency and the protocol optimizations, such as Dynamic Load Balancing (DLB) and advanced congestion control, that keep data moving at the speed of thought. This section covers the different network types used for AI, from scale-up to scale-out and scale-across, and discusses optimizations and enhancements to Ethernet standards, such as UEC and ESUN, for AI applications.
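To make the Dynamic Load Balancing idea concrete, here is a toy flowlet-based sketch in Python. It is not Arista’s implementation: the per-uplink load model, the flowlet gap value, and the class name are all illustrative assumptions. The core idea it shows is real, though: packets within a flowlet stay on one path to preserve ordering, while a sufficiently long idle gap lets the next flowlet move to the least-loaded uplink.

```python
class DynamicLoadBalancer:
    """Toy flowlet-based load balancer. A new flowlet (a packet burst
    separated from the previous one by an idle gap) may be steered to the
    least-loaded uplink; packets within a flowlet stick to one path so
    they are not reordered in flight."""

    def __init__(self, num_uplinks, flowlet_gap=0.0005):
        self.load = [0] * num_uplinks   # bytes sent per uplink (simplified)
        self.flowlet_gap = flowlet_gap  # idle seconds that end a flowlet
        self.flow_state = {}            # flow_id -> (uplink, last_seen)

    def pick_uplink(self, flow_id, pkt_bytes, now):
        state = self.flow_state.get(flow_id)
        if state is None or now - state[1] > self.flowlet_gap:
            # Idle gap long enough: safe to re-route without reordering.
            uplink = min(range(len(self.load)), key=lambda i: self.load[i])
        else:
            uplink = state[0]           # stay on the current path
        self.flow_state[flow_id] = (uplink, now)
        self.load[uplink] += pkt_bytes
        return uplink
```

In a real switch ASIC this decision happens per packet in hardware against live queue depths, but the flowlet-gap trade-off (utilization vs. ordering) is the same one sketched here.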
Tom Emmons emphasizes that as AI networks become business-critical, quality and power efficiency are the primary drivers of architectural decisions. Every problem in an AI network escalates immediately because of the massive financial investments involved, making a reliable network essential. Since power is the fundamental limiting factor for GPU density in a data center, Arista focuses on reducing the network power footprint, ideally to less than 10% of total facility power, through liquid cooling, low-power optics, and high-radix switches that minimize the number of tiers. By reducing tiers, operators save on optics, which are the largest contributors to network power consumption, while also simplifying load balancing and reducing potential congestion points.
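The tier-reduction argument above can be made tangible with some back-of-envelope math. The sketch below assumes an idealized nonblocking folded-Clos fabric where half of each switch’s ports face up and half face down; real designs add oversubscription, rail optimization, and other constraints, so treat the numbers as illustrative only.

```python
def min_tiers(num_gpus, radix):
    """Smallest number of tiers in an idealized nonblocking folded-Clos
    fabric built from radix-R switches: one tier serves R endpoints, and
    each additional tier multiplies capacity by R/2 (half the ports face
    up toward the next tier, half face down)."""
    tiers = 1
    capacity = radix
    while capacity < num_gpus:
        tiers += 1
        capacity = radix * (radix // 2) ** (tiers - 1)
    return tiers

# For the ~100,000-GPU scale cited in the talk, doubling switch radix
# from 64 to 128 ports removes an entire tier -- and with it a layer of
# optics, the largest contributor to network power.
```

Fewer tiers means fewer inter-switch links, and every eliminated link removes two optical modules, which is exactly the optics-power saving Emmons describes.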
The presentation identifies four distinct AI fabrics: front-end, scale-out, scale-across, and scale-up. While scale-out provides the essential east-west connectivity for GPU training, scale-across is becoming increasingly vital for customers who must link geographically dispersed buildings to overcome local power and space constraints. Scale-across networking leverages Arista’s extensive experience in WAN and routing, utilizing deep buffers, encryption, and traffic engineering to manage latency and protect data. Meanwhile, the front-end network mirrors traditional data center designs but demands higher reliability and security, since it connects billions of dollars’ worth of hardware to the outside world and to local storage resources.
Arista is a vocal advocate for Ethernet as the universal backplane, specifically for the emerging scale-up market where GPU-to-GPU memory copies occur. Through leadership in consortiums like the Ultra Ethernet Consortium (UEC) and the Ethernet for Scale-Up Networking (ESUN) workstream, Arista is refining Ethernet to handle 256-byte cache line transactions and packet spraying more efficiently. Emmons posits that the dominance of Ethernet is driven by the industry’s desire for multi-vendor ecosystems and a unified management model. By running a single EOS image across all four fabric types, Arista provides a mature, tested software stack that allows operators to use the same BGP stack and telemetry tools regardless of whether they are managing a local scale-up cluster or a global scale-across network.
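Packet spraying, one of the Ethernet refinements mentioned above, can be sketched in a few lines. This is a simplified model, not the UEC wire protocol: successive packets of a single flow are distributed across all equal-cost paths rather than hashing the whole flow onto one path, and a sequence number lets the receiver (or a UEC-style transport) restore order.

```python
from itertools import cycle

def spray(packets, num_paths):
    """Distribute successive packets of one flow round-robin across all
    equal-cost paths, tagging each with a sequence number. This trades
    per-flow in-order delivery for near-perfect link utilization."""
    paths = cycle(range(num_paths))
    return [(next(paths), seq, payload) for seq, payload in enumerate(packets)]

def reassemble(received):
    """Restore the original order by sequence number, regardless of which
    path delivered each packet first."""
    return [payload for _, seq, payload in sorted(received, key=lambda r: r[1])]
```

The round-robin policy here is deliberately naive; real implementations weight path choice by congestion feedback, but the spray-then-reorder contract is the same.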
Personnel: Tom Emmons
Optics are critical for network switches, especially at 800 Gbps and in the evolution to 1.6T, from the perspectives of cost, density, power efficiency, and performance. Arista has been driving major innovations in optics to power networking for AI applications. In this section, we cover the optics landscape and highlight the evolution to high-density 1.6T optics for AI networking.
Arista identifies power, cost, and reliability as the three pillars of optical innovation. While the OSFP form factor has been incredibly successful, projected to reach 100 million modules in 2026, it is reaching its thermal and density limits at approximately 35 watts. To meet the demands of next-generation AI networks, Arista introduced XPO, or extra dense pluggable optics. This new form factor is designed for 1.6T and beyond, offering a 4x improvement in system-level density compared to OSFP. By utilizing a two-tier card design and liquid cooling, XPO can handle thermal loads of up to 400 watts. This shift to liquid cooling not only manages heat but also significantly improves reliability by keeping optical components at lower, more stable temperatures, effectively reducing failure rates by a factor of five to eight.
Beyond density and cooling, the evolution of optics includes a push for power-efficient architectures like Linear Pluggable Optics (LPO), which can reduce power consumption by 60% by eliminating internal retiming. Arista is also exploring Co-Packaged Optics (CPO) as a complementary solution, though XPO currently offers broader versatility across the various reaches required for scale-up, scale-out, and scale-across fabrics. While CPO is compelling for high-density, short-reach DR optics, it faces challenges regarding universal reach and a maturing supply chain. XPO, conversely, supports the full spectrum of connectivity from short-reach copper and fiber to long-distance coherent optics like ZR and ZR+, which are essential for the IP-over-DWDM architectures used in scale-across regional networks.
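The 60% LPO power figure is easiest to appreciate at fabric scale. The sketch below is pure back-of-envelope arithmetic: the 16 W per-module figure and the function name are illustrative assumptions, while the 60% savings comes from the talk.

```python
def optics_power_kw(num_ports, watts_per_module, lpo_fraction=0.0, lpo_savings=0.60):
    """Back-of-envelope optics power for a fabric. Assumes (per the talk)
    that Linear Pluggable Optics draw ~60% less than fully retimed
    modules because they eliminate the internal DSP retimer."""
    lpo_ports = int(num_ports * lpo_fraction)
    retimed_w = (num_ports - lpo_ports) * watts_per_module
    lpo_w = lpo_ports * watts_per_module * (1 - lpo_savings)
    return (retimed_w + lpo_w) / 1000.0

# Example: 10,000 optical ports at a hypothetical 16 W per 800G module.
# All retimed: 160 kW. All LPO: 64 kW -- roughly 100 kW returned to the
# facility power budget, i.e. available for GPU compute instead.
```

At data-center scale, where Emmons’ target is keeping the network under 10% of facility power, savings of this magnitude are what make LPO attractive despite its tighter link-budget constraints.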
Arista remains committed to an open ecosystem, having established XPO as an open multi-source agreement (MSA) with over 100 partners, including major module vendors and system competitors like Cisco and Nokia. This open approach ensures a robust supply chain and allows customers to avoid proprietary lock-in while transitioning to liquid-cooled, high-density environments. The transition to XPO also enables data center operators to shrink the footprint of their networking racks, freeing up more physical space and power for revenue-generating GPU compute. As speeds advance toward 1.6T and 3.2T, the combination of denser form factors, advanced liquid cooling, and open standards will be the foundation for sustaining the growth of generative AI infrastructure.
Personnel: Vijay Vusirikala
Monitoring and managing complex AI infrastructure requires moving beyond traditional networking tools that treat the environment as a black box. Praful Bhaidasna explains that the industry has long suffered from a “mean time to truth” problem, where network operators are blamed for issues they cannot properly diagnose because they lack visibility into what is connected to the network. Arista aims to change this Stone Age approach by evolving from simple monitoring to 360-degree observability. This strategy is centered on CloudVision, a NetOps platform that utilizes a common network data lake called NetDL to aggregate high-fidelity streaming telemetry from every Arista device across the data center, campus, and WAN.
The architecture relies on the fact that Arista’s EOS provides consistent, reliable state data, ranging from MAC address tables and routing updates to microburst signals and configuration changes. This information is stored in a time-series database, allowing operators to travel back in time to compare network states before and after an incident. To manage the resulting deluge of data, Arista employs an AI/ML engine known as AVA, or Autonomous Virtual Assist. AVA identifies patterns and anomalies, filtering out the noise to show only the relevant signals. This allows human operators to focus on making informed decisions rather than spending hours manually correlating events across different silos.
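The “travel back in time” workflow boils down to diffing two snapshots of device state pulled from a time-series store. The sketch below is a minimal stand-in, not the NetDL query API: state is modeled as a flat dict of key/value pairs, which is enough to show the added/removed/changed comparison an operator would run around an incident.

```python
def diff_state(before, after):
    """Compare two device-state snapshots (dicts of key -> value), e.g.
    MAC table entries or routes captured before and after an incident,
    and report what was added, removed, or changed in between."""
    added = {k: after[k] for k in after.keys() - before.keys()}
    removed = {k: before[k] for k in before.keys() - after.keys()}
    changed = {k: (before[k], after[k])
               for k in before.keys() & after.keys() if before[k] != after[k]}
    return {"added": added, "removed": removed, "changed": changed}
```

In practice AVA performs this kind of comparison continuously across thousands of devices and surfaces only the anomalous deltas, which is what turns the raw telemetry deluge into a usable signal.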
Furthermore, CloudVision has opened its ecosystem to ingest data from third-party systems, AI job orchestrators, and compute and storage metrics via Prometheus. This integration is critical for AI environments where a job stall could be caused by anything from a GPU failure to a NIC issue. Arista has introduced a dedicated AI jobs dashboard that correlates specific training jobs with the underlying flows, servers, and switches. To simplify interactions with this massive dataset, a digital virtual assistant allows users to query their infrastructure using natural language. This integrated approach ensures that expensive GPU resources do not sit idle and that the resolution of complex performance bottlenecks can happen in minutes rather than days.
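The job-to-infrastructure correlation described above is, at its core, a join across data sources. The toy sketch below illustrates the shape of that join; the record fields, names, and two-table layout are assumptions for illustration, not the dashboard’s actual data model.

```python
def flows_for_job(job_id, job_flows, flow_switches):
    """Toy correlation in the spirit of an AI-jobs dashboard: map a
    training job to its network flows, then to the switches those flows
    traverse, so a stall can be localized to a device or link."""
    flows = [f for f in job_flows if f["job"] == job_id]
    switches = sorted({sw for f in flows
                       for sw in flow_switches.get(f["flow"], [])})
    return {"flows": [f["flow"] for f in flows], "switches": switches}
```

Once the job’s flows are pinned to specific switches, the same telemetry that tracks microbursts and drops can say whether the stall originated in the network or in a GPU or NIC, which is the minutes-versus-days difference the presentation highlights.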
Personnel: Praful Bhaidasna