
Selector 2026 Roadmap and Q&A

Event: Networking Field Day 40

Appearance: Selector AI Presents at Networking Field Day 40

Company: Selector AI


Personnel: Reza Koohrangpour, Varija Sriram

Selector AI introduces an AI-powered network observability platform designed to unify multi-domain signals into actionable root cause analysis (RCA). The platform focuses on reducing the Mean Time to Repair (MTTR) by addressing the alert storm and fragmented data silos that plague modern network operations centers (NOCs). By employing a three-layered approach of collection, correlation, and collaboration, Selector AI transforms thousands of disconnected telemetry signals into a single, cohesive incident report.

The platform distinguishes itself through a horizontal data lake architecture that utilizes an Extract, Load, Transform (ELT) model, preserving critical context and timestamps across various domains such as cloud, SD-WAN, and infrastructure. During the demonstration, Varija Sriram illustrated a typical day in the life of a NOC operator using Selector’s ChatOps and Agentic Copilot features. When a financial application in AWS became unreachable from Tokyo, the platform correlated synthetic probes, SNMP data, and optical link degradation into a single Slack alert. This allowed the operator to visualize the specific failing hop (a cloud gateway router) and understand the business impact without needing to manually pivot between multiple disparate monitoring tools.
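The multi-domain correlation described here can be pictured as a simple time-window join: signals from different sources that land close together in time are grouped into one candidate incident. This is an illustrative toy, not Selector’s actual pipeline; the event sources and the 120-second window are invented:

```python
from dataclasses import dataclass

@dataclass
class Event:
    source: str      # e.g. "synthetic-probe", "snmp", "optical"
    ts: float        # epoch seconds
    detail: str

def correlate(events, window=120.0):
    """Group events whose timestamps fall within `window` seconds
    of the previous event into candidate incidents (toy join)."""
    incidents = []
    for ev in sorted(events, key=lambda e: e.ts):
        if incidents and ev.ts - incidents[-1][-1].ts <= window:
            incidents[-1].append(ev)
        else:
            incidents.append([ev])
    return incidents

events = [
    Event("synthetic-probe", 1000.0, "Tokyo -> AWS app unreachable"),
    Event("optical", 1030.0, "link degradation on cloud gateway uplink"),
    Event("snmp", 1065.0, "interface error counters rising"),
    Event("unrelated", 9000.0, "routine config audit"),
]

incidents = correlate(events)
# The three related signals collapse into one incident; the audit stands alone.
```

A real system would join on topology and metadata as well as time, but the payoff is the same: one alert instead of three.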

Selector AI’s technical core relies on a Kubernetes-based microservices architecture and a sophisticated AI/ML stack that distinguishes between simple correlation and true causation. The system uses self-supervised and unsupervised learning to establish baselines and detect anomalies across more than 300 telemetry sources, including active and passive synthetic probes. A standout feature is the integration of a Large Language Model (LLM) via a Copilot, which allows operators to perform root cause analysis and receive recommended remediation steps using plain English queries. The roadmap includes expanding visibility into AI workloads and GPUs, while the current platform offers bi-directional integration with ITSM tools like ServiceNow and Jira to ensure that all insights and manual notes are synchronized across the organization’s existing workflows.
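As a rough illustration of baseline-and-anomaly detection of this kind, a rolling z-score flags telemetry points that deviate sharply from recent history. This is a generic sketch, not Selector’s model; the series, window, and threshold are invented:

```python
import statistics

def detect_anomalies(series, window=10, threshold=3.0):
    """Flag points more than `threshold` standard deviations from the
    rolling mean of the preceding `window` samples (toy baseline model)."""
    anomalies = []
    for i in range(window, len(series)):
        baseline = series[i - window:i]
        mu = statistics.mean(baseline)
        sigma = statistics.pstdev(baseline) or 1e-9  # avoid divide-by-zero
        if abs(series[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies

# Steady latency with one spike at index 15.
latency_ms = [10, 11, 10, 12, 11, 10, 11, 12, 10, 11,
              10, 11, 12, 10, 11, 95, 11, 10]
spikes = detect_anomalies(latency_ms)  # flags only the outlier
```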


Selector Platform Diagnosis Demo

Event: Networking Field Day 40

Appearance: Selector AI Presents at Networking Field Day 40

Company: Selector AI


Personnel: Varija Sriram

Selector AI introduces an AI-powered network observability platform designed to unify multi-domain signals into actionable root cause analysis (RCA). The platform focuses on reducing the Mean Time to Repair (MTTR) by addressing the alert storm and fragmented data silos that plague modern network operations centers (NOCs). By employing a three-layered approach of collection, correlation, and collaboration, Selector AI transforms thousands of disconnected telemetry signals into a single, cohesive incident report.

The platform distinguishes itself through a horizontal data lake architecture that utilizes an Extract, Load, Transform (ELT) model, preserving critical context and timestamps across various domains such as cloud, SD-WAN, and infrastructure. During the demonstration, Varija Sriram illustrated a typical day in the life of a NOC operator using Selector’s ChatOps and Agentic Copilot features. When a financial application in AWS became unreachable from Tokyo, the platform correlated synthetic probes, SNMP data, and optical link degradation into a single Slack alert. This allowed the operator to visualize the specific failing hop (a cloud gateway router) and understand the business impact without needing to manually pivot between twelve or more disparate monitoring tools.

Technically, Selector AI leverages a Kubernetes-based microservices stack and a sophisticated causation model that separates simple correlation from true underlying causes. The integration of Gemini-powered Large Language Models (LLMs) allows users to query the system in plain English to receive summaries and recommended action plans, such as contacting specific service providers or triggering automated remediation workflows like port resets. The platform also offers bi-directional integration with ITSM tools like ServiceNow and Jira, ensuring that all AI-driven insights and manual operator notes are synchronized across the organization’s existing workflow management systems.
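Bi-directional ITSM synchronization of this sort can be pictured as merging note streams so that both systems converge on the same incident record. The `TicketStore` class below is a hypothetical stand-in, not the ServiceNow or Jira REST API:

```python
class TicketStore:
    """Toy stand-in for an ITSM system (e.g. ServiceNow or Jira);
    a real integration would use each tool's REST API."""
    def __init__(self):
        self.notes = {}   # ticket_id -> list of notes

    def add_note(self, ticket_id, note):
        self.notes.setdefault(ticket_id, []).append(note)

def sync(a, b, ticket_id):
    """Bi-directional sync: both sides end up with the union of notes
    (illustrative only; real syncs also track authorship and timestamps)."""
    merged = a.notes.get(ticket_id, []) + [
        n for n in b.notes.get(ticket_id, [])
        if n not in a.notes.get(ticket_id, [])
    ]
    a.notes[ticket_id] = list(merged)
    b.notes[ticket_id] = list(merged)

itsm, observability = TicketStore(), TicketStore()
observability.add_note("INC-1", "RCA: optical degradation on gateway uplink")
itsm.add_note("INC-1", "Operator note: provider ticket opened")
sync(observability, itsm, "INC-1")
# Both systems now carry the AI-driven insight and the manual note.
```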


One View, Total Clarity with Selector

Event: Networking Field Day 40

Appearance: Selector AI Presents at Networking Field Day 40

Company: Selector AI


Personnel: Reza Koohrangpour, Varija Sriram

Selector AI is an AI-powered network observability platform that unifies signals across multi-domain environments to provide intelligent outcomes and root cause analysis (RCA). The platform helps networking and infrastructure operation teams lower their Mean Time to Repair (MTTR) and improve operational efficiency by addressing the challenges of fragmented data and tool sprawl. Through a three-layered approach of collection, correlation, and collaboration, Selector AI transforms raw telemetry into actionable insights, delivered through common communication tools like Slack and Microsoft Teams.

The presentation emphasizes that modern enterprises are overwhelmed by data silos and dashboard sprawl, often utilizing 12 to 15 different observability tools that produce thousands of disconnected alerts. Reza Koohrangpour explains that Selector AI solves this by employing a horizontal data lake architecture that is vendor and domain-agnostic. Unlike traditional systems that use Extract, Transform, Load (ETL), Selector uses an Extract, Load, Transform (ELT) model. This approach preserves vital timestamps and source context, which is critical for correlating events across different domains, such as networking, cloud, and applications, to ensure that engineers see a unified timeline of an incident rather than fragmented pieces of a puzzle.
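The ELT-versus-ETL distinction can be made concrete with a toy example: ETL normalizes at ingest and tends to discard context, while ELT loads raw records verbatim and transforms at query time. The field names and records here are invented for illustration:

```python
raw_events = [
    {"src": "sdwan", "ts": "2025-11-05T09:00:02Z", "payload": {"loss_pct": 4.1}},
    {"src": "cloud", "ts": "2025-11-05T09:00:05Z", "payload": {"gw": "cgw-1", "state": "down"}},
]

# ETL (traditional): transform before loading -- source and original
# timestamp are often normalized away at ingest time.
etl_rows = [{"metric": "loss", "value": 4.1}]  # ts/src lost

# ELT: load the raw record as-is, transform later per query.
data_lake = list(raw_events)  # everything is preserved

def events_between(lake, start, end):
    """Late-binding transform: cross-domain correlation still works
    because the original timestamps are there to join on."""
    return [e for e in lake if start <= e["ts"] <= end]

window = events_between(data_lake, "2025-11-05T09:00:00Z",
                        "2025-11-05T09:01:00Z")
```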

The platform’s technical core relies on a Kubernetes-based microservices architecture and a sophisticated AI/ML stack that distinguishes between simple correlation and true causation. The system uses self-supervised and unsupervised learning to establish baselines and detect anomalies across more than 300 telemetry sources. A standout feature is the integration of a Large Language Model (LLM) via Copilot, which allows operators to perform root cause analysis using plain English queries. Varija Sriram highlights that successful deployment relies on a Customer Success model, where Selector AI collaborates with clients to map metadata and business logic, ensuring the AI reduces noise and provides explainable results rather than black box answers.


Netris Day Zero Demo with Alex Saroyan

Event: Networking Field Day 40

Appearance: Netris Presents at Networking Field Day 40

Company: Netris


Personnel: Alex Saroyan

This presentation provides a technical demonstration of the Netris controller’s capabilities in automating and simulating large-scale AI networking environments. The demonstration highlights the platform’s Day Zero functionality, where network engineers can model complex topologies and validate designs before physical hardware even arrives. By utilizing Terraform and the Netris CloudSim, Saroyan illustrates how a controller can generate a digital twin of a 64-GPU cluster, defining inventory, topology, and IP address management (IPAM) for both front-end and back-end fabrics. This proactive approach allows teams to export cable maps and labels in advance, ensuring that the physical rollout is guided by a pre-validated, consistent architectural model.
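A Day Zero modeling pass of this kind amounts to generating inventory, a cable map, and IPAM from a declared cluster size. The sketch below assumes a rail-optimized topology (GPU k on every node homes to leaf k) and /31 point-to-point addressing purely for illustration; it is not Netris’s actual data model:

```python
import ipaddress

GPUS_PER_NODE = 8
NODES = 8            # 64 GPUs total
# Assumption: rail-optimized back-end fabric, one leaf per GPU rail.

def build_twin():
    """Generate a toy inventory, cable map, and IPAM plan for a
    64-GPU back-end fabric before any hardware arrives."""
    p2p = ipaddress.ip_network("10.64.0.0/16").subnets(new_prefix=31)
    cables, ipam = [], {}
    for node in range(NODES):
        for gpu in range(GPUS_PER_NODE):
            leaf = gpu                      # rail-optimized placement
            link = next(p2p)                # one /31 per cable
            a, b = link[0], link[1]
            cables.append((f"node{node}-gpu{gpu}", f"leaf{leaf}-port{node}"))
            ipam[f"node{node}-gpu{gpu}"] = str(a)
            ipam[f"leaf{leaf}-port{node}"] = str(b)
    return cables, ipam

cables, ipam = build_twin()
# 64 GPU-to-leaf cables, each end with a /31 point-to-point address --
# enough to print cable maps and labels before the racks show up.
```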

The simulation phase reveals a distributed architecture where Netris agents are deployed to each virtual switch to handle configuration generation and real-time monitoring. Saroyan emphasizes that the central controller does not store or push static configuration files. Instead, the agents autonomously generate and apply the necessary code based on the modeled intent. This object-oriented approach eliminates the need for traditional configuration backups, as a replaced switch can simply re-generate its own configuration upon joining the network. The demo shows the system identifying wiring inconsistencies and automatically configuring BGP sessions and LACP bonding, effectively transforming raw hardware into a functional multi-tenant fabric with high visibility into the health and connectivity of every link.
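The claim that agents regenerate configuration from intent, rather than restoring backups, can be sketched as a pure function from modeled intent to config: the same intent always yields the same config, so a replacement switch simply re-derives its own. The intent schema and config syntax below are invented:

```python
def generate_config(intent):
    """An agent derives its switch config from modeled intent; a
    replacement switch re-runs this instead of restoring a backup
    (illustrative sketch, invented syntax)."""
    lines = [f"hostname {intent['name']}"]
    for peer in intent["bgp_peers"]:
        lines.append(f"router bgp {intent['asn']} "
                     f"neighbor {peer['ip']} remote-as {peer['asn']}")
    for bond in intent.get("lacp_bonds", []):
        lines.append(f"bond {bond['name']} "
                     f"members {','.join(bond['members'])} mode lacp")
    return "\n".join(lines)

intent = {
    "name": "leaf1", "asn": 65001,
    "bgp_peers": [{"ip": "10.0.0.0", "asn": 65100},
                  {"ip": "10.0.0.2", "asn": 65100}],
    "lacp_bonds": [{"name": "bond0", "members": ["swp1", "swp2"]}],
}

cfg = generate_config(intent)
# Determinism is the point: re-running the agent on the same intent
# is equivalent to "restoring" the switch.
assert generate_config(intent) == cfg
```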

In the final portion of the demonstration, Saroyan showcases the platform’s multi-tenant consumption model and its advanced integration with NVIDIA BlueField DPUs. Using a server cluster template, he demonstrates how a consumer can provision isolated bare-metal environments (VPCs) with overlapping IP addresses in minutes, while the network engineer maintains control over the underlying L2/L3 VPN constructs. The demo concludes with a physical lab test involving hardware-accelerated host-based networking. By extending the EVPN-BGP control plane directly to the DPU, Netris enables seamless, wire-speed communication between bare-metal servers and virtual machines. This comprehensive automation workflow, extending from initial modeling to elastic IP provisioning and hardware-accelerated virtualization, illustrates Netris’s ability to simplify the entire lifecycle of high-performance AI infrastructure.


Consuming AI Networks with Netris

Event: Networking Field Day 40

Appearance: Netris Presents at Networking Field Day 40

Company: Netris


Personnel: Alex Saroyan

This presentation focuses on the consumption model of AI networks, specifically helping network engineers enable self-service capabilities for AI factory and neocloud operators. Alex Saroyan argues that while network engineers manage complex physical infrastructures, the consumers, such as compute orchestration products, require a simplified cloud-like abstraction. Netris achieves this through its Network Automation, Abstraction, and Multi-tenancy (NAAM) model, which introduces familiar cloud constructs like Virtual Private Clouds (VPCs) and VNets (subnets) to hide the underlying complexity of diverse fabrics like Ethernet and InfiniBand. This allows users to isolate GPU servers and define connectivity through high-level requests while the Netris controller automatically orchestrates the necessary configurations across front-end, back-end, and rack-scale fabrics.

The session also addresses the intricacies of host-level networking and granular multi-tenancy, particularly through the use of Data Processing Units (DPUs). Saroyan explains that traditional VLAN sub-interfacing is often insufficient for bandwidth-intensive AI workloads like KV caching. Therefore, Netris supports hardware-accelerated isolation directly on the DPU. By integrating DPU control planes with leaf switches via EVPN-BGP, Netris creates a unified fabric where virtual functions on a host and physical switch ports can coexist in the same VPC. This complete integration approach prevents the scaling issues associated with disconnected DPU overlays and allows for flexible, hybrid environments where diverse endpoint types, like bare-metal servers and virtual machines, can communicate securely at wire speed.

The presentation also details how Netris handles shared services and internet connectivity through constructs like VPC peering, Direct Connect, and the proprietary SoftGate technology. SoftGate is a horizontally scalable, multi-tenant software gateway that provides NAT, Layer 4 load balancing using the Maglev algorithm, and DHCP without sharing state, mirroring the internal architectures of major hyperscalers. To secure these environments, Netris implements VPC-aware ACLs that are intelligently placed by an algorithm to conserve limited TCAM resources. By dynamically analyzing routing tables, the system decides whether to enforce security rules at the SoftGate for internet traffic, on physical switches for inter-fabric traffic, or directly on DPUs, ensuring optimal performance across the entire AI infrastructure.
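The Maglev algorithm mentioned here builds a consistent-hash lookup table from per-backend permutations, so any gateway instance maps a given flow to the same backend without sharing state. A simplified version follows; the table size is kept tiny for readability (production tables use a much larger prime), and the backend names are invented:

```python
import hashlib

def _h(s, salt):
    """Deterministic hash helper (stands in for a real flow hash)."""
    return int(hashlib.md5(f"{salt}:{s}".encode()).hexdigest(), 16)

def maglev_table(backends, m=13):
    """Simplified Maglev lookup table: each backend walks its own
    permutation of the m slots, claiming empty ones round-robin."""
    offsets = {b: _h(b, "o") % m for b in backends}
    skips = {b: _h(b, "s") % (m - 1) + 1 for b in backends}  # coprime to prime m
    nxt = {b: 0 for b in backends}
    table, filled = [None] * m, 0
    while filled < m:
        for b in backends:
            while True:
                slot = (offsets[b] + nxt[b] * skips[b]) % m
                nxt[b] += 1
                if table[slot] is None:
                    table[slot] = b
                    filled += 1
                    break
            if filled == m:
                break
    return table

def pick_backend(flow, table):
    """Any instance holding the same table picks the same backend."""
    return table[_h(flow, "f") % len(table)]

table = maglev_table(["gw-a", "gw-b", "gw-c"])
```

Because the table is derived purely from the backend set, gateways need no shared connection state, which is what makes the design horizontally scalable.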


Netris and the Lifecycle of AI Networking

Event: Networking Field Day 40

Appearance: Netris Presents at Networking Field Day 40

Company: Netris


Personnel: Alex Saroyan

Alex Saroyan, CEO and co-founder of Netris, provides insights from the company’s experience in deploying and automating large-scale GPU clusters. This second part of the presentation focuses specifically on the life cycle of AI networking, emphasizing that sustainable AI business strategies require architecting for long-term growth and newer GPU generations rather than single-cluster deployments. Saroyan highlights the high financial stakes of these deployments, noting that organizations cannot afford “play time” once hardware arrives; the networking infrastructure must be ready to go live immediately to start generating revenue.

To meet these aggressive timelines, Netris utilizes a sophisticated modeling and simulation phase, often referred to as a digital twin, using technologies like NVIDIA Air or Netris’s own CloudSim. This approach allows network engineering teams to model the topology, IP addressing, and cloud constructs, such as VXLANs and VRFs, before the physical hardware is even on-site. By pre-validating configurations in a simulated environment, teams can identify missing upstream connections or storage integration issues early. Once the hardware arrives, a Zero Touch Provisioning (ZTP) process identifies switches via MAC addresses and brings them into the master topology, where the Netris controller automatically generates and applies the necessary configurations without engineers having to write manual switch code.
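The ZTP step, matching a booting switch’s MAC address against the pre-built topology, reduces to a lookup against the modeled inventory. A minimal sketch with invented MACs and roles:

```python
PLANNED = {
    # MAC -> identity assigned during the modeling phase
    "aa:bb:cc:00:00:01": {"name": "leaf1", "role": "leaf"},
    "aa:bb:cc:00:00:02": {"name": "spine1", "role": "spine"},
}

def ztp_claim(discovered_mac):
    """When a switch boots and phones home, match its MAC against the
    master topology and hand back its identity (toy ZTP step)."""
    node = PLANNED.get(discovered_mac.lower())
    if node is None:
        return {"status": "unknown-device", "mac": discovered_mac}
    return {"status": "claimed", **node}

result = ztp_claim("AA:BB:CC:00:00:01")
# A claimed switch then pulls its intent and generates its own config.
```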

The Netris platform further streamlines the deployment life cycle through automated validation and troubleshooting. Since the system operates based on intent-generated configurations rather than static files, the Netris agent on each node can instantly detect miswiring or link discrepancies and provide specific instructions for remediation. Saroyan explains that while human error in configuration is largely eliminated by the automation, hardware or control plane anomalies can still occur at scale. Consequently, Netris continues to expand its suite of smart tests to help engineers identify zombie switches or performance bottlenecks, providing a green status validation that allows GPU implementation teams to begin their work with total confidence in the underlying fabric.
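Miswiring detection of this sort can be modeled as diffing observed adjacencies (e.g. from LLDP) against the intended cable map, emitting a remediation hint per mismatch. An illustrative sketch, not the Netris agent’s actual logic; all port names are invented:

```python
INTENDED = {
    # (local device, local port) -> (remote device, remote port)
    ("leaf1", "swp1"): ("spine1", "swp1"),
    ("leaf1", "swp2"): ("spine2", "swp1"),
}

def check_wiring(observed):
    """Diff observed adjacencies against the modeled cable map and
    produce a specific remediation hint per mismatch."""
    issues = []
    for local, want in INTENDED.items():
        got = observed.get(local)
        if got != want:
            issues.append(f"{local[0]}:{local[1]} should connect to "
                          f"{want[0]}:{want[1]}, observed {got}")
    return issues

observed = {
    ("leaf1", "swp1"): ("spine1", "swp1"),   # correct
    ("leaf1", "swp2"): ("spine1", "swp2"),   # cable landed on the wrong spine
}
issues = check_wiring(observed)
```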


Netris Introduction and Overview with Alex Saroyan

Event: Networking Field Day 40

Appearance: Netris Presents at Networking Field Day 40

Company: Netris


Personnel: Alex Saroyan

CEO and co-founder Alex Saroyan discusses the evolution of network engineering in the era of AI. Saroyan highlights that AI networking significantly differs from traditional data center networking due to the massive scale of GPU clusters, ranging from 1,000 to over 50,000 GPUs, and the sheer density of network switches involved. Netris introduces the concept of Network Automation, Abstraction, and Multi-tenancy (NAAM), a technology category designed to succeed SDN and intent-based networking. Unlike previous models that relied on software-based overlays, NAAM is engineered for physical, high-performance fabrics where massive traffic volumes make traditional SDN impractical. The presentation emphasizes that Netris has gained significant traction as the first NVIDIA-validated network automation vendor, serving a growing neocloud market of AI factory operators.

The complexity of AI networking is illustrated through the “layer cake” of networking, which includes front-end, back-end (East-West), and scale-up (NVLink) fabrics. Saroyan explains that even a relatively small cluster of 512 GPUs involves over 1,200 network links, making manual configuration or “fire and forget” approaches impossible. A critical requirement for these operators is multi-tenancy, not just for security, but for operational maintenance. Since operators cannot access tenant data, they must have the ability to dynamically move servers between a tenant environment and a service environment to perform repairs or OS reinstalls. To manage this, Netris utilizes a distributed architecture with agents residing on switches and DPUs, allowing for scalable, consistent, and validated configurations across thousands of nodes without the bottlenecks of centralized automation tools like Ansible.
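The cited link count is easy to reproduce with back-of-the-envelope arithmetic, assuming a two-tier, non-oversubscribed back-end fabric plus a few front-end, storage, and management links per server (the three-extra-links-per-server figure is an assumption; real designs vary):

```python
def backend_links(gpus, oversubscription=1.0):
    """Rough link count for a two-tier back-end fabric: one GPU-to-leaf
    link per GPU, matched by equal leaf-to-spine uplink capacity when
    non-oversubscribed. (Assumed topology, for illustration only.)"""
    gpu_to_leaf = gpus
    leaf_to_spine = int(gpus / oversubscription)
    return gpu_to_leaf + leaf_to_spine

backend = backend_links(512)       # 1024 back-end links alone
servers = 512 // 8                 # 8 GPUs per server
total = backend + servers * 3      # assume 3 front-end/storage/mgmt links each
# Even with conservative assumptions, the total clears the 1,200 links
# cited for a 512-GPU cluster.
```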

End-to-end orchestration in a Netris environment covers diverse hardware and protocols, including Ethernet, InfiniBand, and specialized RDMA-over-Converged-Ethernet (RoCE) setups. Netris maintains a vendor-neutral stance, supporting NVIDIA, Arista, and Broadcom-based Sonic switches, as well as integrating with NVIDIA’s UFM for InfiniBand and partitioning NVLink for rack-scale isolation. The session concludes by emphasizing that NAAM acts as a bridge between compute orchestrators, such as Kubernetes, CloudStack, or NVIDIA Nico, and the underlying physical network. By providing robust API-level collaboration, Netris enables network engineers to design, simulate, and troubleshoot massive AI fabrics, ensuring that the physical infrastructure can keep pace with the dynamic demands of AI compute and multi-tenant resource allocation.


Upscale AI’s Point of View with Aravind Srikumar

Event: Networking Field Day 40

Appearance: Upscale AI Presents at Networking Field Day 40

Company: Upscale AI


Personnel: Aravind Srikumar

Upscale AI distinguishes between two critical domains: scale-up networking, which creates a large compute environment within a rack where multiple GPUs see a flat, unified memory, and scale-out networking, which connects these domains through memory copy operations. The presentation highlights that the network has become the backplane of a distributed ecosystem, moving from a standard client-server model to a highly synchronized all-to-all communication pattern. Upscale AI aims to solve the challenges of this new era by providing purpose-built hardware and software that prioritizes predictable, ultra-low latency and zero-oversubscription bandwidth to prevent computational stalls.

In the scale-up domain, the architecture must support load-store operations with latencies under one microsecond. Aravind Srikumar introduces the Skyhammer architecture, a clean-sheet design specifically built for the scale-up environment that emphasizes “performance, performance, performance.” Unlike traditional networking, these systems utilize lightweight, optimized headers and offload congestion handling, such as Link Layer Retry (LLR) and Priority Flow Control (PFC), directly to the switch to minimize jitter. For scale-out needs, Upscale AI has partnered with NVIDIA to utilize the Spectrum-X substrate, building open, Ethernet-based systems around it that feature AI-optimized operating systems, hitless upgrades, and specialized circuitry for real-time power management and telemetry.

The company’s overarching vision is to enable a future of heterogeneous compute where customers can mix and match various processing units, such as GPUs, LPUs, and DPUs, without being locked into a single proprietary ecosystem. By utilizing open standards like SONiC, ESON, and UALink, Upscale AI ensures that its fabric remains technology-agnostic and interoperable. This approach is designed to protect customer investments over a five-to-seven-year lifecycle, allowing the network to adapt as new AI workloads and specialized chips emerge. Ultimately, the goal is to transform the data center into an efficient token factory where every ounce of power and compute is maximized through an architected, rather than merely tuned, networking stack.


Upscale AI and the AI ASIC Landscape

Event: Networking Field Day 40

Appearance: Upscale AI Presents at Networking Field Day 40

Company: Upscale AI


Personnel: Deepti Chandra

Upscale AI posits that traditional data center networking is a round peg in a square hole for AI, as existing infrastructures were designed for general-purpose web traffic rather than the massive, synchronized communication required by billions of parameters and trillion-token models. By focusing exclusively on AI traffic and removing the bloat of legacy enterprise features, Upscale AI aims to provide a reliable, predictive substrate that treats the entire cluster as a single, coordinated engine.

The technical strategy addresses the evolution of AI workloads from dense training to inference-centric agentic AI and persistent states. As models outgrow the memory capacity of single GPUs, parallelism, like slicing math problems across thousands of processors, becomes mandatory, creating a massive data movement problem. Upscale AI advocates for a distributed ecosystem where the network must be technology-agnostic to support a plethora of specialized ASICs, including GPUs, TPUs, and custom hyperscaler XPUs. This architecture moves away from reactive TCP-based recovery toward a lossless, RDMA-driven environment where the network proactively manages congestion to prevent computational stalls, ensuring that every GPU cycle is utilized to maximize tokens per watt.

To future-proof these investments, Upscale AI is developing a portfolio of scale-up and scale-out systems built on open standards like Ethernet, SONiC, and UALink. Their scale-out systems leverage a partnership with NVIDIA’s Spectrum-X, while their scale-up innovation involves purpose-built silicon and trays to support heterogeneous compute and memory pooling. By utilizing a unified software stack based on SONiC, the company provides a common substrate that simplifies operational onboarding for neoclouds and enterprises. Ultimately, Upscale AI’s mission is to move beyond the current homogeneous nature of AI clusters, providing an open, standards-based fabric that allows diverse hardware to interoperate seamlessly for the next decade of AI innovation.


Upscale AI Networking – What Has Changed with AI

Event: Networking Field Day 40

Appearance: Upscale AI Presents at Networking Field Day 40

Company: Upscale AI


Personnel: Aravind Srikumar

Upscale AI argues that traditional cloud and front-end networks, which are largely based on a client-server architecture, are fundamentally ill-suited for the unique demands of AI workloads. While standard web traffic is connection-oriented and tolerant of latency, AI clusters rely on collective communication where GPUs perform synchronized all-to-all data exchanges. This shift results in a move from north-south traffic patterns to intense east-west traffic, where a single request triggers massive bursts of data across the fabric. The presentation establishes that to maintain efficiency, the network must evolve from a reactive system to an architected substrate that treats the entire cluster as a single, coordinated engine.

AI networking requires a radical departure from the traditional OSI seven-layer processing model. In a standard network, packets traverse the full stack and are processed by the CPU. However, AI traffic utilizes RDMA (Remote Direct Memory Access) to bypass the kernel and CPU entirely, performing zero-copy memory transactions directly between GPUs. This creates a different packet profile where payload data is memory itself rather than application data. Furthermore, while cloud networks handle congestion reactively through TCP retransmits, AI clusters require a lossless environment. In these systems, a single dropped packet can stall thousands of GPUs, leading to computational head-of-line blocking that halts progress across the entire token factory.
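The head-of-line blocking claim follows directly from the synchronized nature of collectives: a step completes only when the slowest rank finishes, so completion time is the maximum, not the mean. A toy model, with an assumed 4 ms recovery penalty for a single dropped packet:

```python
def allreduce_time(per_rank_times):
    """A synchronized collective finishes only when the slowest rank
    finishes: completion time is the max, not the mean (toy model)."""
    return max(per_rank_times)

ranks = 1024
base_us = 50.0                     # nominal step time per rank (assumed)
times = [base_us] * ranks

no_loss = allreduce_time(times)    # 50 us when nothing is dropped

# One dropped packet on one rank forces a retransmit/recovery delay.
times[7] = base_us + 4000.0        # assumed 4 ms recovery penalty
with_loss = allreduce_time(times)

stall_factor = with_loss / no_loss  # all 1024 ranks wait on one packet
```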

To solve these challenges, Upscale AI advocates for a purpose-built network stack that optimizes every layer from silicon to software. Traditional data center switches are often burdened by bloated feature sets and complex pipelines designed for general-purpose routing, which increases power consumption and latency. By stripping away unnecessary protocols and focusing on AI-specific requirements like microsecond-level telemetry and adaptive load balancing to prevent hash collisions, the company aims to deliver a more efficient fabric. The presentation concludes that achieving a 100% success rate for collective communication is necessary to maximize tokens per watt, moving beyond the tuning of existing hardware toward a clean-sheet architecture designed for the next decade of AI scale.
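The hash-collision problem with static ECMP can be shown with a toy simulation: a handful of elephant flows hashed independently onto equal-cost links rarely land exactly one per link, so some links overload while others sit idle. The random placement below stands in for a 5-tuple hash:

```python
import random

def ecmp_assign(flows, n_links, seed=1):
    """Static ECMP: each flow hashes to one link for its lifetime.
    The RNG stands in for a 5-tuple hash (illustrative only)."""
    rng = random.Random(seed)
    loads = [0.0] * n_links
    for size in flows:
        loads[rng.randrange(n_links)] += size
    return loads

# 8 equal elephant flows across 8 links: ideal placement would be
# exactly one flow per link (100.0 load everywhere).
loads = ecmp_assign([100.0] * 8, 8)
busiest, idle = max(loads), loads.count(0.0)
# Independent hashing makes collisions (busiest > 100) and idle links
# likely, which is why AI fabrics add adaptive, load-aware balancing.
```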


AI Changes in the Norm with Upscale AI

Event: Networking Field Day 40

Appearance: Upscale AI Presents at Networking Field Day 40

Company: Upscale AI


Personnel: Deepti Chandra

Upscale AI, founded in 2025, recently emerged from stealth as a unicorn following $300 million in combined seed and Series A funding. With a team of industry veterans, Upscale AI is focused on building a clean sheet networking architecture specifically for the backend and lean front-end of AI clusters. The speakers emphasize that traditional data center networking is a round peg in a square hole for AI, as existing infrastructures were designed for general-purpose web traffic rather than the massive, synchronized communication required by billions of parameters and trillion-token models.

The presentation details the shift from a simple client-server model to a distributed ecosystem, where the network acts as the nervous system of an intelligent manufacturing plant or token factory. In this environment, the key performance indicators (KPIs) have shifted from bits per second to tokens per second and tokens per watt. As large language models (LLMs) outgrow the memory capacity of single GPUs, parallelism, such as slicing math problems across thousands of processors, becomes mandatory. This creates a massive data movement problem where any network synchronization stall or hot spot directly results in idle compute time and lost revenue, making predictability and low latency table stakes rather than optional features.

To address these challenges, Upscale AI is developing a portfolio that includes both scale-up and scale-out solutions built on open standards like Ethernet, SONiC, and the Ultra Ethernet Consortium (UEC). Their scale-out systems are designed around a partnership with NVIDIA’s Spectrum-X, while their scale-up innovation involves purpose-built silicon and trays to support heterogeneous compute environments. By focusing exclusively on AI traffic and removing the bloat of legacy enterprise features, Upscale AI aims to provide a reliable, predictive substrate that can proactively identify malfunctioning hardware. This architectural approach is intended to help operators maximize their token flywheel, ensuring that massive infrastructure investments yield the highest possible intelligence output per watt of power consumed.


Upscale AI Purpose-Built for AI Scale

Event: Networking Field Day 40

Appearance: Upscale AI Presents at Networking Field Day 40

Company: Upscale AI


Personnel: Aravind Srikumar

Upscale AI was founded in 2025 and quickly emerged from stealth to become a unicorn following $300 million in seed and Series A funding. The leadership team consists of industry veterans from major firms like Cisco, Broadcom, and NVIDIA, focusing on a clean sheet architecture designed to solve the specific demands of AI networking. Unlike traditional data center providers, Upscale AI is a pure-play firm targeting the back-end and lean front-end networks where collective communication and high-density traffic require more than general-purpose switching.

The technical strategy centers on addressing both scale-up and scale-out domains through a combination of proprietary silicon and open-source software. For scale-out networking, the company is partnering with NVIDIA to build systems around Spectrum-X, while simultaneously developing its own optimized silicon and trays to enable open scale-up architectures for various processing units. A key pillar of their philosophy is the enablement of heterogeneous compute, ensuring that their fabric can support a diverse landscape of ASICs, including GPUs, TPUs, and LPUs. By utilizing a unified software stack based on SONiC, they aim to provide a turnkey, customizable operating system that simplifies deployment for neoclouds, enterprises, and hyperscalers.

To ensure long-term viability and industry alignment, Upscale AI is heavily involved in standards bodies such as OCP, UEC (Ultra Ethernet Consortium), and UALink. Their approach emphasizes predictability and reliability by stripping away the legacy features required for enterprise or service provider markets, focusing solely on AI-specific traffic patterns. Through active partnerships with GPU vendors and hyperscalers, the company is validating its designs in real-world environments to ensure interoperability. Ultimately, Upscale AI intends to move beyond the current homogeneous nature of AI clusters, providing an open, standards-based substrate that can scale and evolve alongside the next decade of AI innovation.


Arista CloudVision 360° Observability for AI

Event: Networking Field Day 40

Appearance: Arista Presents at Networking Field Day 40

Company: Arista


Personnel: Praful Bhaidasna

Monitoring and managing complex AI infrastructure requires moving beyond traditional networking tools that treat the environment as a black box. Praful Bhaidasna explains that the industry has long suffered from a “mean time to truth” problem where network operators are blamed for issues they cannot properly diagnose because they lack visibility into what is connected to the network. Arista aims to change this “Stone Age” approach by evolving from simple monitoring to 360-degree observability. This strategy is centered on CloudVision, a NetOps platform that utilizes a common network data lake called NetDL to aggregate high-fidelity streaming telemetry from every Arista device across the data center, campus, and WAN.

The architecture relies on the fact that Arista’s EOS provides consistent, reliable state data, ranging from MAC address tables and routing updates to microburst signals and configuration changes. This information is stored in a time-series database, allowing operators to travel back in time to compare network states before and after an incident. To manage the resulting deluge of data, Arista employs an AI/ML engine known as AVA, or Autonomous Virtual Assist. AVA identifies patterns and anomalies, filtering out the noise to show only the relevant signals. This allows human operators to focus on making informed decisions rather than spending hours manually correlating events across different silos.
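The “travel back in time” comparison amounts to diffing two snapshots pulled from the time-series store. A minimal sketch with invented devices and counters, not CloudVision’s actual query interface:

```python
snapshots = {
    # time -> device state captured from streaming telemetry
    "09:00": {"leaf1": {"bgp_peers_up": 4, "mac_count": 1200}},
    "09:05": {"leaf1": {"bgp_peers_up": 3, "mac_count": 860}},
}

def diff_states(before, after):
    """Compare two points in the time-series store and report what
    changed on each device as (old, new) pairs (toy comparison)."""
    changes = {}
    for dev, state in after.items():
        prev = before.get(dev, {})
        delta = {k: (prev.get(k), v)
                 for k, v in state.items() if prev.get(k) != v}
        if delta:
            changes[dev] = delta
    return changes

changes = diff_states(snapshots["09:00"], snapshots["09:05"])
# A dropped BGP peer and a shrinking MAC table, side by side, point the
# operator at the incident window instead of raw event logs.
```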

Furthermore, CloudVision has opened its ecosystem to ingest data from third-party systems, AI job orchestrators, and compute and storage metrics via Prometheus. This integration is critical for AI environments where a job stall could be caused by anything from a GPU failure to a NIC issue. Arista has introduced a dedicated AI jobs dashboard that correlates specific training jobs with the underlying flows, servers, and switches. To simplify interactions with this massive dataset, a digital virtual assistant allows users to query their infrastructure using natural language. This integrated approach ensures that expensive GPU resources do not sit idle and that the resolution of complex performance bottlenecks can happen in minutes rather than days.
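Correlating a training job with its underlying flows, servers, and switches is essentially a join across inventory and health data. A toy walk with invented identifiers, illustrating the shape of the lookup rather than any Arista API:

```python
jobs = {"train-llm-42": {"servers": ["srv1", "srv2"]}}
server_ports = {"srv1": ("leaf1", "swp10"), "srv2": ("leaf2", "swp11")}
port_health = {("leaf1", "swp10"): "ok", ("leaf2", "swp11"): "crc-errors"}

def job_blast_radius(job_id):
    """Walk job -> servers -> switch ports -> health to explain a stall
    (toy join over data a platform like this would already hold)."""
    report = []
    for srv in jobs[job_id]["servers"]:
        switch, port = server_ports[srv]
        report.append((srv, switch, port, port_health[(switch, port)]))
    return report

report = job_blast_radius("train-llm-42")
suspects = [r for r in report if r[3] != "ok"]
# One bad port surfaces as the likely cause of the stalled job.
```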


The Optics Evolution with Arista

Event: Networking Field Day 40

Appearance: Arista Presents at Networking Field Day 40

Company: Arista


Personnel: Vijay Vusirikala

Optics are critical for network switches, particularly at 800 Gbps and as speeds evolve to 1.6T, from a cost, density, power-efficiency, and performance perspective. Arista has been driving major innovations in optics to power networking for AI applications. In this section, we will cover the optics landscape and highlight the evolution to high-density 1.6T optics for AI networking.

Arista identifies power, cost, and reliability as the three pillars of optical innovation. While the OSFP form factor has been incredibly successful, projected to reach 100 million modules in 2026, it is reaching its thermal and density limits at approximately 35 watts. To meet the demands of next-generation AI networks, Arista introduced XPO, or extra dense pluggable optics. This new form factor is designed for 1.6T and beyond, offering a 4x improvement in system-level density compared to OSFP. By utilizing a two-tier card design and liquid cooling, XPO can handle thermal loads of up to 400 watts. This shift to liquid cooling not only manages heat but also significantly improves reliability by keeping optical components at lower, more stable temperatures, effectively reducing failure rates by five to eight times.
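The five-to-eight-times reliability claim is consistent with a first-order Arrhenius temperature model, a common rule of thumb for relating component temperature to failure rate. The sketch below is illustrative only; the activation energy and the temperature pair are assumed values for the sake of the arithmetic, not Arista figures.

```python
import math

# Boltzmann constant in eV/K
K = 8.617e-5

def accel(t_hot_c: float, t_cool_c: float, ea: float = 0.7) -> float:
    """Arrhenius acceleration factor between two operating temperatures.

    ea is an assumed activation energy in eV (0.7 eV is a typical
    textbook value for optoelectronic wear-out, not a vendor number).
    """
    t_hot = t_hot_c + 273.15   # convert Celsius to Kelvin
    t_cool = t_cool_c + 273.15
    return math.exp((ea / K) * (1.0 / t_cool - 1.0 / t_hot))

accel(70, 45)  # ~6.4, inside the 5x-8x range the presentation cites
```

Dropping an optic's operating temperature from roughly 70 C (air-cooled) to 45 C (liquid-cooled) yields an acceleration factor in the mid single digits, which is why cooler, more stable temperatures translate directly into lower failure rates.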

Beyond density and cooling, the evolution of optics includes a push for power-efficient architectures like Linear Pluggable Optics (LPO), which can reduce power consumption by 60% by eliminating internal retiming. Arista is also exploring Co-Packaged Optics (CPO) as a complementary solution, though XPO currently offers broader versatility across the various reaches required for scale-up, scale-out, and scale-across fabrics. While CPO is compelling for high-density, short-reach DR optics, it faces challenges regarding universal reach and a maturing supply chain. XPO, conversely, supports the full spectrum of connectivity from short-reach copper and fiber to long-distance coherent optics like ZR and ZR+, which are essential for the IP-over-DWDM architectures used in scale-across regional networks.

Arista remains committed to an open ecosystem, having established XPO as an open multi-source agreement (MSA) with over 100 partners, including major module vendors and system competitors like Cisco and Nokia. This open approach ensures a robust supply chain and allows customers to avoid proprietary lock-in while transitioning to liquid-cooled, high-density environments. The transition to XPO also enables data center operators to shrink the footprint of their networking racks, freeing up more physical space and power for revenue-generating GPU compute. As speeds advance toward 1.6T and 3.2T, the combination of denser form factors, advanced liquid cooling, and open standards will be the foundation for sustaining the growth of generative AI infrastructure.


Arista Networking for AI: The Ethernet Backplane

Event: Networking Field Day 40

Appearance: Arista Presents at Networking Field Day 40

Company: Arista

Video Links:

Personnel: Tom Emmons

We go deep into the fabric of the AI cluster. We’ll discuss why Ethernet has become the definitive backplane for AI workloads. We’ll explore hardware innovations in power efficiency and the protocol optimizations, like Dynamic Load Balancing (DLB) and advanced congestion control, that keep data moving at the speed of thought. This section will cover the different networks for AI, from scale-up to scale-out and scale-across, and discuss optimizations and enhancements to Ethernet standards such as UEC and ESUN for AI applications.

Tom Emmons emphasizes that as AI networks become business-critical, quality and power efficiency are the primary drivers of architectural decisions. Every problem in an AI network escalates immediately because of the massive financial investments involved, making a reliable network essential. Since power is the fundamental limiting factor for GPU density in a data center, Arista focuses on reducing the network power footprint, ideally to less than 10% of total facility power, through liquid cooling, low-power optics, and high-radix switches that minimize the number of tiers. By reducing tiers, operators save on optics, which are the largest contributors to network power consumption, while also simplifying load balancing and reducing potential congestion points.
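The radix-versus-tiers tradeoff Emmons describes can be sketched with the standard folded-Clos (fat-tree) capacity formula. The numbers below are illustrative arithmetic, not Arista design targets.

```python
def max_hosts(radix: int, tiers: int) -> int:
    """Hosts supported by a nonblocking fat-tree: 2 * (radix/2) ** tiers."""
    return 2 * (radix // 2) ** tiers

def optics_per_host(tiers: int) -> int:
    """Each extra switching tier adds one inter-switch hop per path,
    i.e. two more transceivers per host (one at each end of the link)."""
    return 2 * (tiers - 1)

# A high-radix 512-port switch covers in 2 tiers roughly what a
# 64-port switch needs 3 tiers to reach:
print(max_hosts(512, 2), max_hosts(64, 3))  # 131072 65536
```

Because every added tier costs two more optics per host, and optics dominate network power, collapsing a fabric from three tiers to two with high-radix switches cuts both the transceiver count and the number of congestion points in one move.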

The presentation identifies four distinct AI fabrics: front-end, scale-out, scale-across, and scale-up. While scale-out provides the essential east-west connectivity for GPU training, scale-across is becoming increasingly vital for customers who must link geographically dispersed buildings to overcome local power and space constraints. Scale-across networking leverages Arista’s extensive experience in WAN and routing, utilizing deep buffers, encryption, and traffic engineering to manage latency and protect data. Meanwhile, the front-end network mirrors traditional data center designs but demands higher reliability and security, since it connects billions of dollars of hardware to the outside world and to local storage resources.

Arista is a vocal advocate for Ethernet as the universal backplane, specifically for the emerging scale-up market where GPU-to-GPU memory copies occur. Through leadership in consortiums like the Ultra Ethernet Consortium (UEC) and the Ethernet for Scale-Up Networking (ESUN) workgroup, Arista is refining Ethernet to handle 256-byte cache line transactions and packet spraying more efficiently. Emmons posits that the dominance of Ethernet is driven by the industry’s desire for multi-vendor ecosystems and a unified management model. By running a single EOS image across all four fabric types, Arista provides a mature, tested software stack that allows operators to use the same BGP stack and telemetry tools regardless of whether they are managing a local scale-up cluster or a global scale-across network.


Scaling the AI Network Frontier with Arista

Event: Networking Field Day 40

Appearance: Arista Presents at Networking Field Day 40

Company: Arista

Video Links:

Personnel: Brendan Gibbs

Brendan Gibbs and other leaders from Arista outlined the company’s comprehensive strategy for AI infrastructure. The presentation highlighted Arista’s Etherlink portfolio, which is optimized for 800G connectivity and anchored by the EOS operating system. Gibbs emphasized that a well-optimized network is no longer just plumbing but a critical component that can improve AI job completion times by 44% and maximize the utilization of expensive GPUs. The core value proposition centers on providing open, Ethernet-based standards that offer customers flexibility and choice without the proprietary lock-in often found in competitive solutions.

The technical framework of the discussion focused on the Four AI Fabrics that Arista supports: front-end, scale-out, scale-across, and scale-up. Front-end fabrics handle traditional data center workloads and inferencing, while scale-out focuses on back-end training for clusters reaching up to 100,000 GPUs. For larger deployments that exceed the physical limits of a single building, Arista’s scale-across technology utilizes deep buffering, encryption, and routing intelligence to link geographically dispersed data centers. Finally, the scale-up fabric represents the new frontier of Ethernet-based interconnects designed for memory coherency across XPU clusters.

Arista differentiates itself through a broad hardware portfolio and a commitment to industry-wide innovation. Gibbs detailed the concept of hierarchical hybrid buffering, which combines on-chip shallow buffers for low latency with on-package deep buffers to prevent packet loss during congestion. Beyond hardware, Arista remains a leader in defining open standards, having contributed significantly to the OSFP MSA, the Ultra Ethernet Consortium, and the OCP. By offering modular and fixed platforms that support various architectural designs, Arista aims to provide high-scale, low-power networking that remains fully interoperable and vendor-neutral.


Aviz Network Copilot Demo with Cody McCain

Event: Networking Field Day 40

Appearance: Aviz Networks Presents at Networking Field Day 40

Company: Aviz Networks

Video Links:

Personnel: Cody McCain, Thomas Scheibe

Aviz Networks introduces the AI Networking Operations Center (NOC), a vendor-neutral, agentic AI platform designed to transform traditional network management. Unlike tools that merely append a Large Language Model (LLM) to an existing product, Aviz provides a private, secure, and interoperable framework that integrates with a customer’s existing ecosystem. The platform is built on the FITS principles: Freedom, Integration, Tailorability, and Security, ensuring that enterprises can leverage AI without compromising data privacy or being locked into a specific hardware vendor or software controller.

The presentation features a deep dive into the practical application of Aviz’s Network Copilot, emphasizing its role in accelerating root cause analysis (RCA) and shortening the mean time to innocence. By integrating with tools like ServiceNow, the platform can automatically process inbound tickets, extract metadata like destination IPs, and query heterogeneous environments, including Arista, Cisco, and SONiC switches, without requiring the operator to know vendor-specific CLI commands. The system utilizes agentic workflows where specialized agents, such as those for firewall log analysis or VLAN management, work hierarchically to solve complex issues. This approach allows frontline support staff to handle advanced troubleshooting tasks that previously required expert intervention, while maintaining a human-in-the-loop requirement for critical configuration changes.
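The ticket-metadata step can be sketched in a few lines of Python, the language of Aviz’s SDK. The function name and regex below are assumptions for illustration, not the actual Aviz API, but they show how destination IPs might be pulled from free-form ticket text before vendor-neutral agents query the devices.

```python
import re

# Naive IPv4 matcher for illustration; a production agent would also
# validate octet ranges (0-255) and handle IPv6 and hostnames.
IP_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

def extract_destination_ips(ticket_text: str) -> list[str]:
    """Return candidate destination IPs found in a ticket description."""
    return IP_RE.findall(ticket_text)

extract_destination_ips("Branch users cannot reach app at 10.1.20.5 over VLAN 300")
```

Once the addresses are extracted, downstream agents can translate a single intent ("trace reachability to 10.1.20.5") into the appropriate show commands for each vendor in the path.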

Security and data integrity serve as the foundation of the Aviz architecture, with the platform offering the flexibility to run private, on-premises LLMs like GPT-OSS 120B to prevent data leakage. The speakers highlight that the platform is designed with robust role-based access control (RBAC) and guardrails to prevent hallucinations or unauthorized actions, such as deleting global VLANs. Furthermore, the AI NOC addresses data hygiene by using AI to identify inconsistencies across disparate databases like IPAM and NetBox. By providing a Python-based SDK, Aviz enables organizations to build custom agents that mirror their specific business processes, ultimately moving network operations toward a more proactive, automated, and predictable future.


Aviz and the AI NOC

Event: Networking Field Day 40

Appearance: Aviz Networks Presents at Networking Field Day 40

Company: Aviz Networks

Video Links:

Personnel: Cody McCain, Thomas Scheibe

Aviz Networks introduces the AI Networking Operations Center (NOC), a vendor-neutral, agentic AI platform designed to transform traditional network management. Unlike tools that merely append a Large Language Model (LLM) to an existing product, Aviz provides a private, secure, and interoperable framework that integrates with a customer’s existing ecosystem. The platform is built on the FITS principles: Freedom, Integration, Tailorability, and Security, ensuring that enterprises can leverage AI without compromising data privacy or being locked into a specific hardware vendor or software controller.

Thomas Scheibe and Cody McCain emphasize that while the industry often markets AI as a simple natural language interface, the real challenge lies in the complex backend workflows of network operations. Aviz addresses this by acting as a “Red Hat for networking,” providing professional support and software for open-source options like SONiC while maintaining compatibility with legacy systems from Cisco, Arista, and NVIDIA. Their approach focuses on the agentic power of AI, where modular agents can be customized to handle specific organizational workflows, moving beyond vendor-defined constraints to meet the unique operational needs of each customer.

To demonstrate the platform’s practical utility, the speakers showcase a physical lab environment featuring a heterogeneous mix of switches, firewalls from Fortinet and Palo Alto, and various management tools. By interfacing directly with devices and disparate data sources, such as config logs, flow data, and ticketing systems, the AI NOC streamlines root cause analysis and automates responses. This architecture allows companies to transition complex tasks typically reserved for expert engineers during maintenance windows into automated functions that can be managed by Tier 1 or Tier 2 support, significantly increasing operational efficiency.


From 400G BiDi to 1.6T: Cisco Optics for AI Fabrics with Paymon Mogharabi

Event: Networking Field Day 40

Appearance: Cisco Data Center Networking Presents at Networking Field Day 40

Company: Cisco

Video Links:

Personnel: Paymon Mogharabi

As AI training and inference scale, the network must function as an extension of the compute fabric. This session explores the architectural requirements for high-performance AI data centers. We will examine the shift toward deterministic networking to mitigate tail latency and fabric congestion, alongside critical hardware innovations, including advanced cooling and next-generation optics, designed to maximize performance and power efficiency. Attendees will gain technical insights into building a unified, programmable fabric that optimizes performance and scalability for high-density AI environments.

The presentation introduces the third generation of Cisco’s bidirectional (BiDi) technology, specifically the 400G BiDi optic. This innovation addresses fiber infrastructure constraints by enabling fiber reuse, allowing customers to upgrade from 40G or 100G to 400G over existing duplex multi-mode fiber without installing new trunk cables or patch panels. By utilizing four wavelengths at 100G each over a single fiber pair, the 400G BiDi simplifies the physical layer with LC connectors, making it eight times more fiber-efficient than parallel SR8 solutions. This approach offers significant financial and operational benefits for both brownfield and greenfield deployments by reducing installation costs and troubleshooting complexity.
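The eight-fold fiber efficiency follows directly from the lane arithmetic. This is a back-of-the-envelope check, not a Cisco spec sheet; the SR8 lane layout is the standard 400GBASE-SR8 arrangement.

```python
# 400G-SR8: 8 parallel 50G lanes, each lane needing a transmit fiber
# and a receive fiber.
sr8_fibers = 8 * 2                     # 16 fibers per 400G link

# 400G BiDi: 4 wavelengths x 100G multiplexed over one duplex pair,
# terminated with simple LC connectors.
bidi_fibers = 2                        # 2 fibers per 400G link

assert sr8_fibers // bidi_fibers == 8  # 8x fiber efficiency
```

That ratio is the whole brownfield story: a duplex multi-mode plant installed for 40G or 100G carries 400G without pulling new trunk cable.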

A major portion of the session focuses on the critical role of optics reliability and Cisco’s advanced silicon photonics in AI environments. Unlike traditional networks where retransmissions are common, AI workloads are highly synchronized; a single unreliable optical link can cause GPU clusters to stall, potentially reducing performance by 40%. Cisco’s silicon photonics architecture integrates electronics and photonics into a single system, improving stability and power efficiency for 800G and 1.6T scales. Notable highlights include the 1.6T pluggable optic, which supports flexible breakout options, and the 800G Linear Pluggable Optic (LPO). By removing the DSP from the optic and shifting signal conditioning to the switch ASIC, the LPO solution reduces power consumption by 50% per module and lowers overall system latency, providing a more reliable and sustainable foundation for large-scale AI factories.


Cisco Silicon One Powered N9000 Switches with Faraz Taifehesmatian

Event: Networking Field Day 40

Appearance: Cisco Data Center Networking Presents at Networking Field Day 40

Company: Cisco

Video Links:

Personnel: Faraz Taifehesmatian

As AI training and inference scale, the network must function as an extension of the compute fabric. This session explores the architectural requirements for high-performance AI data centers. We will examine the shift toward deterministic networking to mitigate tail latency and fabric congestion, alongside critical hardware innovations, including advanced cooling and next-generation optics, designed to maximize performance and power efficiency. Attendees will gain technical insights into building a unified, programmable fabric that optimizes performance and scalability for high-density AI environments.

The presentation details Cisco’s strategic use of its Silicon One architecture, specifically the G-Series for AI scale-out and the P-Series for “scale-across” data center interconnects. The G-Series, highlighted by the G200 and G300 ASICs, provides high-radix connectivity with up to 512 ports of 200G and fully shared packet buffers to eliminate the performance bottlenecks found in traditional slice-based architectures. A core focus is the Cisco Intelligence Packet Flow (IPF), which enables advanced load balancing techniques such as packet spraying and flowlet switching. These features allow Ethernet to mimic the lossless properties of InfiniBand, keeping job completion times low for RDMA-heavy AI workloads while maintaining a programmable pipeline that can adapt to evolving standards like Ultra Ethernet mid-cycle.
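Flowlet switching can be sketched in a few lines: packets of a flow stick to one path, but once the flow goes idle for longer than the fabric’s path-delay skew, the next burst can be safely rehashed to another path without reordering packets in flight. This is a minimal illustration of the general technique, with assumed names and thresholds, not Cisco’s IPF implementation.

```python
import random

class FlowletBalancer:
    """Minimal flowlet load balancer (illustrative only)."""

    def __init__(self, paths, gap_s=0.0005):
        self.paths = paths
        self.gap_s = gap_s          # idle gap that ends a flowlet
        self.state = {}             # flow_id -> (path, last_seen_s)

    def pick(self, flow_id, now_s):
        path, last = self.state.get(flow_id, (None, 0.0))
        if path is None or now_s - last > self.gap_s:
            # Idle gap elapsed: any in-flight packets have drained, so
            # the flow can migrate to a (possibly less loaded) path.
            path = random.choice(self.paths)
        self.state[flow_id] = (path, now_s)
        return path
```

Packets arriving back-to-back keep their path, preserving ordering for the RDMA transport above; only a pause lets the flow move, which is what distinguishes flowlet switching from per-packet spraying.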

Hardware innovation is further demonstrated through new form factors and cooling solutions designed for high-density AI environments. Cisco introduced liquid-cooled chassis, such as the N9364F-SG3-L, which achieves 100% liquid cooling to handle the massive power requirements of 100-terabit ASICs without the need for fans. These systems support next-generation optics, including Linear Pluggable Optics (LPO) that reduce power consumption by half and coherent ZR/ZR+ optics for long-haul connectivity up to 1,000 km. Additionally, Cisco’s partnership with NVIDIA was underscored through the N9100 series, which integrates NVIDIA Spectrum-4 and Spectrum-6 silicon into the Cisco ecosystem. This gives customers the choice between a vertically integrated Cisco fabric or an end-to-end NVIDIA Spectrum-X solution, all managed through a consistent operating system and the Nexus Dashboard.


Cisco Scaling AI – Deterministic Fabrics and High-Density Infrastructure with Richard Licon

Event: Networking Field Day 40

Appearance: Cisco Data Center Networking Presents at Networking Field Day 40

Company: Cisco

Video Links:

Personnel: Faraz Taifehesmatian, Richard Licon

As AI training and inference scale, the network must function as an extension of the compute fabric. This session explores the architectural requirements for high-performance AI data centers. We will examine the shift toward deterministic networking to mitigate tail latency and fabric congestion, alongside critical hardware innovations, including advanced cooling and next-generation optics, designed to maximize performance and power efficiency. Attendees will gain technical insights into building a unified, programmable fabric that optimizes performance and scalability for high-density AI environments.

The presentation emphasizes that an AI-ready data center requires simultaneous innovation across five key dimensions: scalability, power efficiency, security, operational management, and silicon diversity. Cisco highlights the rapid transition in networking speeds, moving from 400G and 800G to 1.6T in just two years to keep pace with GPU evolution. A major focus is placed on the shift toward Ethernet for scale-out fabrics, as it offers a consistent operational model across front-end, back-end, and management networks. To achieve performance parity with InfiniBand, Cisco utilizes its Silicon One architecture, featuring deep, fully shared packet buffers and programmable pipelines that allow for the mid-cycle introduction of advanced features like dynamic load balancing and packet spraying to mitigate microbursts and reduce job completion time.

Cisco also detailed its strategic partnership with NVIDIA, which goes beyond simple reselling to include co-engineering systems that integrate Cisco’s NXOS and Nexus Dashboard with NVIDIA’s Spectrum-4 silicon. This collaboration aims to provide repeatable, standardized reference architectures that support high-performance features like adaptive routing and direct data placement. Furthermore, the discussion introduced the concept of “scaling across” geographically distant data centers, necessitating P-series silicon with deeper buffers and advanced optics for long-haul connectivity. By offering a vertically integrated stack encompassing silicon, hardware, operating systems, and optics, Cisco aims to provide a cohesive and programmable fabric that addresses the extreme power and performance demands of modern agentic AI workloads.

