Ravi Kumar and Thomas Scheibe presented for Aviz Networks at AI Infrastructure Field Day 2
Design, deploy, and monitor networks for AI with Aviz
Thomas Scheibe, Chief Product Officer, presented Aviz's solutions for designing, deploying, and monitoring networks for AI workloads. Aviz focuses on the specialized networking needs of AI, including multiple networks, differentiated Quality of Service (QoS), and the integration of compute into the end-to-end network topology. The company aims to provide automation and orchestration for faster deployment, service activation, and infrastructure expansion. Its product, ONCE, supports the SONiC and Cumulus network operating systems and streamlines network management through design, modeling, deployment, and monitoring capabilities.
The Aviz presentation traced the evolution of networking for AI, emphasizing the shift from a single data center network to multiple networks, particularly the separation between the front-end network (user access) and the back-end network (GPU-to-GPU communication). Aviz stressed the importance of lossless behavior, the different methods available to meet AI application requirements, and the need to coordinate network settings on both the switches and the network interface cards (NICs). The company partners with hardware providers and uses reference architectures such as NVIDIA Spectrum-X to automate network configuration, which lets enterprises define their networks and configure the separation between them.
Aviz supports SONiC deployments in enterprise data centers and at the edge. It is automating deployment workflows for the NVIDIA Spectrum-X reference architecture, including the ability to configure multi-tenancy and extend the fabric. The goal is to simplify network management for AI so that users can deploy and manage their networks quickly and efficiently.
Personnel: Thomas Scheibe
Demonstration of Day 2 AI network operations, monitoring and anomaly detection with Aviz
Aviz Networks’ AI Infrastructure Field Day demonstration focused on Day 2 operations, monitoring, and anomaly detection for AI workloads. It addressed the core challenge of AI's specialized networking requirements, including multiple networks, differentiated QoS, and the need to manage compute as part of the end-to-end network topology. Aviz presented solutions for orchestrating AI fabrics based on SONiC and NVIDIA’s Spectrum-X reference architecture, walking through a customer workflow that covers network design, Day 0 infrastructure deployment, Day 1 tenant onboarding and traffic isolation, and Day 2 operations such as adding pods, handling alerts, and troubleshooting.
The presentation demonstrated Aviz’s orchestration capabilities for SONiC-based and NVIDIA reference architecture (RA)-based AI fabrics. For SONiC, the presenter showed how to orchestrate the fabric from YAML-based intent, validate configurations, and run operational checks. The demonstration emphasized the ease of using an industry-standard CLI, built-in validation, and the ability to compare configurations to identify drift. For the NVIDIA Spectrum-X platform, the presentation highlighted agentless orchestration, the use of NVIDIA AIR to simulate deployments, and configuration comparison.
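Configuration drift detection, as described above, comes down to comparing the intended configuration with what is actually running on a device. The sketch below illustrates the general idea with Python's standard `difflib`; it is not Aviz's implementation, and the `config_drift` helper and the interface configuration text are hypothetical examples.

```python
import difflib


def config_drift(intended: str, running: str) -> list[str]:
    """Return unified-diff lines between intended and running config.

    An empty list means no drift. This is a generic illustration,
    not Aviz's actual comparison logic.
    """
    diff = difflib.unified_diff(
        intended.splitlines(),
        running.splitlines(),
        fromfile="intended",
        tofile="running",
        lineterm="",  # lines have no trailing newlines, so suppress them in headers
    )
    return list(diff)


# Hypothetical interface configuration for a single leaf switch.
intended = """interface Ethernet0
 mtu 9100
 speed 400000"""
running = """interface Ethernet0
 mtu 1500
 speed 400000"""

drift = config_drift(intended, running)
for line in drift:
    print(line)
```

Running this prints a unified diff that flags the MTU mismatch. An orchestration tool would run a comparison like this per device and surface any non-empty result as drift.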
Finally, the presentation detailed Aviz’s monitoring and anomaly detection features. The tool takes a bottom-up approach to monitoring networks, servers, and GPUs. The demo showed how to view telemetry such as traffic, queue drops, and GPU health metrics. The presentation also covered Aviz’s built-in anomaly detection system, which lets users create custom rules and receive notifications through tools such as Slack and Zendesk. The system includes curated rules, role-based access control, and configuration comparison to streamline operations and reduce errors.
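A custom anomaly rule of the kind mentioned above can be thought of as a condition on a telemetry metric plus a notification action. The sketch below assumes simple threshold rules; the `Rule` class, `evaluate` function, and metric names such as `queue_drops_per_s` are hypothetical and do not reflect Aviz's rule format. The `notify` callback stands in for a Slack or Zendesk integration and simply prints here.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    """Hypothetical threshold rule: fires when a metric exceeds a limit."""
    name: str
    metric: str
    threshold: float


def evaluate(rules: list[Rule], sample: dict[str, float],
             notify: Callable[[str], None]) -> list[str]:
    """Check one telemetry sample against every rule.

    Calls notify() for each rule that fires and returns the fired rule names.
    Metrics missing from the sample are skipped.
    """
    fired = []
    for rule in rules:
        value = sample.get(rule.metric)
        if value is not None and value > rule.threshold:
            notify(f"[{rule.name}] {rule.metric}={value} exceeds {rule.threshold}")
            fired.append(rule.name)
    return fired


# Hypothetical rules covering a network metric and a GPU health metric.
rules = [
    Rule("queue-drops", "queue_drops_per_s", 0.0),
    Rule("gpu-hot", "gpu_temp_c", 85.0),
]
sample = {"queue_drops_per_s": 12.0, "gpu_temp_c": 71.0}
fired = evaluate(rules, sample, notify=print)
```

Keeping notification as a callback separates rule evaluation from delivery, so the same rules could route alerts to chat, ticketing, or logging without changing the evaluation logic.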
Personnel: Ravi Kumar