This video is part of the appearance “Aviz Networks Presents at AI Infrastructure Field Day 2”. It was recorded as part of AI Infrastructure Field Day 2 from 8:00 to 9:00 on April 25, 2025.
Aviz Networks’ AI Infrastructure Field Day demonstration focused on Day 2 operations, monitoring, and anomaly detection for AI workloads. The core challenge addressed is the specialized networking requirements of AI: multiple networks, differentiated QoS, and the need to manage compute as part of the end-to-end network topology. Aviz presented solutions for orchestrating AI fabrics based on SONiC and NVIDIA’s Spectrum-X reference architecture (RA), showcasing a customer workflow that covers network design, Day 0 infrastructure deployment, Day 1 tenant onboarding and traffic isolation, and Day 2 operations such as adding Pods, handling alerts, and troubleshooting.
The presentation demonstrated Aviz’s orchestration capabilities for SONiC-based and NVIDIA RA-based AI fabrics. For SONiC, the presenter showed how to orchestrate the fabric using YAML-based intent, validate configurations, and perform operational checks. The demonstration emphasized the ease of use of an industry-standard CLI, built-in validation, and the ability to compare configurations to identify drift. For the NVIDIA Spectrum-X platform, the presentation highlighted agentless orchestration, the use of NVIDIA AIR for simulating deployments, and configuration comparison.
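To make the idea of configuration drift concrete, here is a minimal sketch in Python. It is not Aviz’s implementation: the function names (`flatten`, `find_drift`), the `<absent>` marker, and the sample interface and BGP settings are all hypothetical, chosen only to illustrate comparing an intended configuration against the one running on a device.

```python
from typing import Any

ABSENT = "<absent>"  # hypothetical marker for a setting missing on one side


def flatten(cfg: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Turn a nested config into {"dotted.path": value} pairs."""
    flat: dict[str, Any] = {}
    for key, value in cfg.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def find_drift(intended: dict[str, Any], running: dict[str, Any]) -> dict[str, tuple]:
    """Return {path: (intended_value, running_value)} for every mismatch."""
    want, have = flatten(intended), flatten(running)
    drift = {}
    for path in sorted(want.keys() | have.keys()):
        wanted = want.get(path, ABSENT)
        actual = have.get(path, ABSENT)
        if wanted != actual:
            drift[path] = (wanted, actual)
    return drift


# Illustrative data only; in practice the intent would be parsed from YAML.
intended = {"Ethernet0": {"mtu": 9100, "admin_status": "up"}, "bgp": {"asn": 65001}}
running = {
    "Ethernet0": {"mtu": 1500, "admin_status": "up"},
    "bgp": {"asn": 65001},
    "Ethernet4": {"mtu": 9100},
}

for path, (wanted, actual) in find_drift(intended, running).items():
    print(f"{path}: intended={wanted} running={actual}")
```

Flattening to dotted paths keeps the comparison simple and makes each reported difference point at a single setting, whether it was changed, added, or removed outside the intent.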
Finally, the presentation detailed Aviz’s monitoring and anomaly detection features. The tool takes a bottom-up approach to monitoring networks, servers, and GPUs. The demo showed how to view telemetry data such as traffic, queue drops, and GPU health metrics. The presentation also covered Aviz’s built-in anomaly detection system, which lets users create custom rules and receive notifications through tools like Slack and Zendesk. The system includes curated rules, role-based access control, and configuration comparison capabilities to streamline operations and reduce potential errors.
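A custom anomaly rule of the kind described above can be thought of as a threshold on a telemetry metric. The following Python sketch shows that idea under stated assumptions: the `Rule` class, the `evaluate` function, and the metric names are hypothetical and do not reflect Aviz’s rule format or API. Alerts are returned as plain strings; forwarding them to a notification tool is left out.

```python
import operator
from dataclasses import dataclass

# Comparison operators a rule may use (an assumption for this sketch).
_OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}


@dataclass(frozen=True)
class Rule:
    name: str         # human-readable rule name
    metric: str       # key to look up in a telemetry sample
    op: str           # one of the keys in _OPS
    threshold: float  # value the metric is compared against


def evaluate(rules: list[Rule], sample: dict[str, float]) -> list[str]:
    """Return one alert message per rule that the sample violates."""
    alerts = []
    for rule in rules:
        value = sample.get(rule.metric)
        if value is None:
            continue  # metric not reported in this sample; nothing to check
        if _OPS[rule.op](value, rule.threshold):
            alerts.append(f"{rule.name}: {rule.metric}={value} {rule.op} {rule.threshold}")
    return alerts


# Illustrative rules and telemetry sample; names and numbers are made up.
rules = [
    Rule("queue-drops", "queue_drops_per_s", ">", 100),
    Rule("gpu-temp", "gpu_temp_c", ">=", 90),
]
sample = {"queue_drops_per_s": 250, "gpu_temp_c": 71}

for alert in evaluate(rules, sample):
    print(alert)
```

Keeping rules as data rather than code is what would allow both curated and user-defined rules to be evaluated by the same loop.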
Personnel: Ravi Kumar