This video is part of the appearance “Arista Presents at AI Field Day 5”. It was recorded as part of AI Field Day 5 at 9:30-11:00 on September 13, 2024.
In his presentation at AI Field Day 5, Tom Emmons, Software Engineering Lead for AI Networking at Arista Networks, discussed the challenges of gaining visibility into AI networks and Arista's approach to solving them. Traditional monitoring strategies, which rely on interface counters and packet drops, fall short in AI networks because the interactions that matter happen at microsecond and millisecond timescales, which second-scale counters cannot resolve. To close this gap, Arista has developed telemetry tools that provide more granular insight into network performance. One of these, the AI Analyzer, captures traffic statistics at 100-microsecond intervals and exposes behavior that second-scale counters average away. This fine-grained view helps engineers identify problems such as congestion and inefficient load balancing.
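To make the difference in resolution concrete, here is a small Python sketch that assumes a hypothetical 400 Gb/s link and synthetic traffic (neither comes from the talk). It compares one second of byte counts viewed as a single average against the same data split into 100-microsecond windows, the interval cited for the AI Analyzer. It is an illustration of the sampling argument only and does not use any Arista API.

```python
"""Why 100 us windows reveal bursts that a 1 s average hides (illustrative sketch)."""

LINK_GBPS = 400   # assumed link speed, not from the talk
WINDOW_US = 100   # sampling interval cited for the AI Analyzer
WINDOWS_PER_SECOND = 1_000_000 // WINDOW_US  # 10,000 windows in one second


def window_capacity_bytes(link_gbps: int, window_us: int) -> int:
    """Bytes the link can carry in one sampling window at line rate."""
    bits_per_second = link_gbps * 1_000_000_000
    return bits_per_second // 8 * window_us // 1_000_000


def synthetic_bursty_traffic(capacity: int, burst_every: int) -> list[int]:
    """One second of hypothetical traffic: line-rate bursts, idle in between."""
    return [capacity if i % burst_every == 0 else 0
            for i in range(WINDOWS_PER_SECOND)]


def summarize(byte_counts: list[int], capacity: int, threshold: float = 0.95):
    """Return (1 s average utilization, peak window utilization, saturated window count)."""
    average = sum(byte_counts) / (capacity * len(byte_counts))
    per_window = [b / capacity for b in byte_counts]
    peak = max(per_window)
    saturated = sum(1 for u in per_window if u >= threshold)
    return average, peak, saturated


if __name__ == "__main__":
    cap = window_capacity_bytes(LINK_GBPS, WINDOW_US)
    traffic = synthetic_bursty_traffic(cap, burst_every=20)
    avg, peak, saturated = summarize(traffic, cap)
    print(f"1 s average utilization: {avg:.0%}")     # 5%
    print(f"peak 100 us utilization: {peak:.0%}")     # 100%
    print(f"saturated windows: {saturated} of {len(traffic)}")  # 500 of 10000
```

With a line-rate burst in one window out of every 20, the one-second view reports 5% utilization while 500 of the 10,000 windows are fully saturated, which is exactly the kind of congestion a second-scale counter would miss.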
Emmons also introduced the AI Agent, which extends Arista's EOS (Extensible Operating System) from the switch to the NICs (network interface cards) in the servers. This allows both the top-of-rack (ToR) switches and the server NIC connections to be managed and monitored centrally. The AI Agent supports auto-discovery and synchronizes configuration between each switch and the NICs attached to it, keeping network settings consistent across the infrastructure. This helps prevent a common problem: configurations that differ between network devices and servers, which can degrade performance. NIC-specific plugins let the AI Agent work with a range of NIC hardware, making it applicable in diverse network environments.
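The kind of mismatch the AI Agent is meant to catch can be illustrated with a minimal Python sketch. The `LinkConfig` fields (speed, MTU, PFC priorities, ECN) and both functions are hypothetical choices made for illustration, not the AI Agent's actual data model or API. The sketch compares the settings on each end of a link and, treating the switch as the source of truth, returns the NIC configuration to apply.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class LinkConfig:
    """Hypothetical settings that must agree on both ends of a switch-to-NIC link."""
    speed_gbps: int
    mtu: int
    pfc_priorities: frozenset[int]  # priorities with priority flow control (PFC) enabled
    ecn_enabled: bool


def find_mismatches(switch_port: LinkConfig, nic: LinkConfig) -> dict[str, tuple]:
    """Return {setting: (switch value, NIC value)} for every setting that differs."""
    mismatches = {}
    for f in fields(LinkConfig):
        switch_value = getattr(switch_port, f.name)
        nic_value = getattr(nic, f.name)
        if switch_value != nic_value:
            mismatches[f.name] = (switch_value, nic_value)
    return mismatches


def sync_nic_to_switch(switch_port: LinkConfig, nic: LinkConfig) -> LinkConfig:
    """Treat the switch as the source of truth: copy only the differing settings to the NIC."""
    changes = {name: sw for name, (sw, _) in find_mismatches(switch_port, nic).items()}
    return replace(nic, **changes)


if __name__ == "__main__":
    switch = LinkConfig(speed_gbps=400, mtu=9000,
                        pfc_priorities=frozenset({3}), ecn_enabled=True)
    nic = LinkConfig(speed_gbps=400, mtu=1500,
                     pfc_priorities=frozenset({3}), ecn_enabled=False)
    print(find_mismatches(switch, nic))
    # {'mtu': (9000, 1500), 'ecn_enabled': (True, False)}
    synced = sync_nic_to_switch(switch, nic)
    print(find_mismatches(switch, synced))
    # {}
```

Copying only the differing fields, rather than overwriting the whole NIC configuration, keeps the reported change set small and auditable, which is one reasonable design for a centrally managed sync.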
The AI Agent also integrates with Arista's CloudVision software, which presents network and server statistics in a single management view. With both in one place, network engineers can correlate network events with server-side issues and troubleshoot more efficiently. Arista also aims to apply AI and machine learning techniques to separate real anomalies from noise and correlate them with network events. Taken together, this approach to visibility and debugging is intended to help engineers diagnose and resolve performance problems quickly and accurately, leading to more reliable and efficient AI network operations.
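The correlation step can also be sketched in Python. As a simple stand-in for the machine learning techniques Arista described, this example uses a z-score to flag outliers in a server-side metric (for instance, training step time), then lists the network events recorded shortly before each outlier. The metric, the threshold of 3, the one-second lookback window, and the event strings are all assumptions for illustration, not Arista's method.

```python
from statistics import mean, pstdev

Sample = tuple[float, float]  # (timestamp in seconds, metric value)
Event = tuple[float, str]     # (timestamp in seconds, description)


def flag_anomalies(samples: list[Sample], z_threshold: float = 3.0) -> list[float]:
    """Timestamps whose value lies more than z_threshold population std devs from the mean."""
    values = [v for _, v in samples]
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # perfectly flat series: nothing stands out
    return [t for t, v in samples if abs(v - mu) / sigma > z_threshold]


def correlate(anomalies: list[float], events: list[Event],
              lookback_s: float = 1.0) -> dict[float, list[str]]:
    """Map each anomaly to the network events seen in the preceding lookback window."""
    return {
        t: [desc for et, desc in events if t - lookback_s <= et <= t]
        for t in anomalies
    }


if __name__ == "__main__":
    # Hypothetical data: steady 100 ms training steps with one slow step at t = 12 s.
    step_times = [(float(t), 100.0) for t in range(20)]
    step_times[12] = (12.0, 180.0)
    events = [(4.0, "link flap on spine2"),
              (11.6, "congestion on leaf1 Ethernet3/1")]
    print(correlate(flag_anomalies(step_times), events))
    # {12.0: ['congestion on leaf1 Ethernet3/1']}
```

Only the congestion event from 0.4 s before the slow step falls inside the lookback window, so the older link flap is treated as unrelated; the small per-sample jitter of the remaining steps never crosses the threshold and is ignored as noise.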
Personnel: Tom Emmons