This video is part of the appearance, “Keysight Presents at AI Field Day 5”. It was recorded as part of AI Field Day 5 at 8:00-9:30 on September 11, 2024.
Keysight’s AI Data Center Test Platform is designed to emulate AI workloads, enabling users to benchmark and validate the performance of AI infrastructure in both pre-deployment labs and production AI clusters. The platform allows AI operators and equipment vendors to enhance the efficiency of AI model training over Ethernet networks by experimenting with various workload parameters and network designs. Notably, the platform provides comprehensive insights into the performance of communications and RDMA transports without the need for GPUs, making it a cost-effective solution for testing and optimization.
During the presentation, Alex Bortok and Ankur Sheth discussed the critical role of network performance in AI training, emphasizing that a significant portion of GPU time is spent on data communication rather than computation. They highlighted the importance of co-tuning the software stack and network components to achieve optimal performance, particularly as AI workloads grow in complexity and size. The speakers also explained the challenges associated with traditional benchmarking methods, which often fail to correlate performance metrics across different components of the AI infrastructure. The AI Data Center Test Platform addresses these challenges by providing a controlled environment for emulating workloads and generating real traffic, allowing for more accurate performance assessments.
The architecture of the platform is built on Keysight’s AresONE series of traffic generators, which can produce RoCE (RDMA over Converged Ethernet) traffic at line rate. The platform’s software stack is API-driven, enabling users to run collective benchmarks and analyze the results. The presenters outlined the testing capabilities offered by the platform, including load balancing, congestion control, and topology experimentation, all aimed at reducing the time required for AI model training. By providing deeper insights and repeatable testing conditions, Keysight’s AI Data Center Test Platform positions itself as a tool for optimizing AI infrastructure and accelerating the deployment of AI models.
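To make the idea of a collective benchmark more concrete, the following is a minimal Python sketch of two figures such a benchmark commonly reports for a ring all-reduce: the ideal completion time on a given link speed, and the "bus bandwidth" derived from a measured completion time. It is not Keysight's API or methodology. The function names and parameters are hypothetical, and the model assumes a ring algorithm with no latency, congestion, or protocol overhead.

```python
def ring_allreduce_ideal_seconds(message_bytes: int, num_ranks: int, link_gbps: float) -> float:
    """Ideal ring all-reduce time (assumption: no latency, loss, or congestion)."""
    if num_ranks < 2:
        return 0.0  # a single rank has nothing to exchange
    link_bytes_per_second = link_gbps * 1e9 / 8
    # In a ring all-reduce each rank sends 2*(N-1)/N of the message over its link.
    bytes_on_wire = 2 * (num_ranks - 1) / num_ranks * message_bytes
    return bytes_on_wire / link_bytes_per_second


def bus_bandwidth_gbps(message_bytes: int, num_ranks: int, measured_seconds: float) -> float:
    """Convert a measured all-reduce time into per-link 'bus bandwidth' in Gbps."""
    algorithm_bw = message_bytes / measured_seconds      # bytes/s seen by the application
    ring_factor = 2 * (num_ranks - 1) / num_ranks        # all-reduce correction factor
    return algorithm_bw * ring_factor * 8 / 1e9


if __name__ == "__main__":
    # Hypothetical scenario: 1 GB all-reduce across 8 endpoints on 400 Gbps links.
    ideal = ring_allreduce_ideal_seconds(1_000_000_000, 8, 400.0)
    print(f"ideal completion time: {ideal * 1e3:.1f} ms")
    # A measured time slower than ideal shows up as lower bus bandwidth.
    print(f"bus bandwidth at 50 ms: {bus_bandwidth_gbps(1_000_000_000, 8, 0.050):.0f} Gbps")
```

Comparing the measured bus bandwidth against the link rate gives a rough indication of how much of the network's capacity a given load-balancing, congestion-control, or topology choice actually delivers to the collective operation.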
Personnel: Alex Bortok, Ankur Sheth