Keysight’s AI Data Center Test Platform is designed to emulate AI workloads, enabling users to benchmark and validate the performance of AI infrastructure in both pre-deployment labs and production AI clusters. The platform allows AI operators and equipment vendors to enhance the efficiency of AI model training over Ethernet networks by experimenting with various workload parameters and network designs. Notably, the platform provides comprehensive insights into the performance of collective communications and RDMA transports without the need for GPUs, making it a cost-effective solution for testing and optimization.
During the presentation, Alex Bortok and Ankur Sheth discussed the critical role of network performance in AI training, emphasizing that a significant portion of GPU time is spent on data communication rather than computation. They highlighted the importance of co-tuning the software stack and network components to achieve optimal performance, particularly as AI workloads grow in complexity and size. The speakers also explained the challenges associated with traditional benchmarking methods, which often fail to correlate performance metrics across different components of the AI infrastructure. The AI Data Center Test Platform addresses these challenges by providing a controlled environment for emulating workloads and generating real traffic, allowing for more accurate performance assessments.
The platform is built on Keysight’s AresONE series of traffic generators, which can produce RoCE (RDMA over Converged Ethernet) traffic at line rate. The platform’s software stack is API-driven, enabling users to run collective benchmarks and analyze the results. The presenters outlined the testing capabilities offered by the platform, including load balancing, congestion control, and topology experimentation, all aimed at reducing the time required for AI model training. By providing deeper insights and repeatable testing conditions, Keysight’s AI Data Center Test Platform positions itself as a valuable tool for optimizing AI infrastructure and accelerating the deployment of AI models.
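To give a sense of what a collective benchmark measures, the sketch below estimates the ideal completion time of a ring all-reduce from the message size, the number of ranks, and the per-port line rate. It is not Keysight's API or method: the function names and the example figures (8 ranks, 400 Gb/s ports, a 1 GB message) are assumptions made for illustration, and the model ignores latency, protocol overhead, and congestion. It relies only on the standard property that each rank in a ring all-reduce sends 2(N-1)/N times the message size.

```python
def ring_allreduce_ideal_seconds(message_bytes: float, ranks: int, link_gbps: float) -> float:
    """Lower bound on ring all-reduce time (hypothetical helper, not a Keysight API).

    Each rank sends 2 * (ranks - 1) / ranks * message_bytes over its own link;
    latency, headers, and congestion are ignored.
    """
    if ranks < 2:
        raise ValueError("a collective needs at least two ranks")
    if link_gbps <= 0:
        raise ValueError("link rate must be positive")
    bytes_sent_per_rank = 2 * (ranks - 1) / ranks * message_bytes
    link_bytes_per_second = link_gbps * 1e9 / 8  # Gb/s to bytes/s
    return bytes_sent_per_rank / link_bytes_per_second


def link_efficiency(measured_seconds: float, message_bytes: float, ranks: int, link_gbps: float) -> float:
    """Fraction of the ideal line-rate performance that a measured run achieved."""
    if measured_seconds <= 0:
        raise ValueError("measured time must be positive")
    return ring_allreduce_ideal_seconds(message_bytes, ranks, link_gbps) / measured_seconds


if __name__ == "__main__":
    # Assumed example: 1 GB all-reduce across 8 ranks on 400 Gb/s ports.
    ideal = ring_allreduce_ideal_seconds(1e9, 8, 400)
    print(f"ideal completion time: {ideal * 1e3:.1f} ms")
    # A hypothetical measured run of 50 ms would correspond to this efficiency:
    print(f"efficiency at 50 ms: {link_efficiency(0.050, 1e9, 8, 400):.0%}")
```

Comparing a measured completion time against this bound is one simple way to see how much load balancing or congestion control changes move a fabric toward line rate.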
Personnel: Alex Bortok, Ankur Sheth