This video is part of the appearance, “Keysight Presents at AI Infrastructure Field Day 2”. It was recorded as part of AI Infrastructure Field Day 2 at 10:30 - 12:00 on April 25, 2025.
This session provides an overview of the Keysight AI fabric test methodology, demonstrating key findings and improvements achieved through automated testing and the search for optimal configuration parameters. Alex Bortok, Lead Product Manager at Keysight Technologies, introduces the methodology using the KAI Data Center Builder product. The methodology guides users through the phases of designing and building an AI fabric, emphasizing the importance of topology selection, collective operation algorithms, performance isolation, load balancing, and congestion control. The methodology and related white papers are available for download via a QR code or the link below.
The presentation covers key terminology, including collective operations (broadcast, all-reduce, all-to-all), ranks, collective size, and data size. It defines the metrics used to measure performance: collective completion time, algorithm bandwidth, and bus bandwidth. Alex explains that bus bandwidth is a useful metric because it removes the number of GPUs from the equation and reflects the limiting factor that determines how long the collective operation will take. He then describes a testbed of four switches with 800 Gbps ports, which emulates 16 GPUs/network cards running at 400 Gbps to assess fabric performance.
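As a rough illustration of how these metrics relate, the sketch below computes algorithm bandwidth and bus bandwidth from a data size and a collective completion time. It assumes the correction factors popularized by the NCCL tests (2(n-1)/n for all-reduce, (n-1)/n for all-to-all, 1 for broadcast); the session summary does not state the exact formulas Keysight uses, so treat the function names, the factors, and the example numbers as assumptions.

```python
def algorithm_bandwidth(data_size_bytes: float, completion_time_s: float) -> float:
    """Algorithm bandwidth in bytes/s: data size divided by completion time."""
    if completion_time_s <= 0:
        raise ValueError("completion time must be positive")
    return data_size_bytes / completion_time_s


def bus_bandwidth_factor(collective: str, ranks: int) -> float:
    """Correction factor per the NCCL-tests convention (an assumption here)."""
    if ranks < 2:
        raise ValueError("a collective needs at least two ranks")
    if collective == "all-reduce":
        return 2 * (ranks - 1) / ranks
    if collective == "all-to-all":
        return (ranks - 1) / ranks
    if collective == "broadcast":
        return 1.0
    raise ValueError(f"unsupported collective: {collective}")


def bus_bandwidth(collective: str, ranks: int,
                  data_size_bytes: float, completion_time_s: float) -> float:
    """Bus bandwidth in bytes/s: algorithm bandwidth scaled by the factor."""
    algbw = algorithm_bandwidth(data_size_bytes, completion_time_s)
    return algbw * bus_bandwidth_factor(collective, ranks)


if __name__ == "__main__":
    # Illustrative numbers only: 16 ranks, 1 GB all-reduce, 50 ms completion.
    algbw = algorithm_bandwidth(1e9, 0.05)
    busbw = bus_bandwidth("all-reduce", 16, 1e9, 0.05)
    print(f"algbw = {algbw * 8 / 1e9:.1f} Gbps, busbw = {busbw * 8 / 1e9:.1f} Gbps")
```

Because the factor absorbs the rank count, the resulting bus bandwidth can be compared directly against the port speed regardless of how many GPUs take part in the collective.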
A demonstration highlights the impact of congestion control on network performance. By comparing scenarios with and without congestion control enabled, the presentation illustrates how fine-tuning DCQCN parameters can optimize bandwidth utilization and reduce congestion. The speaker uses the tool to test different settings on the fabric and converge on an optimal configuration. The presentation concludes by mentioning Keysight’s membership in the Ultra Ethernet Consortium and upcoming webinars detailing Keysight’s innovations in AI.
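The search for an optimal configuration can be pictured as a parameter sweep: run the same collective under each candidate setting and keep the one with the shortest completion time. The sketch below is a generic grid search, not Keysight's tool or API. The parameter names `kmin_kb` and `kmax_kb` are hypothetical stand-ins for congestion-marking thresholds, and `toy_measure` is a made-up placeholder for a real test run.

```python
from itertools import product
from typing import Callable, Dict, Iterable, Tuple

Config = Dict[str, float]


def sweep(param_grid: Dict[str, Iterable[float]],
          measure: Callable[[Config], float]) -> Tuple[Config, float]:
    """Try every combination in the grid and return the configuration
    with the lowest measured completion time, plus that time."""
    names = list(param_grid)
    best_config, best_time = None, float("inf")
    for values in product(*(param_grid[name] for name in names)):
        config = dict(zip(names, values))
        completion_time = measure(config)  # one test run per configuration
        if completion_time < best_time:
            best_config, best_time = config, completion_time
    if best_config is None:
        raise ValueError("parameter grid produced no configurations")
    return best_config, best_time


def toy_measure(config: Config) -> float:
    """Placeholder for a real fabric test; NOT measured data.
    Pretends completion time is lowest at kmin_kb=200, kmax_kb=800."""
    return (1.0
            + abs(config["kmin_kb"] - 200) / 1000
            + abs(config["kmax_kb"] - 800) / 1000)


if __name__ == "__main__":
    grid = {"kmin_kb": [100, 200, 400], "kmax_kb": [400, 800, 1600]}
    best, time_s = sweep(grid, toy_measure)
    print(best, time_s)
```

An exhaustive grid is the simplest choice and only practical for a handful of parameters; with a real testbed, each call to `measure` would be a full test run, so the grid size directly determines total test time.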
Personnel: Alex Bortok