This video is part of the appearance, “Broadcom Presents at Networking Field Day 32”. It was recorded as part of Networking Field Day 32 at 8:00-10:00 on July 26, 2023.
In this discussion, Mohan Kalkunte, VP of Architecture and Technology at Broadcom, explains why the Ethernet fabric matters for AI workloads and introduces Broadcom’s solutions tailored to them. AI applications have demanding requirements, including large models with billions of parameters. GPUs do the processing, and the network interconnects and coordinates large GPU clusters. AI training cycles through compute, communication, and synchronization phases, in which large neural networks are trained and gradients and parameters are exchanged between GPUs. AI traffic differs from typical data center traffic: it has fewer flows, each with high bandwidth; it is synchronized and bursty; and it is exposed to problems such as flow collisions and link failures. Because every synchronization step waits for the slowest transfer, tail latency strongly affects training performance, and reducing it shortens job completion time. Techniques used to improve AI networking include network telemetry, packet spraying, load-aware ECMP, zero-impact failover, and credit-based control mechanisms.
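To make the flow-collision point concrete, the following is a minimal Python sketch, not Broadcom's implementation. It places a small number of equal-sized, high-bandwidth flows on parallel links in two ways: by a static hash of the flow identifier (load-unaware ECMP) and by picking the least-loaded link (a simplified stand-in for load-aware ECMP). The function names, the flow identifiers, the 8-link topology, and the 400 Gb/s link speed are all assumptions chosen for illustration. The completion time of a synchronized phase is modeled as the time taken by the busiest link.

```python
import zlib

N_LINKS = 8          # assumed number of parallel links (hypothetical)
LINK_GBPS = 400.0    # assumed link speed in Gb/s (hypothetical)

# Eight flows, each transferring 400 gigabits; identifiers are made up.
flows = [(f"gpu{i}->gpu{(i + 1) % 8}", 400.0) for i in range(8)]


def static_ecmp(flows, n_links):
    """Load-unaware placement: the link is chosen by hashing the flow id."""
    load = [0.0] * n_links
    for flow_id, gigabits in flows:
        link = zlib.crc32(flow_id.encode()) % n_links
        load[link] += gigabits
    return load


def load_aware(flows, n_links):
    """Simplified load-aware placement: pick the least-loaded link so far."""
    load = [0.0] * n_links
    for _flow_id, gigabits in flows:
        link = min(range(n_links), key=load.__getitem__)
        load[link] += gigabits
    return load


def completion_time(load, link_gbps):
    """A synchronized phase ends only when the busiest link has drained."""
    return max(load) / link_gbps


hashed = static_ecmp(flows, N_LINKS)
balanced = load_aware(flows, N_LINKS)

print("static hash  :", hashed, "->", completion_time(hashed, LINK_GBPS), "s")
print("load-aware   :", balanced, "->", completion_time(balanced, LINK_GBPS), "s")
```

With equal-sized flows, the load-aware placement spreads one flow per link, so the phase finishes in the time one flow needs on one link. The static hash can never do better than that, and whenever two flows hash to the same link the whole phase waits on that link, which is the tail-latency effect described above.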
Personnel: Mohan Kalkunte