In this discussion, Mohan Kalkunte, VP of Architecture and Technology at Broadcom, explains why the Ethernet fabric matters for AI and introduces Broadcom’s solutions tailored for AI applications. AI applications are demanding: models have billions of parameters, GPUs do the processing, and the network interconnects and coordinates large GPU clusters. AI training cycles through compute, communication, and synchronization phases, in which large neural networks are trained and gradients and parameters are exchanged. AI network traffic is distinctive: it consists of a small number of high-bandwidth, synchronized, bursty flows, which makes it vulnerable to flow collisions and link failures. Because the synchronization phase cannot finish until the slowest flow completes, tail latency largely determines training performance, and minimizing it leads to faster job completion. Techniques used to improve AI networking include network telemetry, packet spraying, load-aware ECMP, zero-impact failover, and credit control mechanisms.
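The effect of flow collisions on job completion can be illustrated with a small calculation. The following is a minimal sketch, not Broadcom's implementation: the flow names, flow sizes, link count, link bandwidth, and the colliding hash placement are all hypothetical values chosen for illustration. It compares a static placement that pins each flow to one link (as hash-based ECMP would) against an idealized packet-spraying model that spreads all traffic evenly across links.

```python
from collections import Counter


def step_time_pinned(flow_sizes, flow_to_link, link_bandwidth):
    """Time to finish when each flow is pinned to a single link."""
    load = Counter()
    for flow, size in flow_sizes.items():
        load[flow_to_link[flow]] += size
    # The synchronization phase ends only when the busiest link has drained.
    return max(load.values()) / link_bandwidth


def step_time_sprayed(flow_sizes, num_links, link_bandwidth):
    """Time to finish when traffic is spread evenly over all links (idealized)."""
    return sum(flow_sizes.values()) / (num_links * link_bandwidth)


# Hypothetical workload: four equal flows of 8 gigabits each.
flows = {"gpu0->gpu4": 8.0, "gpu1->gpu5": 8.0, "gpu2->gpu6": 8.0, "gpu3->gpu7": 8.0}

# Hypothetical hash outcome: two flows collide on link 0 and link 3 stays idle.
placement = {"gpu0->gpu4": 0, "gpu1->gpu5": 0, "gpu2->gpu6": 1, "gpu3->gpu7": 2}

LINK_GBPS = 400.0  # assumed link speed in gigabits per second
NUM_LINKS = 4      # assumed number of equal-cost links

pinned = step_time_pinned(flows, placement, LINK_GBPS)
sprayed = step_time_sprayed(flows, NUM_LINKS, LINK_GBPS)

print(f"pinned flows:   {pinned * 1000:.1f} ms")
print(f"packet spraying: {sprayed * 1000:.1f} ms")
```

With these assumed numbers, the single collision makes the pinned placement take twice as long as the evenly sprayed one, even though three of the four flows finish on time. That is the sense in which the slowest flow, rather than the average, sets the pace of a synchronized training step.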
Personnel: Mohan Kalkunte