This video is part of the appearance “Cisco Presents at Networking Field Day 39”. It was recorded as part of Networking Field Day 39 from 10:30 to 12:00 on November 6, 2025.
Paresh Gupta concluded the deep dive by focusing on the most complex challenge in AI networking: congestion and load balancing in the backend GPU-to-GPU fabric. He explained that while operational simplicity and cabling are critical, the primary performance bottleneck, even in non-oversubscribed networks, is the failure of traditional ECMP load balancing. ECMP assigns each flow to a link by hashing its headers, with no awareness of flow size or link load, so it can create severe traffic imbalances: one link may be offered 105% of its capacity while another runs at only 60%. This non-uniform utilization, not a lack of total capacity, creates congestion, triggers performance-killing pause frames, and can lead to out-of-order packets, which are devastating for tightly coupled collective communication jobs.
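The imbalance described above can be illustrated with a minimal Python sketch. It is not Cisco's hashing implementation: the CRC32 hash, the 16 synthetic flows, the 8 uplinks, and the 200 Gbps per-flow rate are all assumptions chosen for illustration. The point is only that a static, load-unaware hash makes no attempt to even out per-link load when there are few, large flows.

```python
import zlib
from collections import Counter


def ecmp_link(flow, num_links):
    """Pick an uplink from a static hash of the flow's 5-tuple (load-unaware)."""
    key = "|".join(str(field) for field in flow).encode()
    return zlib.crc32(key) % num_links  # CRC32 is an illustrative stand-in hash


def link_loads(flows, num_links, gbps_per_flow):
    """Sum the offered load per link, assuming every flow sends at the same rate."""
    counts = Counter(ecmp_link(flow, num_links) for flow in flows)
    return [counts.get(link, 0) * gbps_per_flow for link in range(num_links)]


# Hypothetical workload: 16 long-lived RoCEv2-style flows (UDP port 4791)
# spread over 8 equal-cost uplinks. All addresses and rates are made up.
flows = [("10.0.0.%d" % i, "10.0.1.%d" % i, 49152 + i, 4791, "udp") for i in range(16)]
loads = link_loads(flows, num_links=8, gbps_per_flow=200)
fair_share = sum(loads) / len(loads)

for link, load in enumerate(loads):
    print(f"link {link}: {load:5.0f} Gbps offered ({load / fair_share:.0%} of fair share)")
```

Total offered load is the same regardless of how the hash distributes it; what changes is how far individual links sit above or below the fair share, which is the quantity that drives congestion on the hot links.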
To solve this, Cisco has developed advanced load-balancing techniques that move beyond simple ECMP. Gupta detailed a “flowlet” dynamic load balancing (DLB) approach, in which the switch uses inter-packet gaps to detect flowlet boundaries and sends each new flowlet on whichever link is currently least congested. More importantly, he highlighted a fully validated joint reference architecture co-designed with NVIDIA. This solution combines Cisco’s per-packet DLB, running on its switches, with NVIDIA’s adaptive routing and direct data placement capabilities on the SuperNIC. This handshake between the switch and the NIC is auto-negotiated, and Gupta showed video benchmarks of a 64-GPU cluster where this method improved application-level bus bandwidth by 35 to 40% and virtually eliminated pause frames compared to standard ECMP.
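The flowlet idea can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the Silicon One implementation: the `FlowletBalancer` class, its `gap_us` threshold, and the cumulative byte counter used as a congestion signal are all hypothetical. A real switch would use a hardware signal such as queue depth or recent link utilization.

```python
class FlowletBalancer:
    """Toy flowlet-based load balancer (illustrative only)."""

    def __init__(self, num_links, gap_us):
        self.gap_us = gap_us                # idle gap that starts a new flowlet
        self.link_bytes = [0] * num_links   # assumed stand-in for link congestion
        self.state = {}                     # flow_id -> (last_seen_us, link)

    def route(self, flow_id, now_us, size_bytes):
        """Return the link for this packet and update per-flow state."""
        entry = self.state.get(flow_id)
        if entry is None or now_us - entry[0] > self.gap_us:
            # New flow, or the gap is long enough that earlier packets have
            # drained: it is safe to move to the least-loaded link.
            link = min(range(len(self.link_bytes)), key=self.link_bytes.__getitem__)
        else:
            # Same flowlet: stay on the same link to preserve packet order.
            link = entry[1]
        self.state[flow_id] = (now_us, link)
        self.link_bytes[link] += size_bytes
        return link


lb = FlowletBalancer(num_links=2, gap_us=50)
first = lb.route("flow-a", now_us=0, size_bytes=4096)     # new flow, picks link 0
second = lb.route("flow-a", now_us=10, size_bytes=4096)   # 10 us gap, same flowlet
third = lb.route("flow-a", now_us=100, size_bytes=4096)   # 90 us gap, new flowlet
print(first, second, third)
```

Packets within a flowlet stay on one link, so ordering is preserved; only when the gap exceeds the threshold does the flow get a chance to move. Per-packet DLB, as in the joint architecture, drops that restriction and relies on the NIC to handle reordering.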
This advanced capability, Gupta explained, is made possible by the P4-programmable architecture of Cisco’s Silicon One ASIC, which allows new features to be delivered without a multi-year hardware respin. He framed this as foundational work that is now being standardized by the Ultra Ethernet Consortium (UEC), of which Cisco is a steering member. By productizing these next-generation transport features today, Cisco can provide a consistent, high-performance, validated networking experience for any AI environment, offering enterprises a turnkey solution that rivals the performance of complex, custom-built hyperscaler networks.
Personnel: Paresh Gupta