Henry Wu, Robin Grindley, and Pete Del Vecchio presented for Broadcom at AI Infrastructure Field Day 3
This presentation took place on September 10, 2025, from 8:00 to 9:30.
Presenters: Henry Wu, Pete Del Vecchio, Robin Grindley
Broadcom Tomahawk 6: Scaling AI Networks with the World’s First 102.4 Tbps Ethernet Switch
Watch on YouTube
Watch on Vimeo
Tomahawk 6 is the world’s first 102.4 Tbps Ethernet switch, designed to meet the demands of massive AI infrastructure. In this session, we’ll show how it enables both scale-up and scale-out networking, with unmatched bandwidth, energy efficiency, and congestion control. We’ll also debunk common myths about AI networking and explain why Ethernet has become the fabric of choice for the world’s largest GPU clusters.
Pete Del Vecchio introduced Broadcom’s Tomahawk 6, a 102.4 Tbps Ethernet switch designed for both scale-up and scale-out AI networking. Tomahawk 6 doubles the bandwidth and SERDES speed of its predecessor, Tomahawk 5, and incorporates features to enhance load balancing and congestion control. Del Vecchio emphasized that Ethernet has become the dominant choice for AI scale-out networks and is gaining traction in scale-up environments due to its open ecosystem and the ability to partition large clusters for different customers.
Tomahawk 6 comes in two versions: one with 512 lanes of 200G SERDES and another with 1,024 lanes of 100G SERDES. The presenter highlighted that Tomahawk 6 is built on a multi-die implementation, with a central core for packet processing and chiplets for I/O. The chip is designed for both scale-up and scale-out applications. For scale-up applications, Tomahawk 6 can support 512 XPUs in a single-hop network.
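As a sanity check on those figures, both lane configurations multiply out to the same 102.4 Tbps aggregate. A minimal illustrative sketch (the lane counts and per-lane speeds come from the presentation; the helper function itself is ours):

```python
# Sanity-check the aggregate bandwidth of the two Tomahawk 6
# SERDES configurations described above.

def aggregate_tbps(lanes: int, gbps_per_lane: int) -> float:
    """Total switch bandwidth in Tbps for a given lane configuration."""
    return lanes * gbps_per_lane / 1000

print(aggregate_tbps(512, 200))   # 512 lanes of 200G -> 102.4 Tbps
print(aggregate_tbps(1024, 100))  # 1,024 lanes of 100G -> 102.4 Tbps
```

Either way, the chip exposes the same total capacity; the choice of lane speed determines which optics and endpoints it pairs with.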
The presentation also touched upon power efficiency, emphasizing that Tomahawk 6 enables two-tier network designs, which significantly reduce the number of optics required compared to three-tier networks, leading to lower power consumption and reduced latency. Del Vecchio also discussed advanced features like cognitive routing with global load balancing, telemetry, and diagnostics for proactive link management. Broadcom emphasized that Tomahawk 6 is an open, interoperable solution that works with any endpoint and offers flexibility in telemetry, load balancing, and congestion control.
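The power argument follows from radix: a higher-radix switch flattens the topology, so fewer tiers, and therefore fewer optics, sit on each path. A rough sketch of the scaling, assuming a standard two-tier leaf/spine Clos (the radix values are illustrative, not a Broadcom design guide):

```python
# A two-tier (leaf/spine) Clos with per-switch radix k supports
# roughly k**2 / 2 endpoints: each leaf splits its ports half down
# (to endpoints) and half up (to spines).

def two_tier_endpoints(radix: int) -> int:
    return radix * radix // 2

print(two_tier_endpoints(512))  # radix 512 (e.g. 200G ports): 131,072 endpoints
print(two_tier_endpoints(256))  # a half-radix switch: 32,768 endpoints
```

A half-radix switch supporting the same cluster size would need a third tier, adding an extra optical hop (and its power and latency) to every path.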
Personnel: Pete Del Vecchio
Broadcom Tomahawk Ultra: Low-Latency, High-Performance, and Reliable Ethernet for HPC and AI
Watch on YouTube
Watch on Vimeo
Tomahawk Ultra shatters the myths about Ethernet’s ability to address high-performance networking. In this session, we will show how we added features for lossless networking, reduced latency, and increased performance, all while maintaining compatibility with Ethernet.
The presentation introduces the Broadcom Tomahawk Ultra, a 51.2 terabit per second switch chip designed to bring high-performance Ethernet to markets traditionally dominated by InfiniBand, specifically HPC and AI. Addressing perceived limitations of Ethernet such as high latency, small frame size constraints, packet overhead, and lossy nature, the Tomahawk Ultra is a clean-slate design focused on ultra-low latency, high packet rates, and reliability. The chip is pin-compatible with Tomahawk 5, enabling quick adoption by OEMs and ODMs, and it’s currently shipping to partners who are building boxes with it.
Key features of the Tomahawk Ultra include a 250 nanosecond ball-to-ball (first bit in to first bit out) latency, high packet-per-second processing optimized for the small message sizes common in HPC and AI inferencing, and support for in-network collectives (INC) to offload computation from XPUs during AI training. The chip also incorporates an optimized header format to reduce packet overhead in managed networks and advanced reliability features like link-layer retry (LLR) and credit-based flow control (CBFC) for lossless networking. Topology-aware routing, which enables optimized packet paths in complex HPC networks, is also implemented.
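The emphasis on small messages can be made concrete with back-of-envelope arithmetic: at minimum Ethernet frame size, a 51.2 Tbps switch must sustain tens of billions of packets per second. A hedged sketch, assuming the standard Ethernet per-frame overhead of an 8-byte preamble and 12-byte inter-packet gap:

```python
# Packet rate required to keep a link saturated with small frames.

def packets_per_second(link_tbps: float, frame_bytes: int) -> float:
    wire_bits = (frame_bytes + 8 + 12) * 8  # frame + preamble + inter-packet gap
    return link_tbps * 1e12 / wire_bits

pps = packets_per_second(51.2, 64)
print(f"{pps / 1e9:.1f} Gpps")  # ~76.2 Gpps for 64-byte frames at 51.2 Tbps
```

This is why a chip aimed at HPC message patterns is judged on packet rate, not just raw bandwidth.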
The speaker emphasized that the Tomahawk Ultra aims to provide an open and standards-based approach to high-performance networking, adhering to Ethernet standards for compatibility and ease of management. It utilizes standard Ethernet tools for configuration and monitoring, with features like LLR automatically negotiating between the switch and endpoints. Broadcom has contributed the Scale Up Ethernet (SUE) specification to OCP to encourage an open ecosystem. The Tomahawk Ultra is positioned as an end-to-end solution for high performance, offering an alternative to technologies like NVLink in scale-up architectures while ensuring compatibility and openness.
Personnel: Robin Grindley
Jericho4: Enabling Distributed AI Computing Across Data Centers with Broadcom
Watch on YouTube
Watch on Vimeo
Jericho4 – Ethernet Fabric Router is a purpose-built platform for the next generation of distributed AI infrastructure. In this session, we will examine how Jericho4 pushes beyond traditional scaling limits, delivering unmatched bandwidth, integrated security, and true lossless performance—while interconnecting more than one million XPUs across multiple data centers.
The presentation discusses the Jericho4 solution for scaling AI infrastructure across data centers. Current limitations in power and space capacity necessitate interconnecting smaller data centers via high-speed networks. Jericho4 addresses the growing challenges of load balancing, congestion control, traffic management, and security at scale by offering four key features. First, it allows building a single system with 36K ports, acting as a single, non-blocking routing domain. Second, it provides high bandwidth with hyper ports (3.2T), a native solution for the large data flows characteristic of AI workloads. Third, its embedded deep buffer supports lossless RDMA interconnections over distances exceeding 100 kilometers. Finally, Jericho4 has embedded security engines to enable security without impacting performance.
The Jericho4 family offers various derivatives to suit different deployment scenarios, including modular and centralized systems. The architecture supports scaling as a single system through various form factors, from compact boxes to disaggregated chassis, and further scaling across a fabric. Hyper ports improve link utilization by avoiding hash collisions across parallel links, leading to reduced training times. The deep buffer handles the bursty nature of AI workloads, minimizing congestion and ensuring lossless data transmission even over long distances. The embedded security engine addresses security concerns by enabling point-to-point MACsec and end-to-end IPsec with no performance impact.
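The deep-buffer requirement follows from the bandwidth-delay product: to pause a remote sender without dropping packets, a switch must absorb roughly one round trip’s worth of in-flight data. A rough sketch, assuming ~2×10⁸ m/s signal propagation in fiber and using the 3.2T hyper-port figure from the talk:

```python
# Buffer needed to absorb in-flight data when flow-controlling a
# sender at the far end of a long fiber span (bandwidth-delay product).

def buffer_mb(link_tbps: float, distance_km: float) -> float:
    rtt_s = 2 * distance_km * 1000 / 2e8   # round trip through fiber
    bdp_bits = link_tbps * 1e12 * rtt_s
    return bdp_bits / 8 / 1e6              # megabytes

print(f"{buffer_mb(3.2, 100):.0f} MB")  # ~400 MB per 3.2T port at 100 km
```

Hundreds of megabytes per port is far beyond on-chip SRAM, which is why lossless RDMA over 100+ km spans calls for a deep-buffer router rather than a shallow-buffer switch.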
Personnel: Henry Wu