https://www.youtube.com/watch?v=WfRgRMXr6m0
Upscale AI distinguishes between two critical domains: scale-up networking, which creates a large compute environment within a rack where multiple GPUs see a flat, unified memory space, and scale-out networking, which connects these scale-up domains through explicit memory-copy operations. The presentation highlights that the network has become the backplane of a distributed ecosystem, shifting from a standard client-server model to a highly synchronized all-to-all communication pattern. Upscale AI aims to solve the challenges of this new era with purpose-built hardware and software that prioritize predictable, ultra-low latency and zero-oversubscription bandwidth to prevent computational stalls.
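The distinction between the two domains can be sketched in code. This is a minimal, hypothetical illustration, not an Upscale AI API: the class and function names (`ScaleUpDomain`, `scale_out_copy`) are invented for the example. Within a scale-up domain, a remote value is reached with an ordinary load or store into one flat address space; across domains, data must be moved by an explicit copy operation over the scale-out network.

```python
# Hypothetical sketch of the two networking domains described above.
# Names and sizes are illustrative assumptions, not a real API.

class ScaleUpDomain:
    """Within a rack: every GPU sees one flat, unified address space,
    so a remote value is accessed with an ordinary load or store."""
    def __init__(self, num_gpus: int, mem_per_gpu: int):
        # One contiguous buffer stands in for the unified memory view.
        self.memory = bytearray(num_gpus * mem_per_gpu)

    def load(self, addr: int) -> int:
        return self.memory[addr]          # direct load, no copy semantics

    def store(self, addr: int, value: int) -> None:
        self.memory[addr] = value         # direct store


def scale_out_copy(src: ScaleUpDomain, src_addr: int,
                   dst: ScaleUpDomain, dst_addr: int, n: int) -> None:
    """Across racks: data moves only via an explicit memory-copy
    (RDMA-style) operation over the scale-out network."""
    dst.memory[dst_addr:dst_addr + n] = src.memory[src_addr:src_addr + n]
```

The asymmetry is the point: inside the domain the fabric must make `load`/`store` look local (sub-microsecond), while between domains the software stack schedules bulk copies.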
In the scale-up domain, the architecture must support load-store operations with latencies under one microsecond. Aravind Srikumar introduces the Skyhammer architecture, a clean-sheet design built specifically for the scale-up environment that emphasizes “performance, performance, performance.” Unlike traditional networking, these systems use lightweight, optimized headers and offload congestion handling, such as Link Layer Retry (LLR) and Priority Flow Control (PFC), directly to the switch to minimize jitter. For scale-out needs, Upscale AI has partnered with NVIDIA to use the Spectrum-X substrate, building open, Ethernet-based systems around it that feature AI-optimized operating systems, hitless upgrades, and specialized circuitry for real-time power management and telemetry.
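A bit of back-of-the-envelope arithmetic shows why lightweight headers matter for a sub-microsecond load-store fabric. The figures below are assumptions chosen for illustration, not Skyhammer specifications: a conventional header stack modeled at 58 bytes versus an assumed 16-byte optimized header, carried on an assumed 800 Gb/s link, with a 64-byte cache-line payload.

```python
# Illustrative latency/overhead arithmetic for small load/store messages.
# All constants are assumptions for this sketch, not vendor specs.

LINK_GBPS = 800                    # assumed per-link signaling rate
STD_HEADERS = 14 + 20 + 8 + 16     # Ethernet + IPv4 + UDP + transport, bytes
LIGHT_HEADER = 16                  # assumed lightweight scale-up header, bytes
PAYLOAD = 64                       # one cache line per load/store, bytes

def serialization_ns(bytes_on_wire: int, gbps: int = LINK_GBPS) -> float:
    # Gb/s equals bits per nanosecond, so this yields nanoseconds.
    return bytes_on_wire * 8 / gbps

std_ns = serialization_ns(STD_HEADERS + PAYLOAD)
light_ns = serialization_ns(LIGHT_HEADER + PAYLOAD)
std_overhead = STD_HEADERS / (STD_HEADERS + PAYLOAD)
light_overhead = LIGHT_HEADER / (LIGHT_HEADER + PAYLOAD)

print(f"standard stack: {std_ns:.2f} ns serialization, "
      f"{std_overhead:.0%} of the wire spent on headers")
print(f"lightweight:    {light_ns:.2f} ns serialization, "
      f"{light_overhead:.0%} of the wire spent on headers")
```

Under these assumptions, nearly half of each 64-byte message on the conventional stack is header rather than payload; at cache-line granularity, trimming headers recovers effective bandwidth on every hop, and per-hop nanoseconds compound quickly inside a 1 µs round-trip budget.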
The company’s overarching vision is to enable a future of heterogeneous compute where customers can mix and match various processing units, such as GPUs, LPUs, and DPUs, without being locked into a single proprietary ecosystem. By building on open standards like SONiC, ESON, and UALink, Upscale AI ensures that its fabric remains technology-agnostic and interoperable. This approach is designed to protect customer investments over a five-to-seven-year lifecycle, allowing the network to adapt as new AI workloads and specialized chips emerge. Ultimately, the goal is to transform the data center into an efficient token factory where every ounce of power and compute is maximized through a networking stack that is architected, rather than merely tuned.
Personnel: Aravind Srikumar