This video is part of the appearance “Hammerspace presents at AI Infrastructure Field Day 3”. It was recorded as part of AI Infrastructure Field Day 3 from 10:30 to 12:30 on September 10, 2025.
The highest-performing storage available today is an untapped resource inside your server clusters, and Hammerspace can activate it to accelerate AI workloads and increase GPU utilization. This session covers how Hammerspace unifies local NVMe across server clusters into a protected, ultra-fast tier that is part of a unified global namespace. This previously underutilized capacity can then serve as shared storage for AI workloads, with Hammerspace automatically orchestrating data across other tiers and cloud storage, shortening time to token while also reducing infrastructure costs.
Floyd Christofferson from Hammerspace introduces Tier 0 and explains how it accelerates AI workflows in GPU- and CPU-based clusters. The core problem is the stranded capacity of local NVMe storage inside servers: despite its speed, it often sits underutilized. Meanwhile, accessing data over the network on external storage becomes a bottleneck, especially in AI workflows with growing context lengths and demands for fast token access. Adding network capacity is an option, but it is expensive and still finite. Tier 0 aggregates this local capacity into a single storage tier that becomes the primary storage for workflows and supports programmatic data orchestration. This unlocks petabytes of previously unused storage and removes the need to buy additional, expensive Tier 1 storage.
Hammerspace’s Tier 0 is built on standards: clients use the standard NFS, SMB, and S3 protocols, so no client-side software has to be installed. To improve performance and efficiency, it relies on parallel NFS (pNFS) v4.2 with the Flex Files layout, which Hammerspace contributed to the Linux kernel. This approach avoids proprietary clients and special server deployments, so the system works with existing infrastructure. The key to the solution is orchestrating and unifying capacity across servers: compute nodes become storage servers without turning into isolated islands of data, which reduces bottlenecks and speeds up data access.
The presentation highlights the performance benefits of Tier 0, showing theoretical results and MLPerf benchmark results that demonstrate superior performance per rack unit. By using local NVMe storage, Hammerspace reduces reliance on expensive and slower cloud storage networks, which leads to higher GPU utilization. Hammerspace also contributes enhancements to the Linux kernel, such as local I/O, that reduce CPU utilization and accelerate write performance, reinforcing its commitment to standards-based solutions and to continually improving data access. The architecture is designed to be non-disruptive: data can be moved live behind the scenes without interrupting users.
Personnel: Floyd Christofferson