Presentation date: February 23, 2024, 8:00-9:00.
Presenters: Chad Smith, Floyd Christofferson
Accelerating AI Pipelines with Hammerspace
In this session, Floyd Christofferson and Chad Smith from Hammerspace will look at solutions to achieve HPC-class performance to feed GPU-based AI pipelines while leveraging data in place on existing storage resources. This session will give real-world examples of how customers have adapted their existing infrastructure to accommodate the performance levels needed for AI and other high-performance workflows.
Christofferson and Smith discuss how Hammerspace can accelerate AI pipelines by addressing the challenges of managing and accessing unstructured data across various storage systems and locations. They introduce the concept of a global data environment that leverages a parallel global file system, allowing data to remain in place while providing high-performance access necessary for AI workloads. They begin by explaining the silo problem in AI pipelines, where unstructured data is spread across multiple storage types and locations, making it difficult to aggregate without moving it to a new repository. Hammerspace’s solution allows for the assimilation of file system metadata from existing storage, enabling a global view and access to data without physically moving it. This approach prevents copy sprawl, maintains data governance, and avoids additional capital and operational expenses.
The session highlights the introduction of a new product, Hammerspace Hyperscale NAS, which provides HPC-class parallel file system performance using standard protocols and networking, without requiring proprietary clients or altering existing infrastructure. This solution is said to be storage agnostic and can accelerate existing third-party storage, making it suitable for enterprises looking to incorporate AI workflows without significant upfront investment. The duo provides real-world examples, including a hyperscaler with a large AI training and inferencing environment, where Hammerspace’s technology enabled scalability without altering the existing infrastructure. Another example is a visual effects customer who achieved the required performance for rendering without changing their storage infrastructure.
Personnel: Chad Smith, Floyd Christofferson
Taming Unstructured Data Orchestration with Hammerspace
In this session, Floyd Christofferson and Chad Smith from Hammerspace will step through the key capabilities of Hammerspace Global Data Environment software and how it automates unstructured data orchestration across multi-vendor, multi-site, and often multi-cloud storage environments. The session will focus specifically on the problems that arise when data needed for AI pipelines is distributed across silos, sites, and clouds.
Christofferson and Smith discuss the capabilities of Hammerspace’s Global Data Environment software for automating unstructured data orchestration across various storage environments, including multi-vendor, multi-site, and multi-cloud infrastructures. They focus on how this can be particularly beneficial for AI workflows, where data is often distributed across different locations and silos.
Hammerspace’s solution involves separating file system metadata from the actual data, elevating it above the infrastructure layer into a global metadata control plane. This allows for a common view of files across different storage systems and locations, enabling transparent and automated data orchestration without disrupting user access or requiring data movement.
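The idea of lifting metadata above the storage layer can be illustrated with a toy model. The sketch below is not Hammerspace code or its API; the class names, backend names, and file paths are all hypothetical. It only shows the general pattern: a catalog holds metadata for files on several backends, clients resolve one global path, and relocating data changes the backend entry without changing the path users see.

```python
from dataclasses import dataclass, field


@dataclass
class FileRecord:
    """Metadata for one file; the bytes themselves stay on a backend."""
    size_bytes: int
    backend: str  # hypothetical name of the storage system holding the data


@dataclass
class MetadataCatalog:
    """Toy stand-in for a global metadata layer (illustrative only)."""
    records: dict[str, FileRecord] = field(default_factory=dict)

    def assimilate(self, backend: str, listing: dict[str, int]) -> None:
        # Import only metadata (path -> size); no file data is copied.
        for path, size in listing.items():
            self.records[path] = FileRecord(size_bytes=size, backend=backend)

    def locate(self, path: str) -> str:
        # Clients use one global path; the catalog resolves the backend.
        return self.records[path].backend

    def relocate(self, path: str, new_backend: str) -> None:
        # Moving data changes the backend entry, never the path users see.
        self.records[path].backend = new_backend


catalog = MetadataCatalog()
catalog.assimilate("nas-site-a", {"/projects/train/img001.jpg": 2048})
catalog.assimilate("object-cloud-b", {"/projects/train/img002.jpg": 4096})
catalog.relocate("/projects/train/img001.jpg", "object-cloud-b")
```

After the `relocate` call, both files resolve to the same backend while their global paths are unchanged, which is the property the session describes as orchestration without disrupting user access.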
The software is Linux-based and includes two components: Anvil servers for metadata control and DSX nodes for I/O handling. It supports multi-protocol access, including NFS, parallel NFS, and S3, and allows for the setting of objective-based policies for data management, including protection, tiering, and geographical considerations.
Hammerspace can be installed on various platforms, including bare metal, cloud instances, and VMs, and it facilitates seamless integration of on-premises storage with cloud resources. This enables use cases like bursting AI workloads to the cloud, managing data across global sites, and optimizing compute resource costs by automating data movement to the most cost-effective locations.
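A placement decision of this kind, combining a geographic constraint with cost, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not how Hammerspace expresses or evaluates its policies: the `Site` type, the site names, the regions, and the cost figures are all invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    name: str
    region: str
    cost_per_gpu_hour: float  # made-up figure for illustration


def cheapest_allowed_site(sites, allowed_regions):
    """Return the lowest-cost site whose region satisfies the policy."""
    candidates = [s for s in sites if s.region in allowed_regions]
    if not candidates:
        raise ValueError("no site satisfies the policy")
    return min(candidates, key=lambda s: s.cost_per_gpu_hour)


# Hypothetical sites; names and prices are not from the session.
sites = [
    Site("on-prem-london", "eu", 3.10),
    Site("cloud-eu-west", "eu", 2.40),
    Site("cloud-us-east", "us", 1.90),
]

eu_only = cheapest_allowed_site(sites, {"eu"})
anywhere = cheapest_allowed_site(sites, {"eu", "us"})
print(eu_only.name, anywhere.name)
```

Filtering before minimizing keeps the constraint (where data may live) separate from the objective (what is cheapest), so tightening the geographic rule can never select a site outside the allowed regions.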
Christofferson provides examples of Hammerspace deployments in different industries: online gaming, the rocket company Blue Origin, and a data center in London that saves costs by orchestrating render jobs to cheaper cloud regions.
Personnel: Chad Smith, Floyd Christofferson