Marco Abela, Product Manager at Google Cloud Storage, presented an overview of Google Cloud's storage solutions optimized for AI/ML workloads. The presentation addressed the critical role of storage in AI pipelines, emphasizing that an inadequate storage solution can significantly bottleneck GPU utilization, leaving GPUs idle and slowing every stage from initial data preparation to model serving. He highlighted two storage types optimized for this space: object storage (Cloud Storage) for persistent, high-throughput storage with virtually unlimited capacity, and parallel file systems (Managed Lustre) for ultra-low latency, each suited to specific workload profiles. Typical AI/ML storage requirements include vast capacity, high aggregate throughput, millions of requests per second (QPS/IOPS), and low-latency reads, with the dominant performance dimension varying across training profiles.
The presentation further detailed Cloud Storage FUSE, a solution that mounts a bucket as a local file system. Abela noted Google's heavy investment in it, which has paid off significantly: it gives applications file system semantics without requiring them to be rewritten for object storage. Cloud Storage FUSE now serves as a high-performance client with features including a file cache, parallel download, streaming writes, and integration with Hierarchical Namespace buckets. The file cache improves training times, while parallel download drastically speeds up model loading, achieving up to 9x faster load times than fsspec. Hierarchical Namespace buckets offer atomic folder renames for checkpointing, delivering 30x faster rename performance.
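As a rough illustration of the workflow described above, the following sketch mounts a bucket with Cloud Storage FUSE and enables its file cache and parallel downloads via a configuration file. The bucket name, mount point, cache directory, and size are placeholders, and the configuration keys should be verified against the current gcsfuse documentation before use.

```shell
# Placeholder names throughout; gcsfuse must be installed and the
# machine authenticated to Google Cloud.
mkdir -p /mnt/training-data

# Basic mount: expose the bucket as a local file system.
gcsfuse --implicit-dirs my-training-bucket /mnt/training-data

# Sketch of a config enabling the file cache and parallel downloads
# (key names assumed from the gcsfuse config format; verify against docs).
cat > gcsfuse-config.yaml <<'EOF'
cache-dir: /mnt/local-ssd/gcsfuse-cache
file-cache:
  max-size-mb: 65536
  enable-parallel-downloads: true
EOF

# Remount using the config so model loads benefit from caching.
gcsfuse --config-file gcsfuse-config.yaml my-training-bucket /mnt/training-data
```

Pointing `cache-dir` at local SSD is what lets repeated epochs read training data at local-disk speed rather than over the network.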
Abela then introduced Anywhere Cache, a feature that recently reached general availability and improves performance by co-locating storage on SSD in the same zone as compute. This "turbo button" for Cloud Storage requires no code refactoring and reduces time-to-first-byte latency by up to 70% for regional buckets and 96% for multi-regional buckets. A GenAI customer case study demonstrated its effectiveness for model loading, achieving a 99% cache hit rate, eliminating tail latencies, and reducing network egress costs for multi-regional buckets. The presentation also covered a recommender tool that helps users understand the cacheability of their workload, the optimal configuration, expected throughput, and potential cost savings.
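The "no code refactoring" claim follows from Anywhere Cache being enabled per bucket and zone from the management plane. A sketch of that flow with the gcloud CLI is below; the bucket and zone are placeholders, and the exact command shape should be checked against the current gcloud reference.

```shell
# Create an Anywhere Cache instance for a bucket in the zone where
# the GPUs run (bucket and zone names are placeholders).
gcloud storage buckets anywhere-caches create \
    gs://my-training-bucket us-central1-a

# Inspect existing caches for the bucket to confirm state and settings.
gcloud storage buckets anywhere-caches list gs://my-training-bucket
```

Applications keep reading `gs://my-training-bucket` through their existing client or Cloud Storage FUSE mount; cache hits are served from zonal SSD transparently.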
Personnel: Marco Abela