This video is part of the appearance, “Google Cloud Presents at AI Infrastructure Field Day 2 – Morning”. It was recorded as part of AI Infrastructure Field Day 2 at 09:00-12:00 on April 22, 2025.
Watch on YouTube
Watch on Vimeo
Marco Abela, Product Manager at Google Cloud Storage, presented an overview of Google Cloud’s storage solutions optimized for AI/ML workloads. The presentation addressed the critical role of storage in AI pipelines, emphasizing that an inadequate storage solution can bottleneck an entire pipeline, leaving expensive GPUs idle at every stage from initial data preparation to model serving. He highlighted two storage types optimized for AI workloads: object storage (Cloud Storage) for persistent, high-throughput storage with virtually unlimited capacity, and parallel file systems (Managed Lustre) for ultra-low latency, each catering to specific workload profiles. Typical storage requirements for AI/ML involve vast capacity, high aggregate throughput, millions of requests per second (QPS/IOPS), and low-latency reads, with different training profiles stressing different aspects of performance.
The presentation further detailed Cloud Storage FUSE, a solution that lets applications mount a bucket as a local file system. Abela noted that Google has invested heavily in it, with significant payoff: teams get file system semantics without rewriting their applications for object storage. Cloud Storage FUSE now serves as a high-performance client with features like a file cache, parallel download, streaming writes, and Hierarchical Namespace (HNS) bucket integration. The file cache improves training times, while the parallel download feature drastically speeds up model loading, achieving up to 9x faster load times than fsspec. Hierarchical namespace buckets offer atomic folder renames for checkpointing, resulting in 30x faster performance.
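To make the file system semantics concrete, here is a minimal Python sketch of a training job reading through a Cloud Storage FUSE mount. The bucket name, mount point, and paths are hypothetical, and the mount command in the comment is one plausible invocation rather than the exact configuration shown in the presentation.

```python
import os

# Assumes a bucket (here "my-training-bucket", hypothetical) has been
# mounted with Cloud Storage FUSE, e.g.:
#   gcsfuse --implicit-dirs my-training-bucket /mnt/gcs
MOUNT = "/mnt/gcs"

# Read training shards through ordinary file system calls; with the
# file cache enabled, repeat reads across epochs can be served from
# local SSD instead of going back to the bucket.
def iter_shards(prefix: str):
    shard_dir = os.path.join(MOUNT, prefix)
    for name in sorted(os.listdir(shard_dir)):
        with open(os.path.join(shard_dir, name), "rb") as f:
            yield f.read()

# Checkpoint "commit" via folder rename: on a Hierarchical Namespace
# bucket this is a single atomic metadata operation rather than a
# copy-and-delete of every object under the folder.
def commit_checkpoint(step: int) -> None:
    tmp = os.path.join(MOUNT, "checkpoints", f"step-{step}.tmp")
    final = os.path.join(MOUNT, "checkpoints", f"step-{step}")
    os.rename(tmp, final)
```

The rename pattern is why HNS matters for checkpointing: on a flat-namespace bucket a folder rename decomposes into per-object copies, whereas on an HNS bucket it is a single metadata operation, which is the source of the 30x speedup cited above.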
Abela then introduced Anywhere Cache, a newly GA feature designed to improve performance by co-locating data on SSD in the same zone as compute. This “turbo button” for Cloud Storage requires no code refactoring and reduces time-to-first-byte latency by up to 70% for regional buckets and 96% for multi-regional buckets. A GenAI customer case study demonstrated its effectiveness for model loading, achieving a 99% cache hit rate, eliminating tail latencies, and reducing network egress costs for multi-regional buckets. The presentation also covered a recommender tool that helps users understand the cacheability of their workload, the optimal configuration, expected throughput, and potential cost savings.
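The "no code refactoring" point is easy to illustrate: reads issued through the standard client library are identical whether or not Anywhere Cache is enabled on the bucket. A brief sketch, with hypothetical bucket and object names; the gcloud command in the comment is indicative of how the cache is enabled out of band, not taken from the presentation.

```python
from google.cloud import storage  # pip install google-cloud-storage

# Anywhere Cache is enabled on the bucket out of band, for example with:
#   gcloud storage buckets anywhere-caches create gs://BUCKET ZONE
# The read path below is unchanged; cache hits are served transparently
# from zonal SSD, cutting time-to-first-byte latency.
def load_model_weights(bucket_name: str, blob_name: str, dest: str) -> None:
    client = storage.Client()
    blob = client.bucket(bucket_name).blob(blob_name)
    blob.download_to_filename(dest)

# Hypothetical model-loading call; repeated loads across a fleet of
# inference servers are where the case study saw its 99% hit rate.
load_model_weights("my-model-bucket", "llm/weights.safetensors",
                   "/tmp/weights.safetensors")
```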
Personnel: Marco Abela