Sean Derrington, Vivek Saraswat, Ishan Sharma, and Marco Abela presented for Google Cloud at AI Infrastructure Field Day 2
This presentation took place on April 22, 2025, from 09:00 to 12:00.
Presenters: Brendan Power, Ilias Katsardis, Ishan Sharma, Manjul Sahay, Marco Abela, Sean Derrington
Introduction to the AI Hypercomputer with Google Cloud
Watch on YouTube
Watch on Vimeo
Sean Derrington, Product Manager, Storage at Google Cloud, introduced the AI Hypercomputer at AI Infrastructure Field Day, highlighting Google Cloud’s investments in making it easier for customers to consume and run their AI workloads. The focus is on infrastructure, together with the consumption model and optimized software that surround it. The AI Hypercomputer encompasses optimized software and purpose-built hardware, with storage, compute, and networking at its foundation.
A key announcement was Google Cloud Managed Lustre, a new offering built on Google’s partnership with DDN and its EXAScaler platform. Managed Lustre provides a scalable parallel file system well suited to AI and ML workloads, offering petabyte-scale capacity, sub-millisecond latency, and throughput of up to a terabyte per second. Google Cloud also announced Anywhere Cache, which keeps data closer to accelerators to improve the performance of AI workloads; it can cache up to a petabyte of capacity within a given zone and deliver up to 2.5 terabytes per second of bandwidth. A third announcement, Rapid Storage, delivers up to 20 million QPS and up to 6 terabytes per second of throughput per bucket.
The presentation also touched on advancements in compute and networking. Google Cloud announced the new A4 and A4X machines in its GPU portfolio, built in partnership with NVIDIA, and its seventh-generation TPU, Ironwood, which offers significantly higher performance and memory than previous generations and can be deployed as a cluster of over 9,200 chips delivering 42.5 exaflops of compute capacity. Derrington also discussed improvements to networking infrastructure with Cloud WAN, a fully managed service that improves performance by up to 40%, and noted that the GKE Inference Gateway improves AI inference through intelligent routing.
Personnel: Sean Derrington
Storage Intelligence with Google Cloud
Watch on YouTube
Watch on Vimeo
Manjul Sahay, Group Product Manager at Google Cloud Storage, presented on Storage Intelligence with Google Cloud, focusing on helping customers, from enterprises to startups, manage their storage effectively for AI applications. These customers often struggle to manage storage at scale for security, cost, and operational efficiency, particularly when teams are small or new. A key problem is the exponential growth of storage driven by the influx of new data and AI-generated content, which makes object storage management increasingly complex.
The presentation highlighted the difficulties of managing object storage, which often involves billions or trillions of objects spread across multiple buckets and projects. Many customers resort to building custom management tools, leading to significant expenditure on management, sometimes 13% to 24% of total storage spend. Google Cloud aims to address this with storage intelligence that helps customers manage storage, meets them where they are in multi-cloud environments, and reduces or eliminates storage management burdens.
Google Cloud introduced Storage Intelligence to address these challenges: a unified platform whose features include datasets, a metadata repository in BigQuery; identification of publicly accessible objects; batch operations that can act on billions of objects; and bucket migration, all designed to streamline storage management. With a free trial on offer, early adopters such as Anthropic and Spotify are already seeing benefits in managing and optimizing their storage infrastructure.
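Because datasets lands object metadata in ordinary BigQuery tables, routine SQL can surface cleanup candidates. The sketch below is illustrative only: the project, dataset, table, and column names are placeholders, since the real linked-dataset schema comes from your Storage Intelligence configuration.

```python
# Hedged sketch: query a Storage Intelligence metadata dataset in BigQuery
# for large, long-untouched objects. Table and column names are placeholders.
from google.cloud import bigquery  # pip install google-cloud-bigquery

client = bigquery.Client()
sql = """
SELECT bucket, name, size, storage_class, updated
FROM `my-project.storage_intelligence.object_metadata`   -- placeholder table
WHERE size > 1e9                                          -- objects over ~1 GB
  AND updated < TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 365 DAY)
ORDER BY size DESC
LIMIT 100
"""
for row in client.query(sql).result():
    print(row.bucket, row.name, row.size)
```

A query like this is the kind of task that previously required a custom-built inventory tool scanning billions of objects.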
Personnel: Manjul Sahay
AI Hypercomputer Cluster Toolkit with Google Cloud
Watch on YouTube
Watch on Vimeo
Ilias Katsardis, Senior Product Manager for AI infrastructure at Google Cloud, presented on the AI Hypercomputer Cluster Toolkit, addressing the complexities of deploying AI infrastructure on Google Cloud’s Compute Engine and GKE. He highlighted the challenges customers face when trying to create supercomputers in the cloud quickly and efficiently, including performance uncertainty, troubleshooting difficulties, and potential downtime. These issues often increase time-to-market and costs, which Google Cloud aims to mitigate.
To tackle these problems, Google Cloud developed Cluster Director, a foundation built upon purpose-built hardware, VMs, Managed Instance Groups, Kubernetes, and GKE. Cluster Director includes capabilities such as a placement policy that ensures VMs land on the same rack and switch for optimal performance. Cluster Toolkit sits within Cluster Director; Katsardis described it as the orchestrator for AI and HPC environments. It uses Terraform scripts and APIs to combine everything into a single deployment: customers define their AI infrastructure or HPC cluster in a blueprint, a concise configuration file that Cluster Toolkit uses to provision the environment.
The presentation introduced the Cluster Toolkit to simplify the deployment and management of AI infrastructure on Google Cloud, addressing the need for turnkey environments that adhere to best practices. While the underlying infrastructure relies on Terraform, the speaker emphasized that customers interact with a simplified blueprint, enabling easier auditing and faster deployment. The discussion also touched on future directions, including user interfaces to further streamline the process and the potential for managed services.
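As a rough illustration of that blueprint workflow, the sketch below writes a minimal blueprint and drives the toolkit’s CLI from Python. The blueprint keys follow the toolkit’s documented YAML shape, but the module sources, machine type, and CLI subcommands are assumptions to verify against the current Cluster Toolkit documentation.

```python
# Hypothetical end-to-end sketch of the Cluster Toolkit blueprint workflow.
# Project, deployment names, module sources, and machine type are placeholders.
import pathlib
import subprocess

BLUEPRINT = """\
blueprint_name: demo-ai-cluster
vars:
  project_id: my-project           # placeholder project
  deployment_name: demo-ai-cluster
  region: us-central1
  zone: us-central1-a
deployment_groups:
- group: primary
  modules:
  - id: network
    source: modules/network/vpc
  - id: gpu_nodes
    source: modules/compute/vm-instance
    use: [network]
    settings:
      instance_count: 4
      machine_type: a3-highgpu-8g  # placeholder accelerator VM
"""

pathlib.Path("demo.yaml").write_text(BLUEPRINT)
# "create" expands the blueprint into a Terraform deployment folder, and
# "deploy" applies it. Verify subcommand names against your toolkit
# version (older releases shipped the CLI as "ghpc").
subprocess.run(["gcluster", "create", "demo.yaml"], check=True)
subprocess.run(["gcluster", "deploy", "demo-ai-cluster"], check=True)
```

The point of the blueprint is exactly what the speaker emphasized: the Terraform underneath stays out of sight, and the short YAML file is what gets audited and versioned.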
Personnel: Ilias Katsardis
Google Kubernetes Engine and AI Hypercomputer with Google Cloud
Watch on YouTube
Watch on Vimeo
Ishan Sharma, Group Product Manager in the Google Kubernetes Engine team, presented on GKE and AI Hypercomputer, focusing on industry-leading infrastructure, training quickly at mega scale, serving with lower cost and latency, economical access to GPUs and TPUs, and faster time to value. He emphasized that Google Cloud is committed to making new accelerators available on GKE on day one. The AI Hypercomputer is both the entire stack and a reference architecture, and it is the same stack Google uses internally for Vertex AI.
The presentation highlighted Cluster Director for GKE, which enables the deployment, scaling, and management of AI-optimized GKE clusters where physically co-located accelerators function as a single unit, delivering high performance and ultra-low latency. Key benefits include running densely co-located accelerators, mega-scale training jobs, topology-aware scheduling, ease of use, 360-degree observability, and resiliency. Cluster Director for GKE uses standard Kubernetes APIs and the existing ecosystem, which allows users to orchestrate these capabilities.
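Because placement surfaces through standard Kubernetes APIs, ordinary scheduling constructs can express co-location. The manifest below is an illustration rather than Cluster Director’s own configuration: it uses the standard topology.kubernetes.io/zone key, and the image and labels are placeholders.

```python
# Illustrative only: co-schedule training workers into the same zone using
# plain Kubernetes pod affinity. Image and labels are placeholders.
import subprocess

MANIFEST = """\
apiVersion: v1
kind: Pod
metadata:
  name: trainer-worker-0
  labels:
    app: trainer
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: trainer
        topologyKey: topology.kubernetes.io/zone
  containers:
  - name: worker
    image: us-docker.pkg.dev/my-project/repo/trainer:latest  # placeholder
"""

# Apply the manifest from stdin; assumes kubectl points at a GKE cluster.
subprocess.run(["kubectl", "apply", "-f", "-"], input=MANIFEST.encode(), check=True)
```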
Sharma also demonstrated the GKE Inference Gateway, which improves LLM inference by routing requests based on model server metrics such as KV cache utilization and queue length, reducing variability and improving time-to-first-token latency. Additionally, he showcased GKE Inference Quickstart, a feature on the GKE homepage within the Google Cloud console that recommends optimized infrastructure configurations for different models, such as the NVIDIA L4 for the Gemma 2 2B instruction-tuned model. This simplifies model deployment and optimizes performance.
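To make the routing idea concrete, here is a toy sketch of metric-based replica selection, not GKE’s implementation: it scores each model server on the two signals the gateway consults, KV cache utilization and queue length, and sends the request to the least-loaded replica. All names and weights are assumptions.

```python
# Toy illustration of metric-based inference routing. A real gateway would
# also handle prefix-cache affinity, request priorities, and flow control.
from dataclasses import dataclass

@dataclass
class Replica:
    name: str
    kv_cache_util: float  # fraction of KV cache in use, 0.0-1.0
    queue_len: int        # requests waiting on this model server

def pick_replica(replicas: list[Replica], max_queue: int = 32) -> Replica:
    # Normalize queue length so both signals share a 0-1 scale, then
    # route to the replica with the lowest combined load.
    def load(r: Replica) -> float:
        return 0.5 * r.kv_cache_util + 0.5 * min(r.queue_len / max_queue, 1.0)
    return min(replicas, key=load)

replicas = [
    Replica("vllm-0", kv_cache_util=0.82, queue_len=12),
    Replica("vllm-1", kv_cache_util=0.35, queue_len=3),
    Replica("vllm-2", kv_cache_util=0.55, queue_len=9),
]
print(pick_replica(replicas).name)  # -> vllm-1
```

Round-robin routing would spread requests evenly regardless of load, which is exactly the variability in time-to-first-token that metric-aware routing removes.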
Personnel: Ishan Sharma
Overview of Cloud Storage for AI, Lustre, GCSFuse, and Anywhere Cache with Google Cloud
Watch on YouTube
Watch on Vimeo
Marco Abela, Product Manager at Google Cloud Storage, presented an overview of Google Cloud’s storage solutions optimized for AI/ML workloads. The presentation addressed the critical role of storage in AI pipelines, emphasizing that an inadequate storage solution can significantly bottleneck GPU utilization, causing idle GPUs and hindering data processing from initial data preparation to model serving. He highlighted two storage types optimized for these workloads: object storage (Cloud Storage) for persistent, high-throughput storage with virtually unlimited capacity, and parallel file systems (Managed Lustre) for ultra-low latency, catering to specific workload profiles. The typical storage requirements for AI/ML involve vast capacity, high aggregate throughput, millions of requests per second (QPS/IOPS), and low-latency reads, with the performance emphasis varying across training profiles.
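For a sense of how the parallel file system side is consumed, the sketch below mounts a Lustre file system from a client VM using the standard Lustre client. The MGS address and file system name are placeholders; a Managed Lustre instance supplies the real mount source.

```python
# Hedged sketch: mount a Lustre file system on a client VM. Assumes the
# Lustre client packages are installed and the script runs with root
# privileges; the MGS NID and fs name below are placeholders.
import subprocess

subprocess.run(["mkdir", "-p", "/mnt/lustre"], check=True)
subprocess.run(
    ["mount", "-t", "lustre",
     "10.0.0.2@tcp:/demofs",   # placeholder MGS NID and file system name
     "/mnt/lustre"],
    check=True,
)
# Training jobs then read and write /mnt/lustre like any local path,
# with the parallel file system providing the low-latency access
# described above.
```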
The presentation further detailed Cloud Storage FUSE, a solution for mounting a bucket as a local file system. Abela highlighted Google’s heavy investment in it and the significant payoff: it addresses the need for file system semantics without rewriting applications for object storage. Cloud Storage FUSE now serves as a high-performance client with features like a file cache, parallel download, streaming writes, and hierarchical namespace bucket integration. The file cache improves training times, while parallel download drastically speeds up model loading, achieving up to 9x faster load times than FSSpec. Hierarchical namespace buckets offer atomic folder renames for checkpointing, resulting in 30x faster performance.
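A minimal sketch of that workflow follows: write a gcsfuse config enabling the file cache and parallel downloads, then mount the bucket. The config keys mirror gcsfuse’s documented YAML config file, but treat the exact key names and the bucket name as assumptions to verify against the current gcsfuse release.

```python
# Hedged sketch: mount a bucket with Cloud Storage FUSE, file cache on.
# Bucket name and cache sizing are placeholders.
import pathlib
import subprocess

CONFIG = """\
cache-dir: /tmp/gcsfuse-cache
file-cache:
  max-size-mb: -1                  # -1: let the cache grow to available space
  enable-parallel-downloads: true  # speeds up loading large model files
"""

pathlib.Path("gcsfuse.yaml").write_text(CONFIG)
subprocess.run(["mkdir", "-p", "/mnt/training-data"], check=True)
subprocess.run(
    ["gcsfuse", "--config-file", "gcsfuse.yaml",
     "my-training-bucket",         # placeholder bucket name
     "/mnt/training-data"],
    check=True,
)
# The mounted path now behaves like a file system, so existing training
# code can read datasets and write checkpoints without object-storage
# specific changes.
```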
Abela then introduced Anywhere Cache, a newly generally available feature designed to improve performance by co-locating data on SSD in the same zone as compute. This “turbo button” for Cloud Storage requires no code refactoring and reduces time-to-first-byte latency by up to 70% for regional buckets and 96% for multi-regional buckets. A GenAI customer case study demonstrated its effectiveness for model loading, achieving a 99% cache hit rate, eliminating tail latencies, and reducing network egress costs with multi-regional buckets. The presentation also covered a recommender tool that helps users understand the cacheability of their workload, the optimal configuration, expected throughput, and potential cost savings.
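Since the cache is enabled per bucket and zone rather than in application code, turning it on is a one-time administrative step. The sketch below uses the gcloud anywhere-caches surface; the exact argument shape is an assumption to verify against the current gcloud reference.

```python
# Hedged sketch: enable Anywhere Cache for a bucket in one zone. The
# bucket name is a placeholder, and the command shape should be checked
# against the gcloud documentation before use.
import subprocess

subprocess.run(
    ["gcloud", "storage", "buckets", "anywhere-caches", "create",
     "gs://my-model-bucket",   # placeholder bucket
     "us-central1-a"],         # zone to co-locate the cache with compute
    check=True,
)
# On a cache hit, reads from VMs in us-central1-a are served from zonal
# SSD, which is what drives the latency reductions described above.
```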
Personnel: Marco Abela