This video is part of the appearance, “Google Cloud Presents at AI Infrastructure Field Day 2 – Morning”. It was recorded as part of AI Infrastructure Field Day 2 at 09:00–12:00 on April 22, 2025.
Watch on YouTube
Watch on Vimeo
Ishan Sharma, Group Product Manager on the Google Kubernetes Engine team, presented on GKE and AI Hypercomputer, focusing on industry-leading infrastructure, training quickly at mega scale, serving with lower cost and latency, cost-effective access to GPUs and TPUs, and faster time to value. He emphasized that Google Cloud is committed to ensuring new accelerators are available on GKE on day one. AI Hypercomputer, an entire stack and reference architecture, is the same stack that Google uses internally for Vertex AI.
The presentation highlighted Cluster Director for GKE, which enables the deployment, scaling, and management of AI-optimized GKE clusters in which physically co-located accelerators function as a single unit, delivering high performance and ultra-low latency. Key benefits include densely co-located accelerators, mega-scale training jobs, topology-aware scheduling, ease of use, 360-degree observability, and resiliency. Cluster Director for GKE uses standard Kubernetes APIs and the existing ecosystem, allowing users to orchestrate these capabilities with familiar tooling.
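To make the topology-aware scheduling idea concrete, here is a minimal Python sketch of placing a job so that all its accelerators land in one physically co-located block. The node inventory, the `block` label, and the greedy selection are illustrative assumptions, not Cluster Director's actual algorithm or API.

```python
from collections import defaultdict

def pick_topology_block(nodes, gpus_needed):
    """Greedy sketch of topology-aware placement: group free
    accelerator nodes by their topology block and return the first
    block that can host the whole job, so every worker runs on
    physically co-located hardware.  Purely illustrative."""
    blocks = defaultdict(list)
    for node in nodes:
        if node["free_gpus"] > 0:
            blocks[node["block"]].append(node)
    for block_id, members in sorted(blocks.items()):
        capacity = sum(n["free_gpus"] for n in members)
        if capacity >= gpus_needed:
            return block_id, members
    return None, []  # no single block can fit the job

# Hypothetical cluster inventory; "block" mimics a topology label.
cluster = [
    {"name": "node-a", "block": "rack-1", "free_gpus": 4},
    {"name": "node-b", "block": "rack-1", "free_gpus": 4},
    {"name": "node-c", "block": "rack-2", "free_gpus": 2},
]
block, members = pick_topology_block(cluster, 8)
```

With this inventory, an 8-GPU job fits only in `rack-1`, so both of that block's nodes are chosen and the job avoids crossing a slower network boundary.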
Sharma also demonstrated the GKE Inference Gateway, which improves LLM inference serving by routing requests based on model-server metrics such as KV cache utilization and queue length, reducing variability and improving time-to-first-token latency. Additionally, he showcased GKE Inference Quickstart, a feature on the GKE homepage within the Google Cloud console, which recommends optimized infrastructure configurations for different models, such as the NVIDIA L4 for the Gemma 2 2B instruction-tuned model. This simplifies model deployment and optimizes performance.
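The metric-driven routing described above can be sketched in a few lines of Python: pick the replica with the shortest request queue, breaking ties by KV cache headroom. The metric names and the tie-breaking order are assumptions for illustration, not the Inference Gateway's real scoring logic.

```python
def pick_replica(replicas):
    """Sketch of metric-aware routing: prefer the model-server
    replica with the shortest queue, then the most KV-cache
    headroom, to cut time-to-first-token variability.
    Metric fields here are illustrative, not a real gateway API."""
    return min(replicas, key=lambda r: (r["queue_len"], r["kv_cache_util"]))

# Hypothetical per-replica metrics scraped from the model servers.
replicas = [
    {"name": "pod-a", "kv_cache_util": 0.9, "queue_len": 12},
    {"name": "pod-b", "kv_cache_util": 0.4, "queue_len": 3},
    {"name": "pod-c", "kv_cache_util": 0.6, "queue_len": 3},
]
chosen = pick_replica(replicas)
```

Here `pod-b` and `pod-c` tie on queue length, so the lower KV cache utilization of `pod-b` wins; a plain round-robin balancer, by contrast, would send a third of the traffic to the saturated `pod-a`.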
Personnel: Ishan Sharma