Ishan Sharma, Group Product Manager on the Google Kubernetes Engine team, presented on GKE and AI Hypercomputer, focusing on industry-leading infrastructure, training quickly at mega scale, serving with lower cost and latency, economic access to GPUs and TPUs, and faster time to value. He emphasized that Google Cloud is committed to making new accelerators available on GKE on day one. AI Hypercomputer, which comprises the entire stack plus a reference architecture, is the same stack that Google uses internally for Vertex AI.
The presentation highlighted Cluster Director for GKE, which enables the deployment, scaling, and management of AI-optimized GKE clusters in which physically co-located accelerators function as a single unit, delivering high performance and ultra-low latency. Key benefits include running densely co-located accelerators, mega-scale training jobs, topology-aware scheduling, ease of use, 360-degree observability, and resiliency. Cluster Director for GKE is built on standard Kubernetes APIs and the existing ecosystem, so users orchestrate these capabilities through those same APIs.
Sharma also demonstrated the GKE Inference Gateway, which improves LLM inference serving by routing requests based on model server metrics such as KV cache utilization and queue length, reducing variability and improving time-to-first-token latency. Additionally, he showcased the GKE Inference Quickstart, a feature on the GKE homepage within the Google Cloud console that recommends optimized infrastructure configurations for different models, such as the NVIDIA L4 for the Gemma 2 2B instruction-tuned model. This simplifies model deployment and optimizes performance.
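The idea of routing on model server metrics can be illustrated with a minimal Python sketch. This is not the GKE Inference Gateway's actual algorithm or API: the `Replica` type, the `pick_replica` function, the weighted-sum score, and the weights are all hypothetical, chosen only to show how a router might prefer the replica with the lowest combined KV cache pressure and queue backlog.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    """Hypothetical snapshot of one model server's reported metrics."""
    name: str
    kv_cache_utilization: float  # fraction of KV cache in use, 0.0 to 1.0
    queue_length: int            # requests waiting to be processed


def pick_replica(replicas, kv_weight=1.0, queue_weight=0.1):
    """Return the replica with the lowest weighted load score.

    The scoring formula and default weights are assumptions made for
    illustration, not values taken from the presentation.
    """
    if not replicas:
        raise ValueError("no replicas available")

    def score(replica):
        return (kv_weight * replica.kv_cache_utilization
                + queue_weight * replica.queue_length)

    return min(replicas, key=score)


replicas = [
    Replica("server-a", kv_cache_utilization=0.90, queue_length=4),
    Replica("server-b", kv_cache_utilization=0.35, queue_length=1),
    Replica("server-c", kv_cache_utilization=0.20, queue_length=6),
]

chosen = pick_replica(replicas)
print(chosen.name)
```

In this sketch, a replica with a nearly full KV cache or a long queue scores worse, so new requests drift toward servers that can start generating sooner, which is the intuition behind the reduced variability and better time-to-first-token latency described in the talk.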
Personnel: Ishan Sharma