Tech Field Day

The Independent IT Influencer Event


Secure and optimize AI and ML workloads with the Cross-Cloud Network with Google Cloud



AI Infrastructure Field Day 2


This video is part of the appearance, “Google Cloud Presents at AI Infrastructure Field Day 2 – Afternoon”. It was recorded as part of AI Infrastructure Field Day 2 from 13:00 to 16:30 on April 22, 2025.


Watch on YouTube
Watch on Vimeo

Vaibhav Katkade, a Product Manager at Google Cloud Networking, presented on infrastructure enhancements in cloud networking for secure, optimized AI/ML workloads. His talk traced the AI/ML lifecycle, spanning training, fine-tuning, and serving/inference, and the network imperatives of each stage. Data ingestion relies on fast, secure connectivity to on-premises environments via Interconnect and Cross-Cloud Interconnect, facilitating high-speed data transfer. GKE clusters now support up to 65,000 nodes, a significant increase in scale that enables training of large models like Gemini. Improvements to Cloud Load Balancing enhance performance, particularly for LLM workloads.
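To put the data-ingestion requirement in perspective, a rough back-of-envelope calculation (illustrative only; the 10 Gbps and 100 Gbps link sizes and the 80% efficiency figure are assumptions, not numbers from the presentation) shows how much link capacity matters when moving a large training dataset:

    # Rough transfer-time estimate for moving a training dataset over an
    # interconnect link. Link sizes and efficiency are illustrative assumptions.
    def transfer_hours(dataset_tb: float, link_gbps: float, efficiency: float = 0.8) -> float:
        dataset_bits = dataset_tb * 8 * 10**12           # terabytes -> bits
        effective_bps = link_gbps * 10**9 * efficiency   # usable bits per second
        return dataset_bits / effective_bps / 3600       # seconds -> hours

    for gbps in (10, 100):
        print(f"{gbps} Gbps: {transfer_hours(500, gbps):.0f} h for a 500 TB dataset")
    # ~139 h at 10 Gbps vs. ~14 h at 100 Gbps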

A key component discussed was the GKE Inference Gateway, which optimizes LLM serving. It leverages inference metrics from model servers such as vLLM, Triton, Dynamo, and Google’s JetStream to perform load balancing based on KV cache utilization, improving performance. The gateway also supports autoscaling based on model server metrics, dynamically adjusting compute allocation to request load and GPU utilization. Additionally, it enables multiplexing multiple model use cases onto a single base model using LoRA fine-tuned adapters, increasing model serving density and making more efficient use of accelerators. The gateway supports multi-region capacity chasing and integrates with security tools such as Google’s Model Armor, Palo Alto Networks, and NVIDIA’s NeMo Guardrails.
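The KV-cache-aware load balancing idea can be sketched in a few lines. This is an illustrative approximation, not the GKE Inference Gateway implementation; the replica fields below are hypothetical stand-ins for the metrics a model server such as vLLM or Triton would report:

    # Illustrative: route each request to the replica with the most free KV cache,
    # breaking ties on queue depth. Field names are hypothetical stand-ins for
    # metrics scraped from model servers.
    from dataclasses import dataclass

    @dataclass
    class Replica:
        name: str
        kv_cache_utilization: float   # fraction of KV cache in use, 0.0 - 1.0
        queue_depth: int              # requests waiting on this server

    def pick_replica(replicas: list[Replica]) -> Replica:
        return min(replicas, key=lambda r: (r.kv_cache_utilization, r.queue_depth))

    replicas = [
        Replica("model-server-0", 0.92, 7),
        Replica("model-server-1", 0.41, 3),
        Replica("model-server-2", 0.41, 1),
    ]
    print(pick_replica(replicas).name)   # -> model-server-2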

Katkade also covered key considerations for running inference at scale on Kubernetes. One significant challenge addressed is the constrained availability of GPU and TPU capacity across regions. Google’s solution allows routing to regions with available capacity through a single inference gateway, streamlining operations and improving capacity utilization. Platform and infrastructure teams gain centralized control and consistent baseline coverage across all models by integrating AI security tools directly at the gateway level. Further discussion covered load balancing optimization based on KV cache utilization, which delivers up to 60% lower latency and 40% higher throughput. The gateway supports model-name-based routing and prioritization compatible with the OpenAI API spec, and allows different autoscaling thresholds for production and development workloads.
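Because that routing is keyed off the model name behind an OpenAI-compatible API, a client can select a base model or a LoRA-adapted variant without changing anything but the name. A minimal sketch using the openai Python client, where the endpoint URL and model name are hypothetical placeholders rather than values from the presentation:

    # Hypothetical call against an OpenAI-compatible inference gateway.
    # The base_url and model name are placeholders; the gateway would map the
    # model name to a base model or LoRA adapter and apply routing/priority rules.
    from openai import OpenAI

    client = OpenAI(base_url="https://inference-gw.example.com/v1", api_key="unused")

    resp = client.chat.completions.create(
        model="ticket-summarizer-lora",   # a LoRA adapter multiplexed on a shared base model
        messages=[{"role": "user", "content": "Summarize this support ticket: ..."}],
    )
    print(resp.choices[0].message.content)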

Personnel: Vaibhav Katkade



