Tech Field Day

The Independent IT Influencer Event

  • Home
    • The Futurum Group
    • FAQ
    • Staff
  • Sponsors
    • Sponsor List
      • 2025 Sponsors
      • 2024 Sponsors
      • 2023 Sponsors
      • 2022 Sponsors
    • Sponsor Tech Field Day
    • Best of Tech Field Day
    • Results and Metrics
    • Preparing Your Presentation
      • Complete Presentation Guide
      • A Classic Tech Field Day Agenda
      • Field Day Room Setup
      • Presenting to Engineers
  • Delegates
    • Delegate List
      • 2025 Delegates
      • 2024 Delegates
      • 2023 Delegates
      • 2022 Delegates
      • 2021 Delegates
      • 2020 Delegates
      • 2019 Delegates
      • 2018 Delegates
    • Become a Field Day Delegate
    • What Delegates Should Know
  • Events
    • All Events
      • Upcoming
      • Past
    • Field Day
    • Field Day Extra
    • Field Day Exclusive
    • Field Day Experience
    • Field Day Live
    • Field Day Showcase
  • Topics
    • Tech Field Day
    • Cloud Field Day
    • Mobility Field Day
    • Networking Field Day
    • Security Field Day
    • Storage Field Day
  • News
    • Coverage
    • Event News
    • Podcast
  • When autocomplete results are available use up and down arrows to review and enter to go to the desired page. Touch device users, explore by touch or with swipe gestures.
You are here: Home / Videos / cPacket Observability for AI

cPacket Observability for AI

July 14, 2025 by



Networking Field Day 38


This video is part of the appearance, “cPacket Presents at Networking Field Day 38“. It was recorded as part of Networking Field Day 38 at 8:00-9:30 on July 10, 2025.


Watch on YouTube
Watch on Vimeo

Modern AI workloads rely on high-performance, low-latency GPU clusters, but traditional observability tools fall short in diagnosing issues across these dense, distributed environments. In this session, cPacket explored how they augment GPU and storage telemetry (DCGM/NVML/IOPS) with full-fidelity packet insights. They covered how to correlate job scheduling, retransmissions, queue depth, and tensor-core utilization in real time, and how to establish performance baselines, auto-trigger mitigations, integrate with SRE dashboards, and continuously tune topologies for maximum AI throughput and resource efficiency. Erik Rudin and Ron Nevo introduced the emerging challenge of AI factories moving into enterprises, contrasting these inference workloads with the well-understood elephant flows of AI training in hyperscale data centers. Inference presents unique, less-understood traffic patterns, often driven by user or agent interactions and characterized by varying query-response ratios and KV cache management policies, all demanding optimal GPU utilization without sacrificing latency.

The core of cPacket’s solution for AI observability lies in supplementing traditional GPU telemetry with packet-level visibility, particularly on the north-south (front-end) network that connects AI clusters to the rest of the enterprise. This integration is crucial for pinpointing the exact source of latency (whether from the cluster, switch, or storage), identifying microbursts that internal switch telemetry might miss, and understanding session-level characteristics that impact AI workload performance. Unlike traditional network monitoring, which often falls short in these highly dynamic and dense environments, cPacket’s approach aims to provide the granular, real-time data necessary for continuous tuning and optimization of AI infrastructures.

Ultimately, cPacket emphasizes that observability for AI is essential for enterprises making significant investments in GPU workloads at the edge. The rapid evolution of AI necessitates a comprehensive approach that integrates packet insights, session metrics, and AI-driven analytics into existing SRE and NetOps workflows. This allows for proactive identification of anomalies, establishment of performance baselines, and continuous optimization of network topologies to ensure maximum AI throughput and resource efficiency, directly impacting the often high costs associated with AI downtime. The overarching message is to start with the business problem–understanding the specific challenges and desired outcomes for AI workloads–and then leverage cPacket’s integrated, open, and AI-infused platform to drive measurable improvements.

Personnel: Erik Rudin, Ron Nevo

  • Bluesky
  • LinkedIn
  • Mastodon
  • RSS
  • Twitter
  • YouTube

Event Calendar

  • Oct 9-Oct 9 — Tech Field Day Exclusive with Microsoft Security
  • Oct 15-Oct 15 — Tech Field Day Experience at NetApp INSIGHT 2025
  • Oct 22-Oct 23 — Cloud Field Day 24
  • Oct 29-Oct 30 — AI Field Day 7
  • Nov 5-Nov 6 — Networking Field Day 39
  • Nov 11-Nov 12 — Tech Field Day at KubeCon North America 2025
  • Jan 28-Jan 29 — AI Infrastructure Field Day 4
  • Apr 29-Apr 30 — Security Field Day 15

Latest Coverage

  • Hammerspace and the Open Flash Platform at #AIIFD3
  • How Mainframe Observability Bridges Legacy and Modern Systems
  • Share Cleveland 25 Took Mainframe to the Next Level
  • PopUp Mainframe: The Key to Faster, Cheaper, and Better Mainframe DevOps
  • Using Agentic AI to Assist Resilience with Opengear

Tech Field Day News

  • The Latest in Cybersecurity Innovation at Security Field Day 14
  • Pushing the Boundaries of AI Performance, Scale, and Innovation at AI Infrastructure Field Day 3

Return to top of page

Copyright © 2025 · Genesis Framework · WordPress · Log in