Tech Field Day

The Independent IT Influencer Event

  • Home
    • The Futurum Group
    • FAQ
    • Staff
  • Sponsors
    • Sponsor List
      • 2026 Sponsors
      • 2025 Sponsors
      • 2024 Sponsors
      • 2023 Sponsors
      • 2022 Sponsors
    • Sponsor Tech Field Day
    • Best of Tech Field Day
    • Results and Metrics
    • Preparing Your Presentation
      • Complete Presentation Guide
      • A Classic Tech Field Day Agenda
      • Field Day Room Setup
      • Presenting to Engineers
  • Delegates
    • Delegate List
      • 2026 Delegates
      • 2025 Delegates
      • 2024 Delegates
      • 2023 Delegates
      • 2022 Delegates
    • Become a Field Day Delegate
    • What Delegates Should Know
  • Events
    • All Events
      • Upcoming
      • Past
    • Field Day
    • Field Day Extra
    • Field Day Exclusive
    • Field Day Experience
    • Field Day Live
    • Field Day Showcase
  • Topics
    • Tech Field Day
    • Cloud Field Day
    • Mobility Field Day
    • Networking Field Day
    • Security Field Day
    • Storage Field Day
  • News
    • Coverage
    • Event News
    • Podcast
  • When autocomplete results are available use up and down arrows to review and enter to go to the desired page. Touch device users, explore by touch or with swipe gestures.
You are here: Home / Videos / Cisco AI Networking Cluster Operations Deep Dive

Cisco AI Networking Cluster Operations Deep Dive



Networking Field Day 39


This video is part of the appearance, “Cisco Presents at Networking Field Day 39“. It was recorded as part of Networking Field Day 39 at 10:30-12:00 on November 6, 2025.


Watch on YouTube
Watch on Vimeo

Paresh Gupta’s deep dive on AI cluster operations focused on the extreme and unique challenges of high-performance backend networks. He explained that these networks, which primarily use RDMA over Converged Ethernet (ROCE), are exceptionally sensitive to both packet loss and network delay. Because ROCE is UDP-based, it lacks TCP’s native congestion control, meaning a single dropped packet can stall an entire collective communication operation, forcing a costly retransmission and wasting expensive GPU cycles. This problem is compounded by AI traffic patterns, such as checkpointing, where all GPUs write to storage simultaneously, creating massive incast congestion. Gupta emphasized that in these environments, where every nanosecond of delay matters, traditional network designs and operational practices are no longer sufficient.

Cisco’s strategy to solve these problems is built on prescriptive, end-to-end validated reference architectures, which are tested with NVIDIA, AMD, Intel Gaudi, and all major storage vendors. Gupta detailed the critical importance of a specific Rail-Optimized Design, a non-blocking topology engineered to ensure single-hop connectivity between all GPUs within a scalable unit. This design minimizes latency by keeping traffic off the spine switches, but its performance is entirely dependent on perfect physical cabling. He explained that these architectures are built on Cisco’s smart switches, which use Silicon One ASICs and are optimized with fine-tuned thresholds for congestion-notification protocols like ECN and PFC.

The most critical innovations, however, are in operational simplicity, delivered via Nexus Dashboard and HyperFabric AI. These platforms automate and hide the underlying network complexity. Gupta highlighted the automated cabling check feature. The system generates a precise cabling plan for the rail-optimized design and provides a task list to on-site technicians; the management UI will only show a port as green when it is cabled to the exact correct port, solving the pervasive and performance-crippling problem of miscabling. This feature, which customers reported reduced deployment time by 90%, is combined with job scheduler integration to detect and flag performance-degrading anomalies, such as a single job being inefficiently spread across multiple scalable units.

Personnel: Paresh Gupta

  • Bluesky
  • LinkedIn
  • Mastodon
  • RSS
  • Twitter
  • YouTube

Event Calendar

  • Jan 28-Jan 30 — AI Infrastructure Field Day 4
  • Mar 11-Mar 12 — Cloud Field Day 25
  • Mar 23-Mar 24 — Tech Field Day Extra at RSAC 2026
  • Apr 8-Apr 10 — Networking Field Day 40
  • Apr 15-Apr 16 — AI AppDev Field Day 3
  • Apr 29-Apr 30 — Security Field Day 15
  • May 6-May 8 — Mobility Field Day 14
  • May 13-May 14 — AI Field Day 8

Latest Coverage

  • From Brownfield Complexity to Automated Fabric at Global Scale
  • Managing Edge AI and Computer Vision at Scale
  • Digitate ignio and the 2025 AIOps Question: Build or Buy?
  • Reimagining AI from a Security Risk into an Asset with Fortinet
  • ResOps: The Convergence of Security and Operations

Tech Field Day News

  • Cutting-Edge AI Networking and Storage Kick Off 2026 at AI Infrastructure Field Day 4
  • Commvault Shift 2025 Live Blog

Return to top of page

Copyright © 2026 · Genesis Framework · WordPress · Log in