Tech Field Day

The Independent IT Influencer Event


Cisco AI Cluster Design, Automation, and Visibility



AI Infrastructure Field Day 4


This video is part of the appearance, “Cisco Data Center Networking Presents at AI Infrastructure Field Day”. It was recorded as part of AI Infrastructure Field Day 4 from 8:00–9:30 AM PT on January 28, 2026.


Watch on YouTube
Watch on Vimeo

Cisco’s presentation on AI Cluster Design, Automation, and Visibility, led by Meghan Kachhi and Richard Licon, aims to simplify AI infrastructure and shorten the lengthy design and troubleshooting cycles typical of GPU clusters. The core focus is on improving cluster designs, automating deployments, and providing end-to-end visibility to protect a competitive edge. The session outlines Cisco’s reference architectures, the key components for building AI clusters, and upcoming updates to its Nexus Dashboard platform, which is expected to streamline design, automation, and monitoring at scale. This comprehensive approach matters because the battle for AI success is fought at the infrastructure layer: the network must not leave expensive GPUs sitting idle.

Cisco’s AI networking strategy rests on three pillars. First, its systems feature custom Silicon One platforms with programmable pipelines that adapt quickly to evolving AI infrastructure demands, and a partnership with NVIDIA that brings NX-OS to NVIDIA Spectrum-X silicon for full-stack reference architecture compliance. Rigorously tested transceivers and mature NX-OS software, now optimized for AI workloads, complete the system offerings. Second, the operating model includes the Nexus Dashboard for on-premises management and Nexus Hyperfabric for a full-stack, cloud-managed solution, complemented by an API-first approach that enables seamless integration with customers’ existing automation frameworks. Third, extensive AI reference architectures serve as validated blueprints, spanning enterprise-scale deployments (under 1,024 GPUs) to hyperscale cloud environments (1K–16K+ GPUs), providing detailed component lists and ensuring a consistent networking experience across GPU vendors such as NVIDIA and AMD as well as storage solutions. An AI cluster is broadly defined to encompass front-end, storage, and backend GPU-to-GPU networks, with a growing trend toward convergence, enabled by high-speed Ethernet, that unifies operating models.

Designing an efficient AI backend network requires a non-blocking architecture that maintains a 1:1 subscription ratio, keeping every GPU within one hop of others for optimal communication. Cisco employs a “scalable unit” concept, enabling incremental expansion by repeating validated blocks while adjusting spine-layer connectivity to maintain high performance. For smaller-scale deployments, such as a 32-GPU university cluster, Cisco demonstrates how front-end, storage, and backend networks can be converged onto fewer, high-density switches, simplifying infrastructure. A critical consideration for such converged environments is Cisco’s policy-based load balancing, an innovation leveraging Silicon One ASICs. This enables preferential treatment of critical traffic, such as GPU-to-GPU training, over storage or front-end traffic, ensuring AI jobs run with minimal latency and maximum GPU utilization, even when sharing network resources.
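The non-blocking, 1:1 design described above can be sketched as simple arithmetic: each leaf switch splits its ports evenly between GPU-facing downlinks and spine-facing uplinks, so uplink bandwidth always matches downlink bandwidth. The sketch below illustrates that sizing logic; the 64-port switch default and the sizing formula are illustrative assumptions, not Cisco-published figures.

```python
# Sketch: sizing a two-tier, non-blocking (1:1) leaf-spine backend fabric
# for a GPU cluster, in the spirit of the "scalable unit" approach above.
# Assumption: fixed-port switches, half downlinks / half uplinks per leaf.
import math

def size_fabric(num_gpus: int, switch_ports: int = 64) -> dict:
    """Return leaf/spine counts for a 1:1 non-blocking leaf-spine fabric.

    Each leaf dedicates half its ports to GPUs (downlinks) and half to
    spines (uplinks), so uplink bandwidth equals downlink bandwidth and
    every GPU is one spine hop away from every other GPU.
    """
    down_per_leaf = switch_ports // 2                 # GPU-facing ports per leaf
    leaves = math.ceil(num_gpus / down_per_leaf)      # leaves needed for all GPUs
    # Total uplinks (leaves * down_per_leaf) must be absorbed by spine ports:
    spines = math.ceil(leaves * down_per_leaf / switch_ports)
    return {"leaves": leaves, "spines": spines,
            "max_gpus": leaves * down_per_leaf}

# A 1,024-GPU enterprise-scale cluster with 64-port switches:
print(size_fabric(1024))   # {'leaves': 32, 'spines': 16, 'max_gpus': 1024}
# A small 32-GPU cluster (like the university example) fits one leaf:
print(size_fabric(32))     # {'leaves': 1, 'spines': 1, 'max_gpus': 32}
```

Growing the cluster by one "scalable unit" at a time then amounts to adding leaves and recomputing the spine layer, which is why the validated-block approach keeps expansion incremental rather than a redesign.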

Personnel: Meghan Kachhi, Richard Licon


Copyright © 2026 · Genesis Framework · WordPress