Watch on YouTube
Watch on Vimeo
Cisco presented its vision for intelligent observability and AgenticOps in data center networking, emphasizing that AI is no longer optional for modern data centers. Anant Shah, Senior Product Manager, highlighted the goal of eliminating silos and reducing mean time to resolution by integrating various Cisco products. The discussion extended beyond traditional network monitoring to include applications, tokenomics, and agentic observability, underscoring that AI-capable data centers require visibility and management across compute, network infrastructure, and applications. This approach is foundational to Cisco Nexus One, an evolution of Cisco's SDN strategy built on five pillars: silicon, system, optics, software, and the Nexus Dashboard operating model. It draws on both Cisco’s proprietary silicon and partnerships with NVIDIA for high-performance, lossless infrastructure.
The Nexus Dashboard serves as the central operating model, streamlining AI fabric provisioning with one-click automation and predefined templates suitable for diverse inferencing and training architectures. It incorporates advanced features like dynamic load balancing and adapts to changing fabric parameters. Crucially, Nexus Dashboard extends its observability capabilities beyond the network fabric. By integrating with technologies such as Slurm and NVIDIA DCGM, it provides real-time metrics on GPU statistics, optics, and GPU NICs. This enhanced visibility within a single pane of glass allows network administrators to quickly identify whether a network issue is the root cause of performance problems, aiming to establish “mean time to innocence” for network operations rather than replacing broader compute management platforms.
Further enhancing this holistic view, Cisco announced the integration of native Splunk directly within the Nexus Dashboard. This on-premises Splunk instance serves as a sovereign data lake, providing troubleshooting capabilities while reducing the complexity and potential costs of ingesting data into separate Splunk servers or cloud platforms. It enables customers to create custom dashboards and alerts that combine infrastructure audit logs with GPU data, so they can monitor network changes and their impact on AI jobs. Beyond the infrastructure, Splunk Observability, using OpenTelemetry, offers insights into AI applications, tokenomics, storage, Kubernetes pods, and LLMs. This integration allows for federated search between Nexus Dashboard’s local Splunk and other Splunk observability instances, maintaining data sovereignty while delivering end-to-end visibility for AI operations teams.
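The idea of combining audit logs with GPU data can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout, field names, device names, and numbers below are made up, and nothing here reflects the actual Nexus Dashboard or Splunk data model or query language. The sketch flags any configuration change that is followed by a drop in job throughput within a short time window.

```python
from datetime import datetime, timedelta
from statistics import mean

# Made-up sample records for illustration only; not real product output.
AUDIT_LOG = [
    {"time": datetime(2025, 1, 1, 10, 0), "change": "QoS policy updated on leaf-1"},
    {"time": datetime(2025, 1, 1, 12, 0), "change": "VLAN added on leaf-2"},
]
GPU_SAMPLES = [
    {"time": datetime(2025, 1, 1, 9, 55), "throughput": 100.0},
    {"time": datetime(2025, 1, 1, 10, 5), "throughput": 60.0},
    {"time": datetime(2025, 1, 1, 11, 55), "throughput": 97.0},
    {"time": datetime(2025, 1, 1, 12, 5), "throughput": 98.0},
]


def changes_followed_by_drop(audit_log, gpu_samples,
                             window=timedelta(minutes=10), min_drop=0.2):
    """Return (change, relative_drop) for changes followed by a throughput drop."""
    flagged = []
    for entry in audit_log:
        t = entry["time"]
        # Samples in the window just before and just after the change.
        before = [s["throughput"] for s in gpu_samples if t - window <= s["time"] < t]
        after = [s["throughput"] for s in gpu_samples if t <= s["time"] <= t + window]
        if not before or not after:
            continue  # not enough data to compare
        drop = 1 - mean(after) / mean(before)
        if drop >= min_drop:
            flagged.append((entry["change"], round(drop, 2)))
    return flagged


FLAGGED = changes_followed_by_drop(AUDIT_LOG, GPU_SAMPLES)
print(FLAGGED)
```

With this sample data only the QoS change is flagged, because throughput falls from 100 to 60 right after it, while the VLAN change has no measurable effect. A real dashboard or alert would express the same join in the platform's own search language.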
Personnel: Anant Shah
Cisco’s presentation at AI Field Day 8 addressed the growing complexities in enterprise network operations, where siloed teams across NetOps, SecOps, and AppOps grapple with a sprawl of disparate dashboards and a lack of consistent context during incident response. Meghan Kachhi emphasized that while agentic AI is no longer just a buzzword, its adoption is challenged by issues like data silos, the absence of domain-specific expertise in generic large language models, and a general lack of trust in closed-loop automation due to security concerns or opaque audit trails. The presentation highlighted the evolution of network operations from CLI and GUIs to software-defined networking and AIOps, positioning agentic ops as the latest layer of enhancement rather than a complete replacement of prior methodologies.
To tackle these challenges, especially for customers opting for their own large language models, Cisco introduces a Model Context Protocol (MCP) server integrated with Nexus Dashboard. This MCP server acts as a standardized, secure interface, simplifying how customer-owned AI agents interact with the network infrastructure. Instead of navigating thousands of complex REST API endpoints, agents can use the MCP server to call specific, high-value tools. The MCP server handles complex API calls, pagination, and semantic understanding, providing aggregated schemas and data to the AI agents while establishing guardrails for access and scope. This makes the Nexus Dashboard a single, governed source of truth for network intelligence.
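The pattern described here, a small set of governed tools that hide pagination and return aggregated data, can be sketched as follows. This is a minimal illustration, not Cisco's implementation: `fetch_page`, `list_anomalies`, `call_tool`, and the sample anomaly records are all hypothetical names and data, and no real Nexus Dashboard endpoint or MCP SDK is used.

```python
from typing import Dict, Iterator

# Hypothetical data standing in for a paginated REST endpoint.
_FAKE_ANOMALIES = [
    {"id": i, "severity": "critical" if i % 3 == 0 else "warning"}
    for i in range(7)
]


def fetch_page(offset: int, limit: int) -> Dict:
    """Stand-in for one paginated REST call (hypothetical, not a real API)."""
    items = _FAKE_ANOMALIES[offset:offset + limit]
    has_more = offset + limit < len(_FAKE_ANOMALIES)
    return {"items": items, "next_offset": offset + limit if has_more else None}


def iter_all(limit: int = 3) -> Iterator[Dict]:
    """Walk every page so the calling agent never deals with pagination."""
    offset = 0
    while offset is not None:
        page = fetch_page(offset, limit)
        yield from page["items"]
        offset = page["next_offset"]


def list_anomalies(severity: str) -> Dict:
    """High-level tool: one call returns an aggregated, filtered result."""
    matches = [a for a in iter_all() if a["severity"] == severity]
    return {"count": len(matches), "anomalies": matches}


# Guardrail: only tools registered here can be invoked by an agent.
TOOLS = {"list_anomalies": list_anomalies}


def call_tool(name: str, **kwargs) -> Dict:
    if name not in TOOLS:
        raise PermissionError(f"tool not allowed: {name}")
    return TOOLS[name](**kwargs)


print(call_tool("list_anomalies", severity="critical")["count"])
```

The agent issues a single `call_tool` request and receives an aggregated answer, while the three underlying page fetches and the allow-list check stay inside the server.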
The operational flow demonstrated begins with an AI agent in an “assisted ops” mode, allowing human oversight. It identifies incidents, queries the MCP server for network anomalies on the Nexus Dashboard, verifies configurations, and initiates fixes using a network agent. Upon resolution, the MCP server’s integration capabilities extend to third-party platforms like PagerDuty for incident resolution, Confluence for knowledge base article creation, and Slack for team communication. This framework allows enterprises to build domain-specific skills and learn from resolved issues, with Cisco providing the foundational infrastructure for agentic AI. This approach supports a gradual transition from human-assisted operations to fully automated agentic ops, empowering organizations to address critical issues like AI/ML job performance degradation and security segmentation.
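The "assisted ops" flow above amounts to a human approval gate placed between a proposed fix and its execution, followed by notifications to other systems. The sketch below shows one way to structure that gate. All names (`Incident`, `run_assisted_fix`, the callbacks) are hypothetical, and the notifier functions stand in for integrations such as a paging tool, a knowledge base, or a chat channel rather than calling any real API.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Incident:
    id: str
    summary: str
    log: List[str] = field(default_factory=list)


def run_assisted_fix(
    incident: Incident,
    proposed_fix: str,
    approve: Callable[[Incident, str], bool],
    apply_fix: Callable[[str], None],
    notifiers: List[Callable[[Incident], None]],
) -> bool:
    """Apply a fix only after a human approves it, then notify other systems."""
    incident.log.append(f"proposed: {proposed_fix}")
    if not approve(incident, proposed_fix):
        # Human oversight: nothing touches the network without approval.
        incident.log.append("rejected by operator")
        return False
    apply_fix(proposed_fix)
    incident.log.append("applied")
    for notify in notifiers:
        notify(incident)
    return True


# Usage with stand-in callbacks that just record what happened.
applied: List[str] = []
notified: List[str] = []
incident = Incident("INC-1", "packet drops on a fabric link")
ok = run_assisted_fix(
    incident,
    "restore previous QoS policy",
    approve=lambda inc, fix: True,  # an operator would confirm here
    apply_fix=applied.append,
    notifiers=[lambda inc: notified.append(f"resolved {inc.id}")],
)
print(ok, incident.log)
```

Moving toward fully automated operation would then mean replacing the `approve` callback with a policy check, while the audit log in `incident.log` stays the same.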
Personnel: Meghan Kachhi
Cisco is addressing the pervasive fragmentation in IT operations, where critical problems often take days or even months to resolve due to disparate data sources and manual correlation across various domains like applications, security, compute, and networking. With different products and specialized teams managing these silos, efficient troubleshooting is a significant challenge. Cisco’s response is Agentic AI, a strategy that leverages its extensive portfolio of world-class solutions across these domains to create a unified operational experience. This initiative introduces Cisco Cloud Control, an AI-native management platform designed to bring together data, context, and IT personas from across the infrastructure.
At the core of Cisco Cloud Control are two main AI components: AI Canvas and AI Assistant. The AI Assistant functions as a chatbot, enabling users to ask contextual questions about any element connected to Cloud Control. AI Canvas serves as a dynamic, collaborative environment where cross-product data and diverse IT personas converge to triage issues. It features a central canvas board where a purpose-built AI model automatically renders relevant widgets and data in real time, a concept referred to as Generative UI. This collaborative space allows different IT roles, such as application, compute, or network administrators, to work together with shared visibility of the troubleshooting context, while still enforcing granular role-based access control for actions.
The underlying multi-agent architecture enables parallelized debugging, where agents associated with products like Nexus Dashboard, Nexus Hyperfabric, and Intersight simultaneously investigate their respective domains for problem symptoms. This approach accelerates root cause analysis by replacing sequential troubleshooting with concurrent investigation. A demonstration illustrated how a slow AI training job was quickly diagnosed, correlating high GPU utilization with network degradation traced to CRC errors and an overheated optical module. The platform facilitated collaboration between compute and network administrators, proposed a remediation, and generated a report detailing the problem, investigation timeline, remediation steps, and future recommendations, streamlining communication and post-incident analysis for a more efficient, assisted operations model.
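Parallelized debugging of this kind can be illustrated with Python's `asyncio`. The sketch below is an assumption-laden toy, not the Cloud Control architecture: the `investigate` coroutine, the domain labels, and the delays are hypothetical, and the single finding simply echoes the symptom from the demonstration described above.

```python
import asyncio
from typing import Dict, List, Optional


async def investigate(domain: str, delay: float, finding: Optional[str]) -> Dict:
    """Hypothetical per-domain agent; the sleep stands in for real queries."""
    await asyncio.sleep(delay)
    return {"domain": domain, "finding": finding}


async def triage() -> List[Dict]:
    # All agents run concurrently, so total time is roughly the slowest
    # agent rather than the sum of all of them.
    results = await asyncio.gather(
        investigate("nexus_dashboard", 0.02, "CRC errors and an overheated optical module"),
        investigate("hyperfabric", 0.01, None),
        investigate("intersight", 0.01, None),
    )
    # Keep only the domains that reported a symptom.
    return [r for r in results if r["finding"]]


def run_triage() -> List[Dict]:
    return asyncio.run(triage())


print(run_triage())
```

`asyncio.gather` returns results in the order the coroutines were passed, so each finding can be attributed to its domain regardless of which agent finishes first.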
Personnel: Arun Annavarapu