Cisco presented its vision for intelligent observability and AgenticOps in data center networking, emphasizing that AI is no longer optional for modern data centers. Anant Shah, Senior Product Manager, explained that the goal is to eliminate silos and reduce mean time to resolution (MTTR) by integrating various Cisco products. The discussion went beyond traditional network monitoring to cover applications, tokenomics, and agentic observability, underscoring that AI-capable data centers need visibility and management across compute, network infrastructure, and applications. This approach is foundational to Cisco Nexus One, an evolution of Cisco’s SDN strategy built on five pillars: silicon, systems, optics, software, and the Nexus Dashboard operating model. Nexus One leverages both Cisco’s own silicon and partnerships with NVIDIA to deliver high-performance, lossless infrastructure.
Nexus Dashboard serves as the central operating model, streamlining AI fabric provisioning with one-click automation and predefined templates for diverse inference and training architectures. It incorporates advanced features such as dynamic load balancing and adapts as fabric parameters change. Crucially, Nexus Dashboard extends observability beyond the network fabric. By integrating with technologies such as Slurm and NVIDIA DCGM, it provides real-time metrics on GPU statistics, optics, and GPU NICs. Having this visibility in a single pane of glass lets network administrators quickly determine whether the network is the root cause of a performance problem. The aim is to shorten “mean time to innocence” for network operations (that is, quickly showing when the network is not at fault), not to replace broader compute management platforms.
To round out this holistic view, Cisco announced native Splunk integrated directly within Nexus Dashboard. This on-premises Splunk instance serves as a sovereign data lake, providing powerful troubleshooting capabilities while reducing the complexity and potential cost of ingesting data into separate Splunk servers or cloud platforms. Customers can build custom dashboards and alerts that combine infrastructure audit logs with GPU data to track network changes and their impact on AI jobs. Beyond the infrastructure, Splunk Observability, using OpenTelemetry, offers insights into AI applications, tokenomics, storage, Kubernetes pods, and LLMs. The integration also supports federated search between Nexus Dashboard’s local Splunk instance and other Splunk Observability instances, preserving data sovereignty while delivering end-to-end visibility for AI operations teams.
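To make the idea of combining audit logs with GPU data more concrete, here is a minimal, hypothetical sketch of a time-window correlation: it flags any fabric change that is followed, within a fixed window, by a GPU utilization sample below a threshold. This is not Nexus Dashboard’s or Splunk’s API; the `ChangeEvent` and `GpuSample` record types, the `changes_preceding_dips` function, the 50% threshold, and the five-minute window are all illustrative assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta


# Hypothetical record for an infrastructure audit-log entry (e.g. a config change).
@dataclass
class ChangeEvent:
    time: datetime
    description: str


# Hypothetical record for one GPU metric sample tied to an AI job.
@dataclass
class GpuSample:
    time: datetime
    job_id: str
    utilization_pct: float


def changes_preceding_dips(
    changes: list[ChangeEvent],
    samples: list[GpuSample],
    threshold_pct: float = 50.0,
    window: timedelta = timedelta(minutes=5),
) -> list[tuple[ChangeEvent, GpuSample]]:
    """Return (change, sample) pairs where a GPU sample below threshold_pct
    occurs at or after the change and no later than `window` after it."""
    hits = []
    for change in changes:
        for sample in samples:
            delta = sample.time - change.time
            if timedelta(0) <= delta <= window and sample.utilization_pct < threshold_pct:
                hits.append((change, sample))
    return hits


# Usage with made-up data: one change, three GPU samples.
t0 = datetime(2025, 1, 1, 12, 0)
changes = [ChangeEvent(t0, "QoS policy updated on leaf-1")]
samples = [
    GpuSample(t0 + timedelta(minutes=2), "job-42", 31.0),   # dip inside the window
    GpuSample(t0 + timedelta(minutes=10), "job-42", 30.0),  # dip outside the window
    GpuSample(t0 + timedelta(minutes=1), "job-7", 95.0),    # healthy sample
]
hits = changes_preceding_dips(changes, samples)
for change, sample in hits:
    print(f"{change.description} -> {sample.job_id} at {sample.utilization_pct}%")
```

The nested loop keeps the sketch readable. A real pipeline would more likely express the same join as a search over indexed time-series data. An empty result is only weak evidence that a change was harmless, because correlation within a window neither proves nor rules out causation.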
Personnel: Anant Shah