Follow on Twitter using the following hashtags or usernames: #NFD40
Watch on YouTube
Watch on Vimeo
Netris CEO and co-founder Alex Saroyan discusses the evolution of network engineering in the era of AI. Saroyan highlights that AI networking differs significantly from traditional data center networking because of the massive scale of GPU clusters, ranging from 1,000 to over 50,000 GPUs, and the sheer density of network switches involved. Netris introduces the concept of Network Automation, Abstraction, and Multi-tenancy (NAAM), a technology category designed to succeed SDN and intent-based networking. Unlike previous models that relied on software-based overlays, NAAM is engineered for physical, high-performance fabrics where massive traffic volumes make traditional SDN impractical. The presentation emphasizes that Netris has gained significant traction as the first NVIDIA-validated network automation vendor, serving a growing neocloud market of AI factory operators.
The complexity of AI networking is illustrated through a layer cake of network fabrics: front-end, back-end (East-West), and scale-up (NVLink). Saroyan explains that even a relatively small cluster of 512 GPUs involves over 1,200 network links, making manual configuration or “fire and forget” approaches impossible. A critical requirement for these operators is multi-tenancy, not just for security but also for operational maintenance. Since operators cannot access tenant data, they must be able to dynamically move servers between a tenant environment and a service environment to perform repairs or OS reinstalls. To manage this, Netris uses a distributed architecture with agents residing on switches and DPUs, allowing for scalable, consistent, and validated configurations across thousands of nodes without the bottlenecks of centralized automation tools like Ansible.
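To give a feel for how quickly link counts grow, here is a minimal sketch that counts cables for a 512-GPU cluster. Every sizing parameter (8 GPUs per server, one back-end NIC per GPU, two front-end NICs per server, a two-tier non-blocking leaf-spine design) is an assumption made for illustration and is not taken from the talk.

```python
from dataclasses import dataclass


@dataclass
class ClusterAssumptions:
    """Illustrative sizing parameters; none of these come from the talk."""
    gpus: int = 512
    gpus_per_server: int = 8
    backend_nics_per_gpu: int = 1      # one East-West NIC per GPU (assumed)
    frontend_nics_per_server: int = 2  # dual-homed front-end (assumed)


def count_links(c: ClusterAssumptions) -> dict:
    """Count cables in a two-tier, non-blocking leaf-spine design.

    Non-blocking means every host-facing leaf port is matched by one
    leaf-to-spine uplink, so each fabric has as many uplinks as host links.
    """
    servers = c.gpus // c.gpus_per_server
    backend_host = c.gpus * c.backend_nics_per_gpu
    frontend_host = servers * c.frontend_nics_per_server
    links = {
        "backend_host": backend_host,
        "backend_uplink": backend_host,
        "frontend_host": frontend_host,
        "frontend_uplink": frontend_host,
    }
    links["total"] = sum(links.values())
    return links


if __name__ == "__main__":
    for name, n in count_links(ClusterAssumptions()).items():
        print(f"{name:>16}: {n}")
```

Under these assumed parameters the total comes to 1,280 links, which is in the same range as the figure quoted in the talk, although the speaker's actual breakdown is not given here. Scale-up (NVLink) connections are left out of the count.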
End-to-end orchestration in a Netris environment covers diverse hardware and protocols, including Ethernet, InfiniBand, and specialized RDMA over Converged Ethernet (RoCE) setups. Netris maintains a vendor-neutral stance, supporting NVIDIA, Arista, and Broadcom-based SONiC switches, as well as integrating with NVIDIA’s UFM for InfiniBand and partitioning NVLink for rack-scale isolation. The session concludes by emphasizing that NAAM acts as a bridge between compute orchestrators, such as Kubernetes, CloudStack, or NVIDIA Nico, and the underlying physical network. By providing robust API-level collaboration, Netris enables network engineers to design, simulate, and troubleshoot massive AI fabrics, ensuring that the physical infrastructure can keep pace with the dynamic demands of AI compute and multi-tenant resource allocation.
Personnel: Alex Saroyan
Alex Saroyan, CEO and co-founder of Netris, provides insights from the company’s experience in deploying and automating large-scale GPU clusters. This second part of the presentation focuses specifically on the life cycle of AI networking, emphasizing that sustainable AI business strategies require architecting for long-term growth and newer GPU generations rather than single-cluster deployments. Saroyan highlights the high financial stakes of these deployments, noting that organizations cannot afford play time once hardware arrives; the networking infrastructure must be ready to go live immediately to start generating revenue.
To meet these aggressive timelines, Netris utilizes a sophisticated modeling and simulation phase, often referred to as a digital twin, using technologies like NVIDIA’s DSX Air or Netris’s own CloudSim. This approach allows network engineering teams to model the topology, IP addressing, and cloud constructs, such as VXLANs and VRFs, before the physical hardware is even on-site. By pre-validating configurations in a simulated environment, teams can identify missing upstream connections or storage integration issues early. Once the hardware arrives, a Zero Touch Provisioning (ZTP) process identifies switches via MAC addresses and brings them into the master topology, where the Netris controller automatically generates and applies the necessary configurations without engineers having to write manual switch code.
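The MAC-based identification step can be pictured as a lookup against the planned inventory. The sketch below is illustrative only: the `PLANNED` table, the MAC addresses, and the function names are hypothetical and do not represent the Netris ZTP implementation or its API.

```python
import re
from typing import Optional

# Hypothetical planned inventory: MAC address -> switch name in the modeled topology.
PLANNED = {
    "0c:42:a1:00:00:01": "leaf-01",
    "0c:42:a1:00:00:02": "leaf-02",
    "0c:42:a1:00:00:10": "spine-01",
}


def normalize_mac(mac: str) -> str:
    """Return a MAC as lowercase colon-separated hex, accepting common formats."""
    digits = re.sub(r"[^0-9a-fA-F]", "", mac).lower()
    if len(digits) != 12:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))


def identify_switch(reported_mac: str, planned: dict) -> Optional[str]:
    """Map a MAC reported at first boot to its planned switch, or None if unknown."""
    return planned.get(normalize_mac(reported_mac))


if __name__ == "__main__":
    # Vendors print MACs in different styles, so normalize before matching.
    print(identify_switch("0C42.A100.0002", PLANNED))
    print(identify_switch("aa-bb-cc-dd-ee-ff", PLANNED))
```

Normalizing before the lookup matters because the same address may be written with colons, dashes, or dots depending on the source. An unknown MAC returns `None` rather than raising, so a caller could hold such a device aside instead of configuring it.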
The Netris platform further streamlines the deployment life cycle through automated validation and troubleshooting. Since the system operates based on intent-generated configurations rather than static files, the Netris agent on each node can instantly detect miswiring or link discrepancies and provide specific instructions for remediation. Saroyan explains that while human error in configuration is largely eliminated by the automation, hardware or control plane anomalies can still occur at scale. Consequently, Netris continues to expand its suite of smart tests to help engineers identify zombie switches or performance bottlenecks, providing a green status validation that allows GPU implementation teams to begin their work with total confidence in the underlying fabric.
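Miswiring detection of this kind amounts to comparing the modeled peer of each port with the peer actually seen on the wire. The following is a minimal sketch under assumptions: the data shapes and names are hypothetical, and neighbor discovery via a protocol such as LLDP is assumed rather than confirmed by the talk.

```python
from typing import Dict, List, Optional, Tuple

Endpoint = Tuple[str, str]  # (device name, port name)


def check_wiring(
    expected: Dict[str, Endpoint],
    observed: Dict[str, Optional[Endpoint]],
) -> List[str]:
    """Compare the modeled peer of each local port with the peer actually seen.

    `expected` maps local port -> planned remote endpoint.
    `observed` maps local port -> remote endpoint learned from a neighbor
    discovery protocol such as LLDP (assumed), or None if nothing was seen.
    """
    findings = []
    for port, want in sorted(expected.items()):
        got = observed.get(port)
        if got is None:
            findings.append(f"{port}: no neighbor seen, expected {want[0]}:{want[1]}")
        elif got != want:
            findings.append(
                f"{port}: connected to {got[0]}:{got[1]}, "
                f"move cable to {want[0]}:{want[1]}"
            )
    return findings


if __name__ == "__main__":
    expected = {"swp1": ("spine-01", "swp3"), "swp2": ("spine-02", "swp3")}
    # Two uplinks swapped during cabling.
    observed = {"swp1": ("spine-02", "swp3"), "swp2": ("spine-01", "swp3")}
    for line in check_wiring(expected, observed):
        print(line)
```

Because the expected side comes from the modeled topology, each finding can name the exact port a cable should be moved to, which is the kind of specific remediation instruction described above.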
Personnel: Alex Saroyan
This presentation focuses on the consumption model of AI networks, specifically helping network engineers enable self-service capabilities for AI factory and neocloud operators. Alex Saroyan argues that while network engineers manage complex physical infrastructures, the consumers, such as compute orchestration products, require a simplified cloud-like abstraction. Netris achieves this through its Network Automation, Abstraction, and Multi-tenancy (NAAM) model, which introduces familiar cloud constructs like Virtual Private Clouds (VPCs) and VNets (subnets) to hide the underlying complexity of diverse fabrics like Ethernet and InfiniBand. This allows users to isolate GPU servers and define connectivity through high-level requests while the Netris controller automatically orchestrates the necessary configurations across front-end, back-end, and rack-scale fabrics.
The session also addresses the intricacies of host-level networking and granular multi-tenancy, particularly through the use of Data Processing Units (DPUs). Saroyan explains that traditional VLAN sub-interfacing is often insufficient for bandwidth-intensive AI workloads like KV caching. Therefore, Netris supports hardware-accelerated isolation directly on the DPU. By integrating DPU control planes with leaf switches via EVPN-BGP, Netris creates a unified fabric where virtual functions on a host and physical switch ports can coexist in the same VPC. This complete integration approach prevents the scaling issues associated with disconnected DPU overlays and allows for flexible, hybrid environments where diverse endpoint types, like bare-metal servers and virtual machines, can communicate securely at wire speed.
The presentation also details how Netris handles shared services and internet connectivity through constructs like VPC peering, Direct Connect, and the proprietary SoftGate technology. SoftGate is a horizontally scalable, multi-tenant software gateway that provides NAT, Layer 4 load balancing using the Maglev algorithm, and DHCP without sharing state, mirroring the internal architectures of major hyperscalers. To secure these environments, Netris implements VPC-aware ACLs that are intelligently placed by an algorithm to conserve limited TCAM resources. By dynamically analyzing routing tables, the system decides whether to enforce security rules at the SoftGate for internet traffic, on physical switches for inter-fabric traffic, or directly on DPUs, ensuring optimal performance across the entire AI infrastructure.
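For readers unfamiliar with Maglev hashing, the sketch below shows the lookup-table construction from the published Maglev design: each backend gets a permutation of table slots derived from two hashes, and backends take turns claiming their next preferred free slot. This is a generic illustration, not SoftGate code; the backend names, the SHA-256 based hash, and the table size of 251 are assumptions.

```python
import hashlib
from typing import List, Optional


def _hash(value: str, seed: str) -> int:
    """Stable 64-bit hash; hashlib is used so results do not vary between runs."""
    digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def build_table(backends: List[str], size: int = 251) -> List[str]:
    """Populate a Maglev lookup table. `size` must be prime and >= len(backends)."""
    if not backends:
        raise ValueError("at least one backend is required")
    offsets = [_hash(b, "offset") % size for b in backends]
    skips = [_hash(b, "skip") % (size - 1) + 1 for b in backends]
    next_idx = [0] * len(backends)  # position in each backend's preference list
    table: List[Optional[str]] = [None] * size
    filled = 0
    while True:
        for i, backend in enumerate(backends):
            # Walk this backend's permutation until it reaches a free slot.
            slot = (offsets[i] + next_idx[i] * skips[i]) % size
            while table[slot] is not None:
                next_idx[i] += 1
                slot = (offsets[i] + next_idx[i] * skips[i]) % size
            table[slot] = backend
            next_idx[i] += 1
            filled += 1
            if filled == size:
                return table  # every slot is now a backend name


def pick_backend(table: List[str], flow: str) -> str:
    """Map a flow key (for example a 5-tuple rendered as text) to a backend."""
    return table[_hash(flow, "flow") % len(table)]


if __name__ == "__main__":
    table = build_table(["gw-a", "gw-b", "gw-c"])
    print(pick_backend(table, "10.0.0.5:443->192.0.2.7:51512/tcp"))
```

Because the table depends only on the backend list and the hash functions, any node that builds it from the same inputs selects the same backend for a given flow, which is why this style of hashing suits load balancers that do not exchange per-connection state. The turn-taking fill also keeps the share of slots per backend within one of each other.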
Personnel: Alex Saroyan
This presentation provides a technical demonstration of the Netris controller’s capabilities in automating and simulating large-scale AI networking environments. The demonstration highlights the platform’s Day Zero functionality, where network engineers can model complex topologies and validate designs before physical hardware even arrives. By utilizing Terraform and the Netris CloudSim, Saroyan illustrates how a controller can generate a digital twin of a 64-GPU cluster, defining inventory, topology, and IP address management (IPAM) for both front-end and back-end fabrics. This proactive approach allows teams to export cable maps and labels in advance, ensuring that the physical rollout is guided by a pre-validated, consistent architectural model.
The simulation phase reveals a distributed architecture where Netris agents are deployed to each virtual switch to handle configuration generation and real-time monitoring. Saroyan emphasizes that the central controller does not store or push static configuration files. Instead, the agents autonomously generate and apply the necessary code based on the modeled intent. This object-oriented approach eliminates the need for traditional configuration backups, as a replaced switch can simply re-generate its own configuration upon joining the network. The demo shows the system identifying wiring inconsistencies and automatically configuring BGP sessions and LACP bonding, effectively transforming raw hardware into a functional multi-tenant fabric with high visibility into the health and connectivity of every link.
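The idea that a switch can regenerate its own configuration from modeled intent can be sketched as a pure function over a shared model. Everything below is hypothetical: the `ASNS` and `LINKS` structures, the switch names, and the output format are invented for illustration and do not reflect the Netris data model or agent.

```python
from typing import Dict, List, Tuple

Endpoint = Tuple[str, str]  # (switch name, port name)

# Hypothetical intent model: ASN per switch and the cabling between switch ports.
ASNS: Dict[str, int] = {"leaf-01": 65101, "leaf-02": 65102, "spine-01": 65001}
LINKS: List[Tuple[Endpoint, Endpoint]] = [
    (("leaf-01", "swp49"), ("spine-01", "swp1")),
    (("leaf-02", "swp49"), ("spine-01", "swp2")),
]


def render_bgp_neighbors(switch: str) -> List[dict]:
    """Derive one switch's BGP sessions purely from the shared intent model."""
    sessions = []
    for a, b in LINKS:
        # Each link is considered from both ends so either side can render it.
        for local, remote in ((a, b), (b, a)):
            if local[0] == switch:
                sessions.append({
                    "local_port": local[1],
                    "peer": remote[0],
                    "peer_asn": ASNS[remote[0]],
                })
    return sorted(sessions, key=lambda s: s["local_port"])


if __name__ == "__main__":
    for session in render_bgp_neighbors("spine-01"):
        print(session)
```

Since the output is a function of the model and the switch's identity alone, a replacement unit that takes over the name `spine-01` would derive the same sessions, which is the property that makes stored configuration backups unnecessary in this style of design.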
In the final portion of the demonstration, Saroyan showcases the platform’s multi-tenant consumption model and its advanced integration with NVIDIA BlueField DPUs. Using a server cluster template, he demonstrates how a consumer can provision isolated bare-metal environments (VPCs) with overlapping IP addresses in minutes, while the network engineer maintains control over the underlying L2/L3 VPN constructs. The demo concludes with a physical lab test involving hardware-accelerated host-based networking. By extending the EVPN-BGP control plane directly to the DPU, Netris enables seamless, wire-speed communication between bare-metal servers and virtual machines. This comprehensive automation workflow, extending from initial modeling to elastic IP provisioning and hardware-accelerated virtualization, illustrates Netris’s ability to simplify the entire life cycle of high-performance AI infrastructure.
Personnel: Alex Saroyan