This presentation focuses on the consumption model of AI networks, specifically helping network engineers enable self-service capabilities for AI factory and neocloud operators. Alex Saroyan argues that while network engineers manage complex physical infrastructures, the consumers, such as compute orchestration products, require a simplified cloud-like abstraction. Netris achieves this through its Network Automation, Abstraction, and Multi-tenancy (NAAM) model, which introduces familiar cloud constructs like Virtual Private Clouds (VPCs) and VNets (subnets) to hide the underlying complexity of diverse fabrics like Ethernet and InfiniBand. This allows users to isolate GPU servers and define connectivity through high-level requests while the Netris controller automatically orchestrates the necessary configurations across front-end, back-end, and rack-scale fabrics.
The session also addresses the intricacies of host-level networking and granular multi-tenancy, particularly through the use of Data Processing Units (DPUs). Saroyan explains that traditional VLAN sub-interfacing is often insufficient for bandwidth-intensive AI workloads like KV caching, so Netris supports hardware-accelerated isolation directly on the DPU. By integrating DPU control planes with leaf switches via EVPN-BGP, Netris creates a unified fabric where virtual functions on a host and physical switch ports can coexist in the same VPC. This integrated approach avoids the scaling issues associated with disconnected DPU overlays and allows for flexible, hybrid environments where diverse endpoint types, like bare-metal servers and virtual machines, can communicate securely at wire speed.
The presentation also details how Netris handles shared services and internet connectivity through constructs like VPC peering, Direct Connect, and the proprietary SoftGate technology. SoftGate is a horizontally scalable, multi-tenant software gateway that provides NAT, Layer 4 load balancing using the Maglev algorithm, and DHCP without sharing state, mirroring the internal architectures of major hyperscalers. To secure these environments, Netris implements VPC-aware ACLs that are intelligently placed by an algorithm to conserve limited TCAM resources. By dynamically analyzing routing tables, the system decides whether to enforce security rules at the SoftGate for internet traffic, on physical switches for inter-fabric traffic, or directly on DPUs, ensuring optimal performance across the entire AI infrastructure.
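The Maglev algorithm mentioned above can be illustrated with a minimal Python sketch of its lookup-table construction. This is not Netris code: the summary says only that SoftGate uses Maglev, so the hash function, the table size of 251, and the names `build_maglev_table` and `pick_backend` are assumptions chosen for illustration. The point it shows is that every gateway instance can build the same table from the backend list alone, so instances agree on where a flow goes without exchanging state.

```python
import hashlib


def _hash(key: str, seed: str) -> int:
    """Deterministic 64-bit hash (assumed choice; any stable hash would do)."""
    digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def build_maglev_table(backends: list[str], table_size: int = 251) -> list[str]:
    """Populate a Maglev lookup table.

    table_size must be prime so that each backend's (offset, skip) pair
    walks every slot exactly once before repeating.
    """
    if not backends:
        raise ValueError("at least one backend is required")

    # Each backend gets its own permutation of the slots:
    # slot_j = (offset + j * skip) mod table_size
    offsets = [_hash(b, "offset") % table_size for b in backends]
    skips = [_hash(b, "skip") % (table_size - 1) + 1 for b in backends]

    next_pos = [0] * len(backends)   # position of each backend in its permutation
    table = [-1] * table_size        # -1 marks an empty slot
    filled = 0

    # Backends take turns claiming their next preferred empty slot,
    # which keeps the share of slots per backend nearly equal.
    while True:
        for i in range(len(backends)):
            slot = (offsets[i] + next_pos[i] * skips[i]) % table_size
            while table[slot] >= 0:
                next_pos[i] += 1
                slot = (offsets[i] + next_pos[i] * skips[i]) % table_size
            table[slot] = i
            next_pos[i] += 1
            filled += 1
            if filled == table_size:
                return [backends[j] for j in table]


def pick_backend(table: list[str], flow_key: str) -> str:
    """Map a flow identifier (e.g. a 5-tuple rendered as a string) to a backend."""
    return table[_hash(flow_key, "flow") % len(table)]
```

Calling `build_maglev_table(["gw-a", "gw-b", "gw-c"])` on two separate machines yields identical tables, and `pick_backend(table, "10.0.0.5:443->192.0.2.7:51000/tcp")` then returns the same backend on both. The backend names and the flow-key format here are hypothetical.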
Personnel: Alex Saroyan