Tech Field Day

The Independent IT Influencer Event


Keysight Presents at AI Field Day 5



AI Field Day 5

Linas Dauksa, Ankur Sheth, and Alex Bortok presented for Keysight at AI Field Day 5

This presentation took place on September 11, 2024, from 8:00 to 9:30.

Presenters: Alex Bortok, Ankur Sheth, Linas Dauksa

Download a whitepaper at https://www.keysight.com/us/en/assets/7124-1061/white-papers/How-AI-ML-Networks-Differ-from-Traditional-Networks.pdf, a solution brief at https://www.keysight.com/us/en/assets/3123-1809/solution-briefs/Keysight-AI-Data-Center-Test-Platform.pdf, or visit https://www.keysight.com/us/en/products/network-test/protocol-load-test/ai-data-center-test-platform.html for more information.


Test Tomorrow’s AI Networks Today with Keysight



AI deployment is growing rapidly and the race to train and deliver new AI models quickly and efficiently is a top priority. The Keysight AI Data Center Test Platform is designed to accelerate innovation in AI network fabric validation and optimization, enabling you to test today’s AI networks with confidence. This presentation introduces Keysight, the challenges our customers face and why realistic emulation and testing of AI workloads is critical.

In the presentation by Ankur Sheth from Keysight Technologies, the focus is on the rapid growth of AI deployment and the critical need for effective testing of AI network infrastructures. Keysight, with its rich history stemming from Hewlett Packard, has established itself as a leader in test and measurement solutions across various technology sectors. The company has evolved through acquisitions and innovations, positioning itself to address the unique challenges posed by the increasing complexity of AI networks. As AI technologies proliferate, particularly in hyperscale environments, the demand for robust testing solutions becomes paramount to ensure that the underlying infrastructure can support the high bandwidth, low latency, and reliability required for optimal performance.

Sheth highlights the significant role that network failures play in the inefficiencies of AI training jobs, noting that 20% of failures can be attributed to network issues. With GPUs being the most expensive resources in AI infrastructures, it is crucial to minimize their idle time caused by data transfer delays. The challenges of testing at scale are compounded by the high costs and limited availability of GPUs, making it impractical to create large test environments. As a result, the need for realistic emulation and testing of AI workloads is emphasized, as it allows operators to identify and resolve potential network issues before deploying their systems in production.

To address these challenges, Keysight introduces its AI Data Center Test Platform, which combines hardware and software tailored for testing AI network fabrics. The platform enables testing without physical GPUs, easing the cost and resource constraints operators face. The presentation sets the stage for a deeper look at the specific tools and methodologies Keysight offers, such as the AresONE family of traffic generators, which is designed for testing and validating AI networks. With these solutions, Keysight aims to help its customers accelerate their AI initiatives and ensure the reliability of their network infrastructures.

Personnel: Ankur Sheth

Keysight AI Data Center Test Platform Architecture and Capabilities



Keysight’s AI Data Center Test Platform is designed to emulate AI workloads, enabling users to benchmark and validate the performance of AI infrastructure in both pre-deployment labs and production AI clusters. The platform allows AI operators and equipment vendors to enhance the efficiency of AI model training over Ethernet networks by experimenting with various workload parameters and network designs. Notably, the platform provides comprehensive insights into the performance of communications and RDMA transports without the need for GPUs, making it a cost-effective solution for testing and optimization.

During the presentation, Alex Bortok and Ankur Sheth discussed the critical role of network performance in AI training, emphasizing that a significant portion of GPU time is spent on data communication rather than computation. They highlighted the importance of co-tuning the software stack and network components to achieve optimal performance, particularly as AI workloads grow in complexity and size. The speakers also explained the challenges associated with traditional benchmarking methods, which often fail to correlate performance metrics across different components of the AI infrastructure. The AI Data Center Test Platform addresses these challenges by providing a controlled environment for emulating workloads and generating real traffic, allowing for more accurate performance assessments.

The architecture of the platform is built on Keysight’s AresONE series of traffic generators, which can produce RoCE (RDMA over Converged Ethernet) traffic at line rate. The platform’s software stack is API-driven, enabling users to run collective benchmarks and analyze the results. The presenters outlined the testing capabilities the platform offers, including load balancing, congestion control, and topology experimentation, all aimed at reducing the time required for AI model training. By providing deeper insights and repeatable testing conditions, Keysight’s AI Data Center Test Platform positions itself as a tool for optimizing AI infrastructure and accelerating the deployment of AI models.

Personnel: Alex Bortok, Ankur Sheth

Taking the Keysight AI Data Center Test Platform for a Test Drive



This demonstration of the AI Data Center Test Platform shows how network events impact completion times. The first demo showcases the effects of congestion on completion times and how poor fabric utilization impacts performance. You’ll also see how the AI Data Center Test Platform can show how increasing parallelism of data transfer helps improve utilization and completion times.

In the presentation by Keysight Technologies at AI Field Day 5, Ankur Sheth, Director of AI Test R&D, demonstrated the AI Data Center Test Platform, focusing on how network events impact completion times. The setup emulated a server with eight GPUs connected to a two-tier fabric network, using the AresONE box to emulate the GPUs and network interface cards (NICs). The demonstration aimed to show the effects of network congestion on performance and how increasing the parallelism of data transfer can improve fabric utilization and completion times. The first scenario examined the impact of congestion on the network, revealing poor performance due to misconfigured congestion control settings.

Sheth explained the configuration and results of running an AllReduce collective operation, which is commonly used during the backward pass of a training job. The initial test showed that the network’s poor configuration led to low utilization and high latency, with only 25% of the theoretical throughput achieved. Detailed flow completion times and cumulative distribution functions (CDFs) highlighted significant discrepancies in data transfer times, indicating a problem in the network configuration. After the network settings were adjusted, particularly the Priority Flow Control (PFC) settings, performance improved dramatically, reaching 95% utilization and significantly shorter completion times.
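The two measurements described above, fabric utilization and the distribution of flow completion times, can be illustrated with a minimal Python sketch. This is not Keysight's code or API: the function names, the 400 Gb/s link speed, the link count, and the sample values are all hypothetical, chosen only so that the arithmetic lands on the 25% figure mentioned in the demo.

```python
import math


def fabric_utilization(bytes_transferred, duration_s, link_gbps, num_links):
    """Achieved throughput as a fraction of the theoretical aggregate rate."""
    achieved_gbps = bytes_transferred * 8 / duration_s / 1e9
    return achieved_gbps / (link_gbps * num_links)


def empirical_cdf(samples):
    """Return (value, cumulative fraction) pairs for the sorted samples."""
    ordered = sorted(samples)
    n = len(ordered)
    return [(value, (i + 1) / n) for i, value in enumerate(ordered)]


def percentile(samples, q):
    """Nearest-rank percentile, q in (0, 100]."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


# Hypothetical numbers: 8 links at 400 Gb/s moving 100 GB in one second.
utilization = fabric_utilization(100e9, 1.0, link_gbps=400, num_links=8)

# Hypothetical flow completion times in milliseconds, with one slow outlier.
fct_ms = [1, 2, 3, 4, 100]
median_ms = percentile(fct_ms, 50)
tail_ms = percentile(fct_ms, 99)
```

A large gap between the median and the tail of the completion-time distribution is the kind of discrepancy the CDFs in the demo made visible; a collective finishes only when its slowest flow does.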

In a second experiment, Sheth demonstrated the impact of using different algorithms and increasing the number of queue pairs (QPs), the connections used in the RDMA over Converged Ethernet (RoCE) protocol. The halving-doubling algorithm initially showed average performance with significant tail latencies. Increasing the number of queue pairs from one to eight improved the network’s performance, with more parallel and consistent data transfer times. This change allowed the network to load balance the traffic better, resulting in more efficient utilization. The presentation concluded with a demonstration of how the platform’s metrics and data can be integrated into automated test cases and analyzed using tools like Jupyter notebooks, providing valuable insights for network designers and engineers.
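For readers unfamiliar with the halving-doubling algorithm named above, here is a minimal Python simulation of the textbook form: a reduce-scatter by recursive halving followed by an all-gather by recursive doubling. It is a sketch under stated assumptions, not the implementation used in the demo. It assumes a power-of-two number of ranks and a buffer length divisible by that number, and the function name and schedule format are hypothetical.

```python
def halving_doubling_allreduce(buffers):
    """Simulate a sum all-reduce over equal-length per-rank buffers.

    Returns (result_buffers, schedule), where each schedule entry is
    (phase, partner_distance, elements_sent_per_rank).
    """
    p = len(buffers)
    if p == 0 or p & (p - 1):
        raise ValueError("rank count must be a power of two")
    n = len(buffers[0])
    if n % p:
        raise ValueError("buffer length must be divisible by the rank count")

    data = [list(b) for b in buffers]
    lo, hi = [0] * p, [n] * p  # segment [lo, hi) each rank is responsible for
    schedule = []

    # Phase 1: reduce-scatter. Partners swap halves and the segment shrinks.
    dist = p // 2
    while dist >= 1:
        new = [row[:] for row in data]  # snapshot: exchanges are simultaneous
        new_lo, new_hi = lo[:], hi[:]
        for r in range(p):
            partner = r ^ dist
            mid = (lo[r] + hi[r]) // 2
            if r < partner:
                new_lo[r], new_hi[r] = lo[r], mid  # keep the lower half
            else:
                new_lo[r], new_hi[r] = mid, hi[r]  # keep the upper half
            for i in range(new_lo[r], new_hi[r]):
                new[r][i] = data[r][i] + data[partner][i]
        schedule.append(("reduce-scatter", dist, (hi[0] - lo[0]) // 2))
        data, lo, hi = new, new_lo, new_hi
        dist //= 2

    # Phase 2: all-gather. Partners swap reduced segments and the segment grows.
    dist = 1
    while dist < p:
        new = [row[:] for row in data]
        new_lo, new_hi = lo[:], hi[:]
        for r in range(p):
            partner = r ^ dist
            for i in range(lo[partner], hi[partner]):
                new[r][i] = data[partner][i]
            new_lo[r] = min(lo[r], lo[partner])
            new_hi[r] = max(hi[r], hi[partner])
        schedule.append(("all-gather", dist, hi[0] - lo[0]))
        data, lo, hi = new, new_lo, new_hi
        dist *= 2

    return data, schedule


# Eight emulated ranks, matching the eight-GPU server in the demo setup.
inputs = [[rank + 1] * 8 for rank in range(8)]
result, schedule = halving_doubling_allreduce(inputs)
```

In this form each step is a pairwise exchange between a rank and a single partner. With only a few large flows in flight at a time, a fabric that balances load per flow has little to spread across its paths, which fits the observation that splitting each transfer over more queue pairs produced more parallel and consistent transfer times.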

Personnel: Ankur Sheth

