Vivek Sarswat, Group Product Manager at Google Cloud Storage, presented on analytics storage and AI, focusing on data preparation and data lakes. He emphasized the close ties between analytics and AI workloads, highlighting key innovations built to address related challenges. The presentation demonstrates that analytics play a crucial role in the AI data pipeline, particularly […]
The latest in high-performance storage, Rapid on Colossus with Google Cloud
Michal Szymaniak, Principal Engineer at Google Cloud, presented on Rapid Storage, a new zonal storage product within the cloud storage portfolio, powered by Google’s foundational distributed file system, Colossus. The goal in designing Rapid Storage was to create a storage system that offers the low latency of block storage, the high throughput of parallel file […]
Intro to Managed Lustre with Google Cloud
Dan Eawaz, Senior Product Manager at Google Cloud, introduced Managed Lustre with Google Cloud, a fully managed parallel file system built on DDN EXAScaler. The aim is to meet the demanding requirements of data preparation, model training, and inference in AI workloads. Managed Lustre provides high throughput to keep GPUs and TPUs fully utilized and […]
Overview of Cloud Storage for AI, Lustre, GCSFuse, and Anywhere Cache with Google Cloud
Marco Abela, Product Manager at Google Cloud Storage, presented an overview of Google Cloud’s storage solutions optimized for AI/ML workloads. The presentation addressed the critical role of storage in AI pipelines, emphasizing that an inadequate storage solution can significantly bottleneck GPU utilization, causing idle GPUs and hindering data processing from initial data preparation to model […]
Google Kubernetes Engine and AI Hypercomputer with Google Cloud
Ishan Sharma, Group Product Manager in the Google Kubernetes Engine team, presented on GKE and AI Hypercomputer, focusing on industry-leading infrastructure, training quickly at mega scale, serving with lower cost and latency, economic access to GPUs and TPUs, and faster time to value. He emphasized that Google Cloud is committed to ensuring new accelerators are […]
AI Hypercomputer Cluster Toolkit with Google Cloud
Ilias Katsardis, Senior Product Manager for AI infrastructure at Google Cloud, presented on the AI Hypercomputer Cluster Toolkit, addressing the complexities of deploying AI infrastructure on Google Cloud’s Compute Engine and GKE. He highlighted the challenges customers face when trying to quickly and efficiently create supercomputers in the cloud, including performance uncertainty, troubleshooting difficulties, and […]
Storage Intelligence with Google Cloud
Manjul Sahay, Group Product Manager at Google Cloud Storage, presented on Storage Intelligence with Google Cloud, focusing on helping customers, both enterprises and startups, manage their storage effectively for AI applications. These customers often face challenges in managing storage at scale for security, cost, and operational efficiency, particularly with small and new teams. A key […]
Introduction to the AI Hypercomputer with Google Cloud
Sean Derrington, Product Manager, Storage at Google Cloud, introduced the AI Hypercomputer at AI Infrastructure Field Day, highlighting Google Cloud’s investments in making it easier for customers to consume and run their AI workloads. The focus is on infrastructure, with attention to the consumption model and optimized software. The AI Hypercomputer encompasses optimized software and […]
Demonstrating Keysight’s AI Fabric Test Methodology
This session provides an overview of the Keysight AI fabric test methodology, demonstrating key findings and improvements achieved through automated testing and the search for optimal configuration parameters. Alex Bortek, Lead Product Manager at Keysight Technologies, introduces the Keysight AI fabric test methodology using the KAI Data Center Builder product. The methodology guides users through […]
Maximizing the Performance of AI Backend Fabric with Keysight
This session provides an overview of the Keysight AI (KAI) Data Center Builder solution and how it supports each phase of AI data center design and deployment with actionable data to improve performance and increase the reliability of AI clusters. The presentation explains how KAI Data Center Builder helps streamline the design process, optimizes resource […]
Building Trust at Scale. How Crusoe Validates Network Infrastructure for AI Workloads with Keysight
In this session, Crusoe shares how they are actively testing frontend networks and inter-VM/host data transfers that feed their GPU clusters. By validating the performance, reliability, and scalability of its infrastructure early, Crusoe aims to identify and resolve issues internally, minimizing the chance that end customers will discover them first. This is a differentiator for […]
Validating Frontend Networks to Optimize and Secure Low-Latency LLM Data Flow with Keysight
As large language models scale, new challenges emerge – not only in maximizing GPU performance but also in validating the infrastructure that fuels the data pipeline used for training. On the front end, this includes securely ingesting user data from distributed cloud and customer environments into centralized AI data centers and ensuring high-speed, low-latency data […]
HPE ProLiant Compute AI Portfolio and Solutions
The HPE ProLiant team presents their AI-ready server portfolio: PCAI, DL145, DL380a, and DL384. The team also discusses computer vision use cases with customers and considers compute as a foundation for AI. Presented by Scott Shaffer, CTO, HPE Compute, and Vaibhav Rastogi, Compute Solutions Manager. The presentation begins by emphasizing real-world enterprise applications of AI […]
HPE ProLiant Compute Cooling Technologies
The HPE ProLiant liquid cooling team presents a “show and tell” session focused on cooling innovation, direct liquid cooling (DLC) and closed-loop liquid cooling (CLLC). Presented by Pranay Mahendra, Mechanical and Thermal Engineer, and Keith Sauer, Mechanical Engineering Manager. During the presentation, Keith Sauer first explains the integral work of the Houston-based engineering team responsible […]
HPE Server Management with Compute Ops Management and iLO
The HPE ProLiant platform includes industry-leading management capabilities in iLO, including OneView and COM. This session features demonstrations of these features and functions. Presented by Andrew Elisavetsky, Technical Enablement, Chris Bradley, Technical Enablement Manager, and Chris Powell, Technical Enablement. Chris Powell and Andrew Elisavetsky began the session by demonstrating the new capabilities of HPE’s iLO […]
End-to-End Server Security with HPE iLO 7
From chip to cloud, HPE ProLiant iLO 7 features many security innovations. Presented by Cole Humphreys, Server Security Product Manager, and Luis Luciano, Distinguished Technologist. During this deep dive session, HPE outlined its comprehensive security approach to server infrastructure, emphasizing that cybersecurity threats are pervasive and increasingly targeting hardware vulnerabilities. HPE identified rising ransomware threats, […]