This Showcase was published on March 12, 2024.
This Tech Field Day Showcase by VAST Data and partners covers their approach to operationalizing AI at scale, including integration with NVIDIA BlueField-3 DPUs, optimization for Supermicro hyperscale servers, and running full-stack AI operations with Run:ai.
Operationalizing AI at Scale with VAST Data
John Mao of VAST Data introduces the company, highlighting its focus on AI workloads and its growth since its founding in 2016. VAST Data has achieved significant success: it raised a Series E round at a $9.1 billion valuation, has grown software sales two- to threefold year over year, and remains cash-flow positive. The company has deployed 10 exabytes of data globally, with 60% of its business focused on HPC and AI workloads.
Mao announces partnerships with cloud service providers that specialize in AI workloads, such as Lambda, Core42, and Genesis Cloud, and mentions enterprise customers such as Zoom and Pixar that use VAST for AI/ML workloads. He also recounts the company's origins: VAST was founded on the vision of building a data-center-scale computer, or "thinking machine." He then outlines the roadmap from storage systems to data management capabilities and transactional storage systems.
VAST Data's Disaggregated, Shared-Everything (DASE) architecture separates logic from storage, which lets the system scale while remaining reliable and economically efficient. This architecture underpins the company's newer capabilities, including VAST Database, a SQL-compliant, transactional, and scalable database, and the VAST Data Engine for event-driven processing. Mao discusses the integration of Apache Spark and Kafka into the platform, which lets containerized compute engines run alongside storage. This enables complex data workflows, such as triggering functions when data is ingested to process it and generate metadata, with the aim of providing a comprehensive data platform that goes beyond traditional storage.
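The ingest-triggered workflow described above follows a general event-driven pattern: a storage event fires, and registered functions react to it. The sketch below is a minimal in-process illustration of that pattern only. It is not the VAST Data Engine API; `EventBus`, `IngestEvent`, `extract_metadata`, the `"object_created"` event name, and the metadata fields are all hypothetical names chosen for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class IngestEvent:
    """Hypothetical event emitted when an object lands in storage."""
    path: str
    payload: bytes


class EventBus:
    """Toy event bus: maps an event type to the handlers subscribed to it."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Callable[[IngestEvent], None]]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Callable[[IngestEvent], None]) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event_type: str, event: IngestEvent) -> None:
        # Call every handler registered for this event type, in order.
        for handler in self._handlers[event_type]:
            handler(event)


# Stand-in for a metadata catalog (a real platform might write to a database table).
catalog: Dict[str, dict] = {}


def extract_metadata(event: IngestEvent) -> None:
    """Hypothetical function triggered on ingest: derives simple metadata."""
    catalog[event.path] = {
        "size_bytes": len(event.payload),
        "extension": event.path.rsplit(".", 1)[-1] if "." in event.path else "",
    }


bus = EventBus()
bus.subscribe("object_created", extract_metadata)

# Simulate ingesting an object; the subscribed handler runs automatically.
bus.publish("object_created", IngestEvent("raw/sample.txt", b"hello world"))
print(catalog)  # {'raw/sample.txt': {'size_bytes': 11, 'extension': 'txt'}}
```

In a production system the handler would typically run in a container next to the data rather than in the same process, but the control flow (event in, function triggered, derived metadata out) is the same idea.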
Personnel: John Mao, Neeloy Bhattacharyya
Running VAST Data End to End on NVIDIA BlueField DPUs
John Mao of VAST Data and John Kim of NVIDIA discuss their companies' collaboration to integrate VAST's data platform with NVIDIA BlueField Data Processing Units (DPUs). They cover the benefits and technical aspects of using DPUs in data centers, emphasizing how DPUs accelerate and offload infrastructure tasks such as networking, storage, and security.
They also describe running the data platform on BlueField-3 DPUs, which brings storage and data closer to the compute layer and improves efficiency, security, and quality of service in large AI infrastructure deployments. Other topics include potential power savings, integration with NVIDIA's DOCA software framework for block storage services, and the broader implications for service providers and enterprises.
VAST Data DASE Architecture Optimized for Supermicro Hyperscale
John Mao from VAST Data and Lawrence Lam from Supermicro discuss their companies' strategic partnership, focusing on AI and storage solutions. They highlight Supermicro's significant presence and technological leadership in AI, as well as the company's overall growth, particularly in AI and data centers. Supermicro invested early in GPU technology, before AI and machine learning were popular, which positioned it ahead of its competitors. The partnership announcement includes a joint solution that combines VAST Data's platform software with optimized Supermicro infrastructure, along with a full-stack AI reference design and architecture to be revealed at GTC in March.
The conversation also covers the importance of fast storage for feeding data to accelerators during AI training, and the shift toward liquid cooling in data centers to overcome the physical limits of cooling GPUs and achieve higher utilization ratios. Mao explains how VAST Data's software-defined solution is being optimized for hyperscale deployments, catering to web-scale and hyperscale customers with attention to serviceability and scalability. The partnership aims to combine Supermicro's hardware innovations with VAST Data's software to address the evolving needs of AI infrastructure, pointing to a future in which GPU servers might incorporate BlueField optimization for enhanced performance.
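Why storage speed matters for training can be made concrete with simple arithmetic: the storage tier must sustain roughly (number of GPUs) × (samples per second per GPU) × (bytes per sample), and if it delivers less, input I/O caps GPU utilization. The sketch below assumes that simplified model (no caching, no compute-side prefetch limits). The function names and the cluster figures (256 GPUs, 2,000 samples/s per GPU, 150 KB per sample, 38.4 GB/s delivered) are hypothetical illustration values, not numbers from either company.

```python
def required_read_bandwidth_gb_s(num_gpus: int,
                                 samples_per_sec_per_gpu: float,
                                 bytes_per_sample: int) -> float:
    """Aggregate read throughput (GB/s) storage must sustain so no GPU waits on input."""
    return num_gpus * samples_per_sec_per_gpu * bytes_per_sample / 1e9


def max_gpu_utilization(delivered_gb_s: float, required_gb_s: float) -> float:
    """Upper bound on GPU utilization when input I/O is the only bottleneck."""
    return min(1.0, delivered_gb_s / required_gb_s)


# Hypothetical cluster: 256 GPUs, each consuming 2,000 samples/s of 150 KB.
required = required_read_bandwidth_gb_s(256, 2000, 150_000)
print(required)  # 76.8

# If storage only delivers half of that, GPUs can be at most 50% busy.
print(max_gpu_utilization(38.4, required))  # 0.5
```

Real pipelines add caching, decompression, and augmentation costs, so this is only a first-order estimate of the bandwidth a storage tier needs to keep accelerators saturated.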
Personnel: John Mao, Lawrence Lam
Running Full Stack AI Operations at Scale with VAST Data and Run:ai
In this discussion, Neeloy Bhattacharyya from VAST Data and Sandeep Brahmarouthu from Run:ai explore the complexities of deploying AI for high-value use cases at scale, focusing on how data moves and is managed throughout the AI pipeline. They identify a common organizational challenge: data preparation is often separated from model training and inference, which leads to inefficiencies. They emphasize that understanding data provenance and lineage is essential to using AI effectively, especially for novel use cases.
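One common way to make provenance and lineage queryable is to record, for every derived dataset, a fingerprint of its contents, the transformation that produced it, and its input datasets. The sketch below assumes a simple in-memory registry keyed by dataset name with SHA-256 content hashes; `DatasetRecord`, `register`, and `lineage` are hypothetical names and do not describe either company's product.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class DatasetRecord:
    name: str
    content_hash: str          # SHA-256 of the dataset bytes
    transform: str             # description of the step that produced it
    parents: Tuple[str, ...]   # names of the input datasets


# Hypothetical in-memory lineage registry.
registry: Dict[str, DatasetRecord] = {}


def register(name: str, data: bytes, transform: str,
             parents: Tuple[str, ...] = ()) -> DatasetRecord:
    """Record a dataset together with where it came from."""
    record = DatasetRecord(name, hashlib.sha256(data).hexdigest(), transform, parents)
    registry[name] = record
    return record


def lineage(name: str) -> List[str]:
    """Walk parent links and return the dataset followed by all of its ancestors."""
    seen: List[str] = []
    stack = [name]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.append(current)
        stack.extend(registry[current].parents)
    return seen


register("raw_logs", b"raw bytes", "ingest")
register("clean_logs", b"clean bytes", "dedupe and filter", ("raw_logs",))
register("train_set", b"token bytes", "tokenize", ("clean_logs",))
print(lineage("train_set"))  # ['train_set', 'clean_logs', 'raw_logs']
```

Because each record stores a content hash, a model trained on `train_set` can later be traced back to the exact raw inputs, and a changed hash flags that an upstream dataset was modified.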
VAST Data's approach simplifies the AI data pipeline by integrating data capture, preparation, training, and model serving more closely, avoiding the inefficiencies of traditional data storage and processing methods. Bhattacharyya introduces the concept of "data adjacency": running certain functions closer to where the data is stored to improve processing times and outcomes.
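A rough way to see why data adjacency helps is to count the bytes that cross the network when a selective filter runs next to the data versus on a remote compute tier. The sketch below assumes network transfer is the dominant cost and uses made-up record sizes and selectivity (1,000 records of 1 KiB, 1% relevant); `remote_filter`, `adjacent_filter`, and `is_relevant` are hypothetical names.

```python
from typing import Callable, List, Tuple

Record = bytes


def remote_filter(records: List[Record],
                  keep: Callable[[Record], bool]) -> Tuple[List[Record], int]:
    """Ship every record to the compute tier, then filter there."""
    bytes_moved = sum(len(r) for r in records)
    return [r for r in records if keep(r)], bytes_moved


def adjacent_filter(records: List[Record],
                    keep: Callable[[Record], bool]) -> Tuple[List[Record], int]:
    """Filter next to the data, then ship only the matching records."""
    selected = [r for r in records if keep(r)]
    bytes_moved = sum(len(r) for r in selected)
    return selected, bytes_moved


def is_relevant(record: Record) -> bool:
    return record.startswith(b"x")


# Illustrative dataset: 1,000 records of 1 KiB each; 1 in 100 is relevant.
records = [(b"x" if i % 100 == 0 else b"y") * 1024 for i in range(1000)]

remote_result, remote_bytes = remote_filter(records, is_relevant)
adjacent_result, adjacent_bytes = adjacent_filter(records, is_relevant)
print(remote_bytes, adjacent_bytes)  # 1024000 10240
```

Both approaches return the same records; only the data movement differs, which is the trade-off data adjacency targets.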
Brahmarouthu discusses Run:ai's role in managing GPU resources for AI workloads, addressing the challenge of efficiently scheduling and utilizing GPUs across different teams and projects within an organization. He highlights the importance of Kubernetes in managing these resources despite its limitations for AI-specific workloads, and explains how Run:ai extends Kubernetes to better serve AI applications.
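The session does not detail Run:ai's scheduling algorithm, so the following is only a generic sketch of one idea often used for sharing GPUs across teams: each project is guaranteed up to its quota, and otherwise idle GPUs are lent round-robin to projects with unmet demand. The function name, project names, quotas, and this specific policy are hypothetical and are not Run:ai's implementation.

```python
from typing import Dict


def allocate_gpus(total: int, quotas: Dict[str, int], demand: Dict[str, int]) -> Dict[str, int]:
    """Give each project min(demand, quota) first, then lend leftover GPUs
    one at a time to projects that still have unmet demand."""
    if sum(quotas.values()) > total:
        raise ValueError("guaranteed quotas exceed cluster capacity")

    # Guaranteed share: never more than the project asked for.
    alloc = {p: min(demand.get(p, 0), q) for p, q in quotas.items()}
    free = total - sum(alloc.values())

    # Round-robin the remaining GPUs among projects with unmet demand.
    while free > 0:
        hungry = [p for p in quotas if alloc[p] < demand.get(p, 0)]
        if not hungry:
            break
        for p in hungry:
            if free == 0:
                break
            alloc[p] += 1
            free -= 1
    return alloc


# An 8-GPU cluster split between two hypothetical teams.
print(allocate_gpus(8, {"nlp": 4, "vision": 4}, {"nlp": 7, "vision": 1}))
# {'nlp': 7, 'vision': 1}
```

Here the vision team's three idle GPUs are lent to the NLP team instead of sitting unused. A production scheduler would also need to reclaim (preempt) borrowed GPUs when the owning team's demand returns, which this sketch omits.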
The conversation also touches on the operational challenges of deploying AI within enterprises, including the need for a DevOps model that accommodates the experimental nature of AI. They discuss the importance of infrastructure and technology partnerships, like the one between VAST Data and Run:ai, in creating efficient, scalable AI deployment strategies.
Personnel: Neeloy Bhattacharyya, Sandeep Brahmarouthu