This video is part of the appearance, “MemVerge Presents at AI Field Day 6”. It was recorded as part of AI Field Day 6 at 14:00-15:30 on January 29, 2025.
Steve Scargall introduces Memory Machine AI (MMAI) software from MemVerge, a platform designed to optimize GPU usage for platform engineers, data scientists, developers, MLOps engineers, decision-makers, and project leads. The software addresses challenges in strategic resource allocation, flexible GPU sharing, real-time observability and optimization, and priority management. MMAI allows users to request specific GPU types and quantities, abstracting away the complexities of underlying infrastructure. Users interact with the platform through familiar environments like VS Code and Jupyter notebooks, simplifying the process of launching and managing AI workloads.
A key feature of MMAI is its “GPU surfing” capability, which enables the dynamic movement of workloads between GPUs based on resource availability and priority. This is facilitated by MemVerge’s checkpointing technology, allowing seamless transitions without requiring users to manually manage or even be aware of the location of their computations. The platform supports both on-premises and cloud deployments, initially focusing on Kubernetes but with planned support for Slurm and other orchestration systems. This flexibility allows for integration with existing enterprise infrastructure and workflows, providing a path for organizations of various sizes and technical expertise to leverage their GPU resources more efficiently.
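The priority-driven movement of workloads described above can be illustrated with a toy model. The sketch below is not MMAI's API or MemVerge's checkpointing implementation; the `Job` and `Cluster` classes, the integer `step` standing in for saved training state, and the rule that a higher number means higher priority are all assumptions made for illustration. It shows only the general idea: when no GPU is idle, a lower-priority job is checkpointed and queued, and it later resumes from its checkpoint on whichever GPU frees up.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Job:
    name: str
    priority: int   # assumption: higher number = more important
    step: int = 0   # progress counter standing in for in-GPU training state


@dataclass
class Cluster:
    gpus: Dict[str, Optional[Job]]  # GPU id -> running job, or None if idle
    checkpoints: Dict[str, int] = field(default_factory=dict)  # job name -> saved step
    pending: List[Job] = field(default_factory=list)

    def _free_gpu(self) -> Optional[str]:
        # Return the first idle GPU id, or None if every GPU is busy.
        return next((g for g, j in self.gpus.items() if j is None), None)

    def _start(self, gpu: str, job: Job) -> None:
        # Restore progress from a checkpoint if one exists, then place the job.
        job.step = self.checkpoints.pop(job.name, job.step)
        self.gpus[gpu] = job

    def submit(self, job: Job) -> None:
        gpu = self._free_gpu()
        if gpu is not None:
            self._start(gpu, job)
            return
        # No idle GPU: look at the lowest-priority running job.
        victim_gpu = min(self.gpus, key=lambda g: self.gpus[g].priority)
        victim = self.gpus[victim_gpu]
        if victim.priority < job.priority:
            # Checkpoint the victim, drop its in-GPU state, and queue it.
            self.checkpoints[victim.name] = victim.step
            victim.step = 0
            self.pending.append(victim)
            self._start(victim_gpu, job)
        else:
            self.pending.append(job)

    def finish(self, gpu: str) -> None:
        # Free the GPU and resume the highest-priority pending job, if any.
        self.gpus[gpu] = None
        if self.pending:
            self.pending.sort(key=lambda j: j.priority, reverse=True)
            self._start(gpu, self.pending.pop(0))
```

In this model, submitting a high-priority job to a full cluster displaces the lowest-priority job, and calling `finish` on any GPU lets the displaced job continue from its saved `step` on a different device, which is the user-invisible relocation the presentation calls GPU surfing.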
MMAI offers a comprehensive UI providing real-time monitoring and telemetry for both administrators and end-users. Features include departmental billing, resource sharing and bursting, and prioritized job execution. The software supports multiple GPU vendors (Nvidia initially, with AMD and Intel planned), allowing for heterogeneous environments. The presentation highlights the potential for future AI-driven scheduling and orchestration based on the rich telemetry data collected by MMAI, demonstrating a commitment to continuous improvement and optimization of GPU resource utilization in complex, multi-departmental settings. The business model is based on the number of managed GPUs.
Personnel: Steve Scargall