Storage Becomes AI Memory for RAG and KV-cache with Solidigm

This video is part of the appearance, "Solidigm and MinIO present at AI Infrastructure Field Day 5". It was recorded as part of AI Infrastructure Field Day 5 at 10:30am-12:00pm on June 11, 2026.

This presentation detailed Solidigm’s collaboration with Metrum to explore how storage, specifically Solidigm’s SSDs, can function as AI memory for Retrieval-Augmented Generation (RAG) and Key-Value (KV) cache operations. Metrum’s core challenge involved ingesting vast amounts of video, generating detailed metadata from it, and building a quickly searchable database for AI inference. The objective was to determine whether SSDs could supplant or improve upon traditional memory-based vector databases, improving performance while reducing costs. This led to an evaluation on Solidigm SSDs comparing HNSW, a graph-based vector index that is typically held in memory, with DiskANN, a vector index explicitly optimized for storage.

Metrum’s benchmarks across vector datasets of 1 million, 10 million, and 100 million entries revealed compelling results. While HNSW showed a slight edge at lower concurrencies with smaller datasets, DiskANN demonstrated superior performance as concurrency increased, especially with real-world video pipeline data. This indicated that a purpose-built, storage-first approach can outperform memory-centric solutions in critical scenarios, and that large-scale vector databases can achieve significant DRAM cost savings without sacrificing performance. The ability to move these substantial datasets from expensive DRAM to more economical SSDs offers a powerful advantage for AI infrastructure.
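To give a sense of why moving vectors out of DRAM matters at these scales, the following is a rough back-of-the-envelope sketch of the raw storage needed for the vectors alone. The embedding dimension (768) and 4-byte float values are assumptions chosen for illustration; they were not stated in the presentation, and the estimate ignores index overhead such as graph links.

```python
# Rough estimate of raw vector storage at the three dataset sizes mentioned.
# ASSUMPTIONS (not from the presentation): 768-dimensional embeddings stored
# as 4-byte floats. Index structures (graph edges, metadata) are not counted.

def raw_vector_bytes(num_vectors: int, dim: int, bytes_per_value: int = 4) -> int:
    """Bytes needed to hold num_vectors vectors of dim values each."""
    return num_vectors * dim * bytes_per_value


def to_gib(num_bytes: int) -> float:
    """Convert bytes to gibibytes (2**30 bytes)."""
    return num_bytes / 2**30


ASSUMED_DIM = 768  # hypothetical embedding size

for count in (1_000_000, 10_000_000, 100_000_000):
    size = raw_vector_bytes(count, ASSUMED_DIM)
    print(f"{count:>11,} vectors: {to_gib(size):8.1f} GiB of raw vector data")
```

Under these assumptions, the 100 million entry case needs about 286 GiB for the vectors alone, which is the kind of footprint that is costly to hold entirely in DRAM.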

Beyond vector database lookups, the presentation emphasized the critical role of SSDs in KV-cache offload. By caching recurrent queries on Solidigm drives, the system drastically improved the “time to first token” and eliminated redundant GPU recomputation, which otherwise wastes power and delays responses. This efficiency is crucial for serving hundreds or thousands of concurrent users: it provides faster, more responsive AI interactions and frees up valuable GPU compute cycles for other tasks. The concept extends beyond video and applies to any large-scale AI system that benefits from efficiently persisted context and reduced re-inference, highlighting how intelligent storage integration can lead to substantial gains in both performance and operational efficiency across the entire AI pipeline.
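The idea of reusing cached state for a repeated query instead of recomputing it can be sketched in a few lines. This is a minimal illustration only, not the system shown in the presentation: the `DiskKVCache` class, the `fake_prefill` stand-in for the GPU prefill step, and the file layout are all hypothetical, and a real inference server would store attention key/value tensors rather than a small Python list.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path


class DiskKVCache:
    """Hypothetical disk-backed cache keyed by a hash of the prompt text."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)
        self.hits = 0
        self.misses = 0

    def _path(self, prompt: str) -> Path:
        digest = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        return self.root / f"{digest}.kv"

    def get_or_compute(self, prompt: str, compute_fn):
        """Return cached state for prompt, computing and storing it on a miss."""
        path = self._path(prompt)
        if path.exists():
            self.hits += 1
            with path.open("rb") as f:
                return pickle.load(f)
        self.misses += 1
        state = compute_fn(prompt)
        # Write to a temporary file first, then rename, so a reader never
        # sees a partially written entry.
        tmp = path.with_suffix(".tmp")
        with tmp.open("wb") as f:
            pickle.dump(state, f)
        tmp.replace(path)
        return state


def fake_prefill(prompt: str):
    """Stand-in for the expensive GPU prefill step (assumption for the demo)."""
    return [len(token) for token in prompt.split()]


with tempfile.TemporaryDirectory() as demo_dir:
    cache = DiskKVCache(demo_dir)
    first = cache.get_or_compute("find clips with a red truck", fake_prefill)
    second = cache.get_or_compute("find clips with a red truck", fake_prefill)
    print(f"hits={cache.hits} misses={cache.misses} same_state={first == second}")
```

The second call for the same prompt is served from disk, so the compute function runs only once; in a real deployment that skipped step is the GPU prefill that dominates time to first token.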

Personnel: Allyn Malventano
