Storage Density for AI Without Compromise with Solidigm
Allyn Malventano, AI/SSD Technologist at Solidigm, introduced his role in AI workload categorization and SSD optimization, then outlined the presentation: an overview of storage applications for AI, a cluster-scale exercise with MinIO, and a brief mention of MemKV and Metrim/RAG work. He pointed out that a traditional storage diagram from only a couple of years ago made no mention of KV caching, which shows how quickly AI infrastructure has evolved: KV caching is now crucial for accelerating inference as AI adoption spreads. Solidigm positions its high-capacity drives in the lower, denser tiers of the AI storage pyramid, which matter more as higher-tier resources such as HBM and DRAM face severe supply constraints and rising prices.
Solidigm ran a cluster-scale exercise with MinIO using an 8×8 setup: eight high-performance client systems initiating workloads and eight server systems fully populated with Solidigm’s 122 TB drives, for a total of about 24 petabytes of storage. The clients ran the MinIO benchmarking tool, which simulated GPU workloads to emulate how storage requests would be initiated. The cluster used 400 Gigabit Ethernet NICs and achieved over 250 gigabytes per second over TCP, a figure Malventano found extreme given TCP’s inherent overhead. This baseline performance scaled almost linearly up to 8 nodes and resulted from extensive tuning across the network stack, including switch and NIC buffer adjustments based on precise cable lengths, as well as tuning of MinIO’s parity layout. Together, these optimizations tripled performance relative to the initial setup.
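To put the 250 gigabytes per second figure in context, the following back-of-the-envelope sketch compares it with the theoretical line rate of the cluster and checks the quoted capacity. The summary does not state how many NICs or drives each server holds, so `nics_per_node=1` and `drives_per_server=24` are assumptions chosen only for illustration, and decimal units are used throughout.

```python
# Rough sanity check of the cluster figures quoted above.
# ASSUMPTIONS (not stated in the summary): one 400GbE NIC per server
# and 24 drives per server; decimal units (1 PB = 1000 TB).

BITS_PER_BYTE = 8


def line_rate_gbytes(nodes: int, nics_per_node: int, nic_gbits: float) -> float:
    """Aggregate theoretical network line rate in gigabytes per second."""
    return nodes * nics_per_node * nic_gbits / BITS_PER_BYTE


def raw_capacity_pb(servers: int, drives_per_server: int, drive_tb: float) -> float:
    """Raw capacity in petabytes, before any parity or filesystem overhead."""
    return servers * drives_per_server * drive_tb / 1000


aggregate = line_rate_gbytes(nodes=8, nics_per_node=1, nic_gbits=400)
utilization = 250 / aggregate  # 250 GB/s is the quoted lower bound
capacity = raw_capacity_pb(servers=8, drives_per_server=24, drive_tb=122)

print(f"aggregate line rate: {aggregate:.0f} GB/s")
print(f"250 GB/s as a share of line rate: {utilization:.1%}")
print(f"raw capacity: {capacity:.1f} PB")
```

Under these assumptions, the reported throughput would be a bit under two thirds of the aggregate line rate, and the raw capacity lands close to the quoted 24 petabytes. If the servers actually carry more than one NIC each, the utilization share would be correspondingly lower.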
Looking ahead, Solidigm aims to explore further performance gains. One potential next step is dual-pathing to double initiator bandwidth, which is especially relevant for future integration with compute nodes such as the NVIDIA B200 or B300. A more significant avenue is RDMA (Remote Direct Memory Access), which lets network adapters move data directly between the memory of two systems without involving the CPU in each transfer. Although the current exercise simulated GPU memory with DRAM, RDMA would still significantly reduce CPU bottlenecks and allow even higher throughput. Combining dual-pathing and RDMA is complex and brings considerable tuning challenges, but it is a promising way to keep raising storage performance for demanding AI workloads.
Personnel: Allyn Malventano