Optimizing Storage for AI Workloads with Solidigm

Event: AI Data Infrastructure Field Day 1

Appearance: Solidigm Presents at AI Data Infrastructure Field Day 1

Company: Solidigm

Video Links:

Personnel: Ace Stryker

In this presentation, Ace Stryker from Solidigm discusses the company’s unique value proposition in the AI data infrastructure market, focusing on their high-density QLC SSDs and the recently announced Gen 5 TLC SSDs. He emphasizes the importance of selecting the right storage architecture for different phases of the AI pipeline, from data ingestion to archiving. Solidigm’s QLC SSDs, with their high density and power efficiency, are recommended for the beginning and end of the pipeline, where large volumes of unstructured data are handled. For the middle stages, where performance is critical, Solidigm offers the D7-PS1010 Gen 5 TLC SSD, whose sequential and random read performance makes it well suited to keeping GPUs maximally utilized.

The presentation highlights the flexibility of Solidigm’s product portfolio, which allows customers to optimize for various goals, whether power efficiency, GPU utilization, or overall performance. The Gen 5 TLC SSD, the D7-PS1010, is positioned as the performance leader, capable of delivering 14.5 gigabytes per second of sequential read throughput. Additionally, Solidigm offers options such as the D7-P5520 and D5-P5430 drives, catering to different performance and endurance needs. The discussion also touches on the efficiency of these drives, with Solidigm’s products outperforming competitors in various AI workloads, as demonstrated by MLCommons MLPerf Storage benchmark results.
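
A compact way to restate the stage-to-drive guidance from the talk is as a lookup table. The sketch below is purely illustrative (the stage names and helper function are not a Solidigm tool); it simply captures the recommendation of QLC at the capacity-heavy ends of the pipeline and Gen 5 TLC in the performance-critical middle.

```python
# Illustrative only: restates the stage-to-drive recommendations from the talk.
# Stage names and the helper function are hypothetical, not a Solidigm utility.
PIPELINE_STORAGE = {
    "ingest":        "high-density QLC (D5-P5430 class)",   # large sequential writes
    "preparation":   "high-density QLC",                    # capacity-heavy, mixed I/O
    "training":      "Gen 5 TLC (D7-PS1010 class)",         # random reads to feed GPUs
    "checkpointing": "Gen 5 TLC",                           # bursty sequential writes
    "inference":     "Gen 5 TLC or QLC, depending on latency needs",
    "archive":       "high-density QLC",                    # cold, capacity-optimized
}

def recommend_drive(stage: str) -> str:
    """Return the drive family suggested for a given pipeline stage."""
    return PIPELINE_STORAGE.get(stage, "unknown stage")

if __name__ == "__main__":
    for stage, drive in PIPELINE_STORAGE.items():
        print(f"{stage:>13}: {drive}")
```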

A notable case study presented is the collaboration with the Zoological Society of London to conserve urban hedgehogs. Solidigm’s high-density QLC SSDs are used in an edge data center at the zoo, enabling efficient processing and analysis of millions of images captured by motion-activated cameras. This setup allows the organization to assess hedgehog populations and make informed conservation decisions. The presentation concludes by emphasizing the importance of efficient data infrastructure in AI applications and Solidigm’s commitment to delivering high-density, power-efficient storage solutions that meet the evolving needs of AI workloads.


The Energy Crunch Is Not a Future Problem with Solidigm

Event: AI Data Infrastructure Field Day 1

Appearance: Solidigm Presents at AI Data Infrastructure Field Day 1

Company: Solidigm

Video Links:

Personnel: Manzur Rahman

In the presentation by Solidigm at AI Data Infrastructure Field Day 1, Manzur Rahman emphasized the critical issue of energy consumption in AI and data infrastructure. He referenced quotes from industry leaders like Sam Altman and Mark Zuckerberg, highlighting the significant challenge energy poses in scaling AI operations. Rahman discussed findings from white papers by Meta and Microsoft Azure, which revealed that a substantial portion of energy consumption in data centers is attributed to hard disk drives (HDDs). Specifically, in Meta’s AI recommendation infrastructure and in Microsoft Azure’s cloud services, HDDs accounted for 35% and 33% of total operational energy, respectively. This underscores the need for more energy-efficient storage solutions to manage growing data demands.

Rahman then explored various use cases and the increasing need for network-attached storage (NAS) in AI applications. He noted that data is growing exponentially, with different modalities like text, audio, and video contributing to the data deluge. For instance, hyper-scale large language models (LLMs) and large video models (LVMs) require massive amounts of storage, ranging from 1.3 petabytes to 32 petabytes per GPU rack. The trend towards synthetic data and data repatriation is further driving the demand for NAS. Solidigm’s model for a 50-megawatt data center demonstrated that using QLC (Quad-Level Cell) storage instead of traditional HDDs and TLC (Triple-Level Cell) storage could significantly reduce energy consumption and increase the number of GPU racks that can be supported.

The presentation concluded with a comparison of different storage configurations, showing that QLC storage offers substantial energy savings and space efficiency. For example, the NAS storage paired with a DGX H100 rack consumed only 6.9 kilowatts when built with QLC, compared to 32 kilowatts for a TLC-plus-HDD configuration. This translates to 4x fewer storage racks, 80% less storage power, and 50% more DGX-plus-NAS racks in a 50-megawatt data center. Rahman also addressed concerns about heat generation and longevity, noting that while QLC may generate more heat and offers fewer P/E cycles than TLC, the overall energy efficiency and performance benefits make it a viable solution for modern data centers. Solidigm’s high-density drives, such as the D7-P5520 and the QLC-based D5-P5430, were highlighted as effective in reducing rack space and power consumption, further supporting the case for transitioning to more energy-efficient storage technologies.
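
The rack-level numbers above lend themselves to a quick back-of-the-envelope check. The sketch below reproduces the arithmetic implied by the talk (a fixed 50 MW facility budget and 6.9 kW versus 32 kW of storage per DGX rack); the per-rack compute power figure is a placeholder assumption for illustration, not a number from the presentation, and facility overheads such as cooling are ignored.

```python
# Back-of-the-envelope check of the storage-power comparison from the talk.
# The 6.9 kW (QLC NAS) and 32 kW (TLC + HDD NAS) figures come from the summary;
# DGX_RACK_KW is an assumed placeholder, and cooling overhead is ignored.
FACILITY_BUDGET_KW = 50_000     # 50 MW data center
DGX_RACK_KW = 40.0              # assumed compute power per DGX H100 rack (illustrative)

def rack_pairs_supported(storage_kw_per_rack: float) -> int:
    """How many DGX-plus-NAS rack pairs fit inside the facility power budget."""
    per_pair_kw = DGX_RACK_KW + storage_kw_per_rack
    return int(FACILITY_BUDGET_KW // per_pair_kw)

qlc_pairs = rack_pairs_supported(6.9)    # QLC-based NAS
hdd_pairs = rack_pairs_supported(32.0)   # TLC + HDD NAS

print(f"QLC configuration:  {qlc_pairs} rack pairs")
print(f"HDD configuration:  {hdd_pairs} rack pairs")
print(f"Storage power saved per rack: {(1 - 6.9 / 32.0):.0%}")        # ~78%, near the 80% cited
print(f"Additional rack pairs with QLC: {(qlc_pairs / hdd_pairs - 1):.0%}")  # ~54%, near the 50% cited
```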


Optimizing Data Center TCO: An In-Depth Analysis and Sensitivity Study with Solidigm

Event: AI Data Infrastructure Field Day 1

Appearance: Solidigm Presents at AI Data Infrastructure Field Day 1

Company: Solidigm

Video Links:

Personnel: Manzur Rahman

Manzur Rahman from Solidigm presented an in-depth analysis of Total Cost of Ownership (TCO) for data centers, emphasizing its growing importance in the AI era. TCO encompasses acquisition, operation, and maintenance costs, and is crucial for evaluating cost-effective, high-performance hardware like GPUs, storage, and AI chips. Rahman highlighted the need for energy-efficient solutions and the importance of right-sizing storage to avoid over- or under-provisioning. He explained that TCO includes both direct costs (materials, labor, energy) and indirect costs (overheads, cooling, carbon tax), and that Solidigm’s model normalizes these into a comprehensive cost per effective terabyte per month per rack.

Rahman detailed Solidigm’s TCO model, which incorporates dynamic variables such as hardware configuration, drive replacement cycles, and workload mixes. The model also factors in the time value of money, maintenance, disposal costs, and greenhouse gas taxes. By comparing HDD and SSD racks under various scenarios, Solidigm found that SSDs can offer significant TCO benefits, especially when variables like replacement cycles, capacity utilization, and data compression are optimized. For instance, extending the SSD replacement cycle from five to seven years can improve TCO by 22%, and increasing capacity utilization can lead to a 67% improvement.
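
To make the normalization concrete, the sketch below computes a cost per effective terabyte per month for a single rack under assumed inputs. All numbers are placeholders chosen only to illustrate the levers Rahman describes (replacement cycle, capacity utilization, compression); this is not Solidigm’s model, and it omits maintenance, disposal, discounting, and tax terms for brevity.

```python
# Illustrative TCO-per-effective-TB sketch. Every input is a placeholder; the
# point is only to show how the normalization and the main levers interact.
def tco_per_effective_tb_month(
    capex_per_rack: float,       # acquisition cost ($)
    power_kw: float,             # average rack power draw (kW)
    power_cost_per_kwh: float,   # energy price ($/kWh)
    raw_tb: float,               # raw capacity per rack (TB)
    utilization: float,          # fraction of capacity actually used
    compression_ratio: float,    # effective-data multiplier
    replacement_years: float,    # drive replacement cycle (years)
) -> float:
    months = replacement_years * 12
    energy_cost = power_kw * 24 * 365 * replacement_years * power_cost_per_kwh
    effective_tb = raw_tb * utilization * compression_ratio
    return (capex_per_rack + energy_cost) / (effective_tb * months)

# Same rack, two replacement cycles: a longer cycle amortizes CAPEX over more months.
five_yr = tco_per_effective_tb_month(1_500_000, 10, 0.10, 8_000, 0.7, 1.3, 5)
seven_yr = tco_per_effective_tb_month(1_500_000, 10, 0.10, 8_000, 0.7, 1.3, 7)
print(f"5-year cycle:  ${five_yr:.2f} per effective TB-month")
print(f"7-year cycle:  ${seven_yr:.2f} per effective TB-month")
print(f"Improvement:   {(1 - seven_yr / five_yr):.0%}")
```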

The presentation concluded with a sensitivity analysis showing that high-density QLC SSDs can significantly reduce TCO compared to HDDs. Even with higher upfront costs, the overall TCO is lower due to better performance, longer replacement cycles, and higher capacity utilization. Rahman projected that high-density QLC SSDs will continue to offer TCO improvements in the coming years, making them a promising solution for data centers, particularly in AI environments. The analysis demonstrated that while CAPEX for SSDs is higher, the overall cost per terabyte effective is lower, making SSDs a cost-effective choice for future data center deployments.


How Data Infrastructure Improves or Impedes AI Value Creation with Solidigm

Event: AI Data Infrastructure Field Day 1

Appearance: Solidigm Presents at AI Data Infrastructure Field Day 1

Company: Solidigm

Video Links:

Personnel: Ace Stryker

Ace Stryker from Solidigm presented on the critical role of data infrastructure in AI value creation, emphasizing the importance of quality and quantity in training data. He illustrated this with an AI-generated image of a hand with an incorrect number of fingers, highlighting the limitations of AI models that lack intrinsic understanding of the objects they depict. This example underscored the necessity for high-quality training data to improve AI model outputs. Stryker explained that AI models predict desired outputs based on training data, which often lacks comprehensive information about the objects, leading to errors. He stressed that these challenges are not unique to image generation but are prevalent across various AI applications, where data variety, low error margins, and limited training data pose significant hurdles.

Stryker outlined the AI data pipeline, breaking it down into five stages: data ingestion, data preparation, model development, inference, and archiving. He detailed the specific data and performance requirements at each stage, noting that data magnitude decreases as it moves through the pipeline, while the type of I/O operations varies. For instance, data ingestion involves large sequential writes to object storage, while model training requires random reads from high-performance storage. He also discussed the importance of checkpointing during model training to prevent data loss and ensure efficient recovery. Stryker highlighted the growing trend of distributing AI workloads across core data centers, regional data centers, and edge servers, driven by the need for faster processing, data security, and reduced data transfer costs.
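
Checkpointing is one of the pipeline stages whose I/O pattern is easy to show directly: a training loop periodically flushes a large sequential write of model and optimizer state to shared storage so a failed job can resume rather than restart. The sketch below is a generic PyTorch-style illustration with assumed paths and intervals; it is not code from the presentation.

```python
# Minimal checkpointing sketch (paths, interval, and model are assumed; not from the talk).
import os
import torch

CHECKPOINT_DIR = "/mnt/fast-nvme/checkpoints"   # assumed high-performance mount
CHECKPOINT_EVERY = 500                          # training steps between checkpoints

def save_checkpoint(step, model, optimizer):
    """Large sequential write of model + optimizer state to shared storage."""
    os.makedirs(CHECKPOINT_DIR, exist_ok=True)
    path = os.path.join(CHECKPOINT_DIR, f"step_{step:08d}.pt")
    torch.save(
        {"step": step,
         "model": model.state_dict(),
         "optimizer": optimizer.state_dict()},
        path,
    )
    return path

def maybe_resume(model, optimizer):
    """On restart, reload the newest checkpoint instead of retraining from scratch."""
    if not os.path.isdir(CHECKPOINT_DIR):
        return 0
    checkpoints = sorted(os.listdir(CHECKPOINT_DIR))
    if not checkpoints:
        return 0
    state = torch.load(os.path.join(CHECKPOINT_DIR, checkpoints[-1]))
    model.load_state_dict(state["model"])
    optimizer.load_state_dict(state["optimizer"])
    return state["step"] + 1
```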

The presentation also addressed the challenges and opportunities of deploying AI at the edge. Stryker noted that edge environments often have lower power budgets, space constraints, and higher serviceability requirements compared to core data centers. He provided examples of edge deployments, such as medical imaging in hospitals and autonomous driving solutions, where high-density storage solutions like QLC SSDs are used to enhance data collection and processing. Stryker emphasized the need for storage vendors to adapt to these evolving requirements, ensuring that their products can meet the demands of both core and edge AI applications. The session concluded with a discussion on Solidigm’s product portfolio and how their SSDs are designed to optimize performance, energy efficiency, and cost in AI deployments.


Next Gen Data Protection and Recovery with Infinidat

Event: AI Data Infrastructure Field Day 1

Appearance: Infinidat Presents at AI Data Infrastructure Field Day 1

Company: INFINIDAT

Video Links:

Personnel: Bill Basinas

Infinidat’s presentation on next-generation data protection and recovery emphasizes the critical need for robust cyber-focused strategies to safeguard corporate infrastructure and critical data assets. Bill Basinas, the Senior Director of Product Marketing, highlights the importance of moving beyond traditional backup and recovery methods to a more proactive approach that prioritizes business recovery. Infinidat’s solutions are designed to protect data efficiently and ensure its availability, leveraging advanced technologies like immutable snapshots and automated cyber protection to provide a resilient and secure storage environment.

The core of Infinidat’s approach lies in its InfiniSafe technology, which offers immutable snapshots, logical air gapping, and instant recovery capabilities. These features ensure that data remains protected and can be quickly restored in the event of a cyber attack. The immutable snapshots are particularly crucial as they cannot be altered or deleted until their expiration, providing a reliable safeguard against data tampering. Additionally, InfiniSafe’s automated cyber protection (ACP) integrates seamlessly with existing security infrastructures, enabling real-time responses to potential threats and ensuring that data is continuously validated and verified.
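
The behavior of an immutable snapshot can be expressed in a few lines: delete or modify requests are rejected until a retention timestamp passes. The sketch below is a generic illustration of that policy under assumed names; it is not Infinidat’s InfiniSafe implementation or API.

```python
# Generic illustration of snapshot immutability; not InfiniSafe code or its API.
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass
class ImmutableSnapshot:
    name: str
    retain_until: datetime   # immutability window; policy forbids shortening it

    def delete(self, now=None):
        """Deletion requests are refused until the retention window expires."""
        now = now or datetime.now(timezone.utc)
        if now < self.retain_until:
            raise PermissionError(
                f"{self.name} is immutable until {self.retain_until.isoformat()}"
            )
        print(f"{self.name} deleted")

snap = ImmutableSnapshot(
    name="prod-db-hourly-0400",
    retain_until=datetime.now(timezone.utc) + timedelta(days=30),
)
# snap.delete() raises PermissionError for the next 30 days, then succeeds.
```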

Infinidat also collaborates with Index Engines to enhance its cyber detection capabilities. This partnership allows Infinidat to offer advanced content-level scanning and pattern matching to detect anomalies and potential threats with high accuracy. The integration of these technologies ensures that any compromised data is quickly identified and isolated, minimizing the impact of cyber attacks. By focusing on recovery first and ensuring that validated data is readily available, Infinidat provides a comprehensive solution that addresses the evolving challenges of data protection in today’s cyber-centric landscape.


AI Workloads and Infinidat

Event: AI Data Infrastructure Field Day 1

Appearance: Infinidat Presents at AI Data Infrastructure Field Day 1

Company: INFINIDAT

Video Links:

Personnel: Bill Basinas

Infinidat’s presentation at AI Data Infrastructure Field Day 1, led by Bill Basinas, focused on the company’s strategic positioning within the AI infrastructure market. Basinas emphasized that Infinidat has been closely monitoring the AI landscape for over a year to identify where their enterprise storage solutions can best serve AI workloads. He acknowledged the rapid growth and evolving nature of the AI market, particularly in generative AI (Gen AI) and its associated learning models. Infinidat aims to provide robust storage solutions that enhance the accuracy of AI-generated results, especially through their focus on Retrieval Augmented Generation (RAG). This approach is designed to mitigate issues like data inaccuracies and hallucinations in AI outputs by leveraging Infinidat’s existing data center capabilities.
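
Retrieval Augmented Generation itself is straightforward to sketch: embed the user’s question, retrieve the most similar passages from an enterprise document store, and prepend them to the prompt so the model answers from grounded data. The minimal example below uses plain cosine similarity and placeholder functions for the embedder and the LLM; it illustrates the pattern only and is not tied to Infinidat’s products.

```python
# Minimal RAG pattern sketch; embed() and generate() are placeholders to swap out.
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder embedding function (replace with a real embedding model)."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(384)

def generate(prompt: str) -> str:
    """Placeholder LLM call (replace with a real model endpoint)."""
    return f"[answer grounded in provided context]\n{prompt[:120]}..."

def retrieve(query: str, documents: list[str], k: int = 3) -> list[str]:
    """Return the k documents most similar to the query by cosine similarity."""
    q = embed(query)
    scores = []
    for doc in documents:
        d = embed(doc)
        scores.append(float(q @ d / (np.linalg.norm(q) * np.linalg.norm(d))))
    ranked = sorted(zip(scores, documents), reverse=True)
    return [doc for _, doc in ranked[:k]]

def answer(query: str, documents: list[str]) -> str:
    context = "\n---\n".join(retrieve(query, documents))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    return generate(prompt)
```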

Basinas highlighted that Infinidat’s strength lies in its ability to support mission-critical applications and workloads, including databases, ERP systems, and virtual infrastructures. The company is now extending this expertise to AI workloads by ensuring high performance and reliability. Infinidat’s InfiniSafe technology offers industry-leading cyber resilience, and their “white glove” customer service ensures a seamless integration of their storage solutions into existing infrastructures. The company is not currently involved in data classification or governance but focuses on providing the underlying storage infrastructure that supports AI applications. This strategic choice allows Infinidat to concentrate on their core competencies while potentially partnering with other vendors for data management and security.

Infinidat’s approach to AI infrastructure is pragmatic and customer-centric. They are working closely with clients to understand their needs and are developing workflows and reference architectures to facilitate the deployment of RAG-based infrastructures. The company is also exploring the integration of vector databases and other advanced data management technologies to further enhance their AI capabilities. While Infinidat is not yet offering object storage natively, they are actively working on it and have partnerships with companies like MinIO and Hammerspace to provide interim solutions. Overall, Infinidat aims to leverage its existing infrastructure to support AI workloads effectively, offering a cost-effective and scalable solution for enterprises venturing into AI.


Infinidat InfuzeOS for Hybrid Multi Cloud

Event: AI Data Infrastructure Field Day 1

Appearance: Infinidat Presents at AI Data Infrastructure Field Day 1

Company: INFINIDAT

Video Links:

Personnel: Bill Basinas

Infinidat’s InfuzeOS is a versatile operating system designed to support both on-premises and hybrid multi-cloud environments. Initially developed for on-premises solutions, InfuzeOS has been extended to work seamlessly with major cloud providers like AWS and Microsoft Azure. This extension allows customers to experience the same ease of use and robust functionality in the cloud as they do with their on-premises systems. InfuzeOS retains all its core features, including neural cache, InfiniRAID, InfiniSafe, and InfiniOps, ensuring consistent performance and management across different environments. The system supports both block and file storage, making it adaptable to various workload needs.

InfiniRAID, a key component of Infinidat’s technology, is a software-based RAID architecture that provides exceptional resilience and performance. Unlike traditional RAID systems that can be vulnerable to multiple drive failures, InfiniRAID can handle dozens of device failures without compromising system operations. This high level of resilience is achieved through a patented approach that manages RAID at the software layer, allowing for efficient use of all available devices. This capability is particularly beneficial for enterprise environments where data integrity and uptime are critical. InfiniRAID’s design also simplifies maintenance, as customers can replace failed drives without immediate technical intervention, thanks to the system’s hot-pluggable drives and proactive monitoring.

InfiniSafe, another integral feature, focuses on cyber resilience, providing robust protection against data breaches and cyber threats. While the cloud implementations of InfuzeOS do not offer the same level of hardware control and guarantees as on-premises solutions, they still deliver significant benefits. The cloud version currently operates on a single compute instance, with plans to evolve towards multi-node configurations to better support complex workloads. Despite these differences, the cloud implementation maintains the same software functionality, including compression and support for Ethernet-based protocols. This makes InfuzeOS a flexible and powerful solution for various use cases, from functional test development to backup and replication targets, and increasingly, AI workloads.


Infinidat Products – InfiniBox G4 and InfiniBox SSA

Event: AI Data Infrastructure Field Day 1

Appearance: Infinidat Presents at AI Data Infrastructure Field Day 1

Company: INFINIDAT

Video Links:

Personnel: Bill Basinas

Infinidat’s presentation at AI Data Infrastructure Field Day 1, led by Bill Basinas, Sr. Director of Product Marketing, focused on their latest storage solutions, the InfiniBox G4 and InfiniBox SSA. Basinas highlighted the unique software-defined architecture of Infinidat’s products, which leverage commodity hardware components to deliver high performance and resilience. The core of their technology, InfuzeOS, provides triple redundancy and integrates features like neural cache, which uses a large shared pool of DRAM memory to ensure high performance and cache hit rates. This architecture allows Infinidat to guarantee 100% availability and high performance, making their solutions suitable for mission-critical applications and workloads, including AI tasks like Retrieval Augmented Generation (RAG).

The presentation also detailed the hardware advancements in the G4 series, which transitioned from Intel to AMD EPYC CPUs, resulting in a 31% increase in core count and a 20% improvement in power efficiency. This shift to PCIe Gen 5 and DDR5 DRAM further enhanced the system’s performance. Infinidat’s approach to scalability was also discussed, with the introduction of smaller configurations like the 1400T series, designed for remote data centers and edge deployments. These systems maintain the same enterprise-class architecture and performance, allowing for seamless, non-disruptive scaling. Additionally, Infinidat continues to offer hybrid storage solutions, which are particularly popular for large-scale backup environments.

Infinidat’s InfiniVerse platform was another key topic, providing extensive telemetry and predictive analytics to help customers manage their storage environments. InfiniVerse leverages AI and machine learning to offer insights into storage usage, power consumption, and carbon footprint, aligning with modern ESG goals. The platform supports Infinidat’s white-glove customer service, where technical advisors work closely with clients to optimize their storage infrastructure. The presentation concluded with a discussion on Infinidat’s future plans, including the Mobius controller upgrade program, which promises seamless, in-place upgrades without disrupting system availability, further emphasizing Infinidat’s commitment to innovation and customer satisfaction.


Revolutionizing Enterprise Storage with High Performance and Unmatched Customer Support from Infinidat

Event: AI Data Infrastructure Field Day 1

Appearance: Infinidat Presents at AI Data Infrastructure Field Day 1

Company: INFINIDAT

Video Links:

Personnel: Bill Basinas

Infinidat, a company founded in 2011 with deep roots in enterprise storage, has established itself as a critical player in the IT storage infrastructure for some of the world’s largest companies. The company’s founder, who previously created EMC Symmetrix and sold XIV to IBM, has leveraged this extensive experience to build Infinidat into a provider of high-performance, high-availability storage solutions. These solutions are designed to handle mission-critical applications and workloads, including databases, ERP systems, and virtual infrastructures. Recently, Infinidat has also expanded its focus to support critical AI workloads, such as Retrieval Augmented Generation (RAG), reflecting the evolving needs of modern enterprises.

Infinidat’s approach to the market is somewhat unique, as it traditionally sold full rack-scale systems but is now offering more flexible consumption options. This flexibility allows customers to integrate Infinidat’s solutions seamlessly into their existing infrastructure without significant changes. The company’s customer base primarily consists of large enterprises, with an average customer managing 17 petabytes of storage. Infinidat’s systems are known for their scalability, capable of handling up to 17 petabytes of effective data, and are designed to provide enterprise-caliber performance, 100% availability resilience, and industry-leading cyber resilience through their InfiniSafe technology.

One of the standout features of Infinidat’s offerings is their “white glove” customer service, which includes comprehensive support from the initial purchase through the entire lifecycle of the product at no additional cost. This service model has flipped the traditional support paradigm on its head, providing customers with technical advisors and eliminating the need for additional support tiers or add-ons. Infinidat’s solutions also contribute to data center consolidation, offering greener, more efficient storage options that reduce floor space, power, cooling, and operational costs. The company’s recent product developments, including the AI-driven InfiniVerse platform and the cloud editions of their InfuzeOS operating system, further demonstrate their commitment to innovation and addressing the evolving needs of enterprise customers, particularly in the realm of cybersecurity and cyber attack prevention.


Streamline AI Projects with Infrastructure Abstraction from HPE

Event: AI Data Infrastructure Field Day 1

Appearance: HPE Presents at AI Data Infrastructure Field Day 1

Company: HPE

Video Links:

Personnel: Alexander Ollman

In this presentation, Alex Ollman from Hewlett Packard Enterprise (HPE) discusses the transformative potential of infrastructure abstraction in accelerating AI projects. The focus is on HPE’s Private Cloud AI, a solution designed to simplify the management of complex systems, thereby allowing data engineers, scientists, and machine learning engineers to concentrate on developing and refining AI applications. By leveraging HPE Ezmeral Software, the Private Cloud AI aims to provide a unified experience that maintains control over both the infrastructure and the associated data, ultimately fostering innovation and productivity in AI-driven projects.

Ollman emphasizes the importance of abstracting the underlying infrastructure, including GPU accelerator compute, storage for models, and high-speed networking, into a virtualized software layer. This abstraction reduces the time and effort required to manage these components directly, enabling users to focus on higher-level tasks. HPE’s GreenLake Cloud Platform plays a crucial role in this process by automating the configuration of entire racks, which can be set up with just three clicks. This ease of use is further enhanced by HPE AI Essentials, which allows for the creation and deployment of automations tailored to the unique data structures of different organizations.

The presentation also highlights HPE’s collaboration with NVIDIA to scale the development and deployment of large language models and other generative models. This partnership aims to make these advanced AI components more accessible and scalable for enterprises. HPE’s solution accelerators, part of the Private Cloud AI offering, promise to streamline the deployment of data, models, and applications with a single click. This capability is expected to be formally released by the end of the year, providing a powerful tool for enterprises to manage and scale their AI projects efficiently.


Building a Generative AI Foundation with HPE

Event: AI Data Infrastructure Field Day 1

Appearance: HPE Presents at AI Data Infrastructure Field Day 1

Company: HPE

Video Links:

Personnel: Alexander Ollman, Edward Holden

Join Hewlett Packard Enterprise’s product team for a deep dive into the AI architecture and infrastructure needed to deploy generative AI at enterprise scale. We’ll explore the essential components—from high-performance compute and storage to orchestration—that power these models. Using real-world case studies, we’ll uncover the intricacies of balancing computational resources, networking, and optimization. Discover how Hewlett Packard Enterprise simplifies this process with integrated solutions.

In the presentation, Alex Ollman and Edward Holden from HPE discuss the comprehensive infrastructure required to support generative AI at an enterprise level, focusing on both hardware and software components. They emphasize the importance of a holistic approach that integrates high-performance computing, storage, and orchestration to manage the complex workflows involved in machine learning operations. The HPE Ezmeral platform is highlighted as a key solution that abstracts the underlying infrastructure, making it easier for data scientists, engineers, and developers to focus on their specific tasks without worrying about the technical complexities of setting up and managing the infrastructure.

The presentation also delves into the roles of different personas within an organization, such as cloud administrators, AI administrators, and AI developers. Each role has specific needs and responsibilities, and HPE’s Private Cloud AI offering is designed to cater to these needs by providing a unified platform that simplifies user management, data access, and resource allocation. The platform allows for seamless integration of various tools and frameworks, such as Apache Airflow for data engineering and Jupyter Notebooks for development, all pre-configured and ready to use. This approach not only accelerates the deployment of AI models but also ensures that the infrastructure can scale efficiently to meet the demands of enterprise applications.
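
As an example of the kind of tooling the platform pre-configures, a data engineer might express a preparation workflow as an Apache Airflow DAG like the one sketched below. The DAG name, task names, and task logic are invented for illustration; this is not an HPE-supplied workflow.

```python
# Illustrative Airflow DAG; dag_id, task names, and task logic are hypothetical.
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def ingest():
    print("pull raw documents from object storage")

def clean():
    print("deduplicate, strip PII, normalize formats")

def tokenize():
    print("tokenize and write training shards")

with DAG(
    dag_id="prep_training_corpus",
    start_date=datetime(2024, 1, 1),
    schedule=None,        # triggered manually or by an upstream event
    catchup=False,
) as dag:
    ingest_task = PythonOperator(task_id="ingest", python_callable=ingest)
    clean_task = PythonOperator(task_id="clean", python_callable=clean)
    tokenize_task = PythonOperator(task_id="tokenize", python_callable=tokenize)

    ingest_task >> clean_task >> tokenize_task
```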

Furthermore, the presentation touches on the collaboration between HPE and NVIDIA to enhance the capabilities of the Private Cloud AI platform. This partnership aims to deliver scalable, enterprise-grade AI solutions that can handle large language models and other complex AI workloads. The integration of NVIDIA’s AI Enterprise stack with HPE’s infrastructure ensures that users can deploy and manage AI models at scale, leveraging the best of both companies’ technologies. The session concludes with a discussion on the support and diagnostic capabilities of the platform, ensuring that organizations can maintain and troubleshoot their AI infrastructure effectively.


A Step-by-Step Guide to Build Robust AI with Hewlett Packard Enterprise

Event: AI Data Infrastructure Field Day 1

Appearance: HPE Presents at AI Data Infrastructure Field Day 1

Company: HPE

Video Links:

Personnel: Alexander Ollman

Generative AI holds the promise of transformative advancements, but its development requires careful planning and execution. Hewlett Packard Enterprise (HPE) leverages its extensive experience to navigate the intricacies of building enterprise-grade generative AI, covering aspects from infrastructure and data management to model deployment. Alexander Ollman, a product manager at HPE, emphasizes the importance of integrating the needs of those who will use the AI infrastructure into the decision-making process, highlighting the rapid and essential demand for robust AI solutions in the enterprise sector.

Ollman provides a detailed explanation of the evolution and significance of generative AI, particularly focusing on the development of transformer models by Google in 2017, which revolutionized the field by enabling real-time generation of responses. He distinguishes between traditional AI models, which are often specific and smaller, and generative models, which are large, computationally intensive, and designed for general applications. This distinction is crucial for understanding the different infrastructure requirements for each type of AI, as generative models necessitate more substantial computational resources and sophisticated data management strategies.

The presentation underscores the complexity of deploying generative AI applications, outlining a multi-step process that includes data gathering, preparation, selection, model training, and validation. Ollman stresses the importance of automating and abstracting these steps to streamline the process and make it accessible to various personas involved in AI development, from data engineers to application developers. He also highlights the necessity of high-performance infrastructure, such as GPU-accelerated compute and fast networking, to support the large-scale models used in generative AI. By abstracting technical complexities, HPE aims to empower organizations to harness the full potential of generative AI while ensuring reliability and efficiency in their AI deployments.


Managing Google Cloud Storage at Scale with Gemini

Event: AI Data Infrastructure Field Day 1

Appearance: Google Cloud Presents at AI Data Infrastructure Field Day 1

Company: Google Cloud

Video Links:

Personnel: Manjul Sahay

In the presentation, Manjul Sahay from Google Cloud discusses the challenges and solutions for managing vast amounts of data in Google Cloud Storage, particularly for enterprises involved in data-intensive activities like autonomous driving and drug discovery. He highlights that traditional methods of data management become ineffective when dealing with billions of objects and petabytes of data. The complexity is compounded by the need for security, cost management, and operational insights, which are difficult to achieve at such a large scale. To address these challenges, Google Cloud has developed new capabilities to streamline the process, making it easier for customers to manage their data efficiently.

One of the key solutions introduced is the Insights Data Set, which aggregates metadata from billions of objects and thousands of buckets into BigQuery for analysis. This daily snapshot of metadata includes custom tags and other relevant information, allowing users to gain insights without the need for extensive manual querying and scripting. This capability is designed to be user-friendly, enabling even non-experts to perform complex data analysis with just a few clicks. By leveraging BigQuery’s powerful tools, users can generate actionable insights quickly, which is crucial for maintaining security and compliance, as well as optimizing storage usage and costs.
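
Once the daily metadata snapshot lands in BigQuery, questions like “which buckets hold the most data” become ordinary SQL. The sketch below uses the standard BigQuery Python client; the project, dataset, table, and column names are placeholders, since the actual Insights schema is not reproduced here.

```python
# Querying a storage-metadata snapshot in BigQuery (project, table, and columns are placeholders).
from google.cloud import bigquery

client = bigquery.Client(project="my-project")   # assumed project ID

QUERY = """
SELECT bucket, COUNT(*) AS objects, SUM(size) / POW(1024, 4) AS tib
FROM `my-project.storage_insights.object_metadata`   -- assumed dataset and table
WHERE snapshot_date = CURRENT_DATE()
GROUP BY bucket
ORDER BY tib DESC
LIMIT 10
"""

for row in client.query(QUERY).result():
    print(f"{row.bucket}: {row.objects} objects, {row.tib:.1f} TiB")
```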

Additionally, Google Cloud has integrated AI capabilities through Gemini, a natural language interface that allows users to query metadata in real-time without needing specialized knowledge. This feature democratizes data management by shifting some responsibilities from storage admins to end-users, making the process more efficient and less error-prone. Gemini also provides verified answers to common questions, ensuring accuracy and reliability. The overall goal of these innovations is to help enterprises manage their data at scale, keeping it secure, compliant, and ready for AI applications, thereby enabling them to focus on their core business objectives.


Google Cloud Vertex AI & Google Cloud NetApp Volumes

Event: AI Data Infrastructure Field Day 1

Appearance: Google Cloud Presents at AI Data Infrastructure Field Day 1

Company: Google Cloud

Video Links:

Personnel: Raj Hosamani

Rajendraprasad Hosamani from Google Cloud Storage presented on the integration of Google Cloud Vertex AI with Google Cloud NetApp Volumes, emphasizing the importance of grounding AI agents in bespoke, enterprise-specific data. He highlighted that AI workloads are diverse and that agents can significantly enhance user experiences by providing interactive, personalized, and efficient data sharing. For agents to be effective, they must be grounded in the specific truths of an organization, which requires seamless data integration from various sources, whether on-premises or in the cloud. This integration also necessitates robust governance to ensure data is shared appropriately within the enterprise.

Vertex AI, Google’s flagship platform for AI app builders, offers a comprehensive suite of tools categorized into model garden, model builder, and agent builder. The model garden allows users to select from first-party, third-party, or open-source models, while the model builder focuses on creating custom models tailored to specific business needs. The agent builder facilitates the responsible and reliable creation of AI agents, incorporating capabilities like orchestration, grounding, and data extraction. This platform supports no-code, low-code, and full-code development experiences, making it accessible to a wide range of users within an organization.

The integration of NetApp Volumes with Vertex AI enables the use of NetApp’s proven ONTAP storage stack as a data store within Vertex AI. This allows for the seamless incorporation of enterprise data into AI development workflows, facilitating the creation, testing, and fine-tuning of AI agents. Raj demonstrated how this integration can elevate user experiences through various implementations, such as chat agents, search agents, and recommendation agents, all of which can be developed with minimal coding. This integration empowers organizations to leverage their accumulated data to create rich, natural language-based interactions for their end users, thereby enhancing the overall value derived from their AI investments.


Google Cloud Storage for AI ML Workloads

Event: AI Data Infrastructure Field Day 1

Appearance: Google Cloud Presents at AI Data Infrastructure Field Day 1

Company: Google Cloud

Video Links:

Personnel: Dave Stiver

In his presentation on Google Cloud Storage for AI ML workloads, Dave Stiver, Group Product Manager at Google Cloud, discussed the critical role of cloud storage in the AI data pipeline, particularly focusing on training, checkpoints, and inference. He emphasized the importance of time to serve for machine learning developers, highlighting that while scalability and performance are essential, the ability to interact with object storage through a file interface is crucial for developers who are accustomed to file systems. Stiver introduced two key features, GCS FUSE and Anywhere Cache, which enhance the performance of cloud storage for AI workloads. GCS FUSE allows users to mount cloud storage buckets as local file systems, while Anywhere Cache provides a local zonal cache that significantly boosts data access speeds by caching data close to the accelerators.
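
The appeal of GCS FUSE is that, once a bucket is mounted (for example with the gcsfuse tool), training code can read objects through ordinary file-system calls. The fragment below reads shards from an assumed mount point; the path and file layout are illustrative.

```python
# Reading training shards through a GCS FUSE mount (paths are illustrative).
# The bucket would first be mounted, e.g. `gcsfuse my-training-bucket /mnt/gcs`.
import os

MOUNT_POINT = "/mnt/gcs/training-shards"   # assumed mount of a Cloud Storage bucket

def iter_shards():
    """Objects appear as ordinary files, so standard file I/O works unchanged."""
    for name in sorted(os.listdir(MOUNT_POINT)):
        path = os.path.join(MOUNT_POINT, name)
        with open(path, "rb") as f:        # reads are served from the bucket (or cache)
            yield name, f.read()

for shard_name, payload in iter_shards():
    print(f"{shard_name}: {len(payload)} bytes")
    break   # just demonstrate the first shard
```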

Stiver shared a use case involving Woven, the autonomous driving division of Toyota, which transitioned from using Lustre to GCS FUSE for their training jobs. This shift resulted in a 50% reduction in training costs and a 14% decrease in training time, demonstrating the effectiveness of the local cache feature in GCS FUSE. He also explained the functionality of Anywhere Cache, which allows users to cache data in the same zone as their accelerators, providing high bandwidth and efficient data access. The presentation highlighted the importance of understanding the consistency model of the cache and how it interacts with the underlying storage, ensuring that users can effectively manage their data across different regions and zones.

The discussion then shifted to the introduction of Parallelstore, a fully managed parallel file system designed for high-throughput AI workloads. Stiver explained that Parallelstore is built on DAOS technology and targets users who require extremely high performance for their AI training jobs. He emphasized the importance of integrating storage solutions with cloud storage to optimize costs and performance, particularly for organizations that need to manage large datasets across hybrid environments. The presentation concluded with a focus on the evolving landscape of AI workloads and the need for tailored storage solutions that can adapt to the diverse requirements of different applications and user personas within organizations.


Workload and AI-Optimized Infrastructure from Google Cloud

Event: AI Data Infrastructure Field Day 1

Appearance: Google Cloud Presents at AI Data Infrastructure Field Day 1

Company: Google Cloud

Video Links:

Personnel: Sean Derrington

Sean Derrington from Google Cloud’s storage group presented on the company’s efforts to optimize AI and workload infrastructure, focusing on the needs of large-scale customers. Google Cloud has been working on a comprehensive system, referred to as the AI hypercomputer, which integrates hardware and software to help customers efficiently manage their AI tasks. The hardware layer includes a broad portfolio of accelerators like GPUs and TPUs, tailored for different workloads. The network capabilities of Google Cloud ensure predictable and consistent performance globally. Additionally, Google Cloud offers various framework packages and managed services like Vertex AI, which supports different AI activities, from building and training models to serving them.

Derrington highlighted the recent release of Parallelstore, Google Cloud’s first managed parallel file system, and Hyperdisk ML, a read-only block storage service. These new storage solutions are designed to handle the specific demands of AI workloads, such as training, checkpointing, and serving. Parallelstore, for instance, is built on local SSDs and is suitable for scratch storage, while Hyperdisk ML allows multiple hosts to access the same data, making it ideal for serving. The presentation also touched on the importance of selecting the right storage solution based on the size and nature of the training data set, checkpointing needs, and serving requirements. Google Cloud’s open ecosystem, including partnerships with companies like Sycomp, offers additional storage options such as GPFS-based solutions.

The presentation emphasized the need for customers to carefully consider their storage requirements, especially as they scale their AI operations. Different storage solutions are suitable for different scales of operations, from small-scale jobs requiring low latency to large-scale, high-throughput needs. Google Cloud aims to provide consistent and flexible storage solutions that can seamlessly transition from on-premises to cloud environments. The goal is to simplify the decision-making process for customers and ensure they have access to the necessary resources, such as H100s, which might not be available on-premises. The session concluded with a promise to delve deeper into the specifics of Parallelstore and other storage solutions, highlighting their unique capabilities and use cases.


A Demonstration of the MinIO Enterprise Object Store – The AI-Centric Feature Set

Event: AI Data Infrastructure Field Day 1

Appearance: MinIO Presents at AI Data Infrastructure Field Day 1

Company: MinIO

Video Links:

Personnel: Dil Radhakrishnan, Jonathan Symonds

MinIO’s AI feature set is expansive, but there are core features that allow enterprises to operate at exascale. Those include observability, security, performance, search and manageability. In this segment, MinIO goes from bucket creation to RAG-deployment, emphasizing each core AI feature and why it matters to enterprises with data scale challenges that run from PBs to EBs and beyond.

MinIO’s presentation at the AI Data Infrastructure Field Day 1 focused on demonstrating the capabilities of their enterprise object store, particularly its AI-centric features. The core features highlighted include observability, security, performance, search, and manageability, which are essential for enterprises operating at exascale. The presentation began with an overview of the global console, which allows for the management of multiple sites across different cloud environments, both public and private. This console integrates key management systems for object-level encryption, providing granular security that is crucial for large-scale data operations.
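
The demo’s starting point, bucket creation and object upload, maps directly onto the MinIO Python SDK. The endpoint and credentials below are placeholders; the calls shown (bucket_exists, make_bucket, fput_object, list_objects) are part of the standard minio client.

```python
# Bucket creation and upload with the MinIO Python SDK (endpoint and credentials are placeholders).
from minio import Minio

client = Minio(
    "minio.example.internal:9000",   # assumed endpoint
    access_key="ACCESS_KEY",
    secret_key="SECRET_KEY",
    secure=True,
)

bucket = "training-data"
if not client.bucket_exists(bucket):
    client.make_bucket(bucket)

# Upload a local file as an object; downstream AI pipelines read it back via S3 calls.
client.fput_object(bucket, "datasets/images/batch-0001.tar", "/data/batch-0001.tar")

# Enumerate what is stored under a prefix.
for obj in client.list_objects(bucket, prefix="datasets/", recursive=True):
    print(obj.object_name, obj.size)
```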

The demonstration showcased how MinIO handles various AI and ML workloads, emphasizing the importance of data preprocessing and transformation in data lakes. The observability feature was particularly highlighted, showing how MinIO’s system can monitor and manage the health of the cluster, including drive metrics, CPU usage, and network health. This observability is crucial for maintaining performance and preemptively addressing potential issues. The presentation also covered the built-in load balancer, which ensures even distribution of workloads across nodes, and the in-memory caching system that significantly boosts performance by reducing data retrieval times.

Additionally, the presentation touched on the catalog feature, which allows for efficient searching and managing of metadata within massive namespaces. This feature is particularly useful for identifying and addressing issues such as excessive requests from buggy code. The session concluded with a discussion on the integration of MinIO with AI/ML workflows, including the use of Hugging Face for model training and the implementation of RAG (Retrieval-Augmented Generation) systems. This integration ensures that enterprises can seamlessly manage and scale their AI/ML operations, leveraging MinIO’s robust and scalable object storage solutions.


Why MinIO is Winning the Private Cloud AI Battle

Event: AI Data Infrastructure Field Day 1

Appearance: MinIO Presents at AI Data Infrastructure Field Day 1

Company: MinIO

Video Links:

Personnel: Rakshith Venkatesh

Many of the largest private cloud AI deployments run on MinIO and most of the AI ecosystem is integrated or built on MinIO, from Anyscale to Zilliz. In this segment, MinIO explains the features and capabilities that make it the leader in high-performance storage for AI. Those include customer case studies, the DataPod reference architecture and the features that AI-centric enterprises deem requirements.

MinIO has established itself as a leader in high-performance storage for AI, particularly in private cloud environments. The company’s software-defined, cloud-native object store is designed to handle exabyte-scale deployments, making it a preferred choice for large AI ecosystems. MinIO’s S3-compatible object store is highly integrated with various AI tools and platforms, from Anyscale to Zilliz, which has contributed to its widespread adoption. The company emphasizes its ease of integration, flexibility with hardware, and robust performance, which are critical for AI-centric enterprises. MinIO’s architecture allows customers to bring their own hardware, supporting a range of chipsets and networking configurations, and is optimized for NVMe drives to ensure high throughput and performance.

A notable case study highlighted in the presentation involved a customer needing to deploy a 100-petabyte cluster over a weekend. MinIO’s solution, which does not require a separate metadata database and offers a complete object store solution rather than a gateway, was able to meet the customer’s needs efficiently. The deployment showcased MinIO’s ability to scale quickly and handle large volumes of data with high performance, achieving 2.2 terabytes per second throughput in benchmarking tests. This performance was achieved using commodity off-the-shelf hardware, demonstrating MinIO’s capability to deliver enterprise-grade storage solutions without the need for specialized equipment.

MinIO also addresses operational challenges through features like erasure coding, Bitrot protection, and a Kubernetes-native operator for seamless integration with cloud-native environments. The company provides observability tools to monitor the health and performance of the storage infrastructure, ensuring data integrity and efficient resource utilization. MinIO’s reference architecture, DataPod, offers a blueprint for deploying large-scale AI data infrastructure, guiding customers on hardware selection, networking configurations, and scalability. This comprehensive approach, combined with MinIO’s strong performance and ease of use, positions it as a leading choice for enterprises looking to build robust AI data infrastructures.


Why AI is All About Object Storage with MinIO

Event: AI Data Infrastructure Field Day 1

Appearance: MinIO Presents at AI Data Infrastructure Field Day 1

Company: MinIO

Video Links:

Personnel: Jonathan Symonds

Almost every major LLM is trained on an object store. Why is that? The answer lies in the unique properties of a modern object store – performance (throughput and IOPS), scale and simplicity. In this segment, MinIO details how AI scale is stressing traditional technologies and why object storage is the de facto storage standard for modern AI architectures.

Jonathan Symonds kicks off the presentation by MinIO at AI Data Infrastructure Field Day 1, describing the critical role of object storage in the realm of artificial intelligence (AI). Symonds begins by highlighting the unprecedented scale of data involved in AI, where petabytes have become the new terabytes, and the industry is rapidly approaching exabyte-scale challenges. Traditional storage technologies like NFS are struggling to keep up with this scale, leading to a shift towards object storage, which offers the necessary performance, scalability, and simplicity. Symonds emphasizes that the distributed nature of data creation, encompassing various formats such as video, audio, and log files, further necessitates the adoption of object storage to handle the massive and diverse data volumes efficiently.

Symonds also addresses the economic and operational considerations driving the adoption of object storage in AI. Enterprises are increasingly repatriating data from public clouds to private clouds to achieve better cost control and economic viability. This shift is facilitated by the cloud operating model, which includes containerization, orchestration, and APIs, making it easier to manage large-scale data infrastructures. The presentation underscores the importance of control over data, with Symonds citing industry leaders who advocate for keeping data within the organization’s control to maintain competitive advantage. This control is crucial for enterprises to maximize the value of their data and protect it from external threats.

The presentation concludes by discussing the unique features of object storage that make it ideal for AI workloads. These include the simplicity of the S3 API, fine-grained security controls, immutability, continuous data protection, and active-active replication for high availability. Symonds highlights that these features are essential for managing the performance and scale required by modern AI applications. He also notes that the simplicity of object storage scales operationally, technically, and economically, making it a robust solution for the growing demands of AI. The presentation reinforces the idea that object storage is not just a viable option but a necessary one for enterprises looking to harness the full potential of AI at scale.
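
The point about the simplicity of the S3 API is easy to make concrete: nearly everything an AI data pipeline needs from storage reduces to a handful of calls. The snippet below uses boto3 against an S3-compatible endpoint; the endpoint URL, bucket, and object keys are placeholders.

```python
# The core S3 calls an AI data pipeline needs (endpoint, bucket, and keys are placeholders).
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://objectstore.example.internal",   # any S3-compatible store
)

# Write: land a raw data file in the ingest bucket.
s3.put_object(Bucket="ingest", Key="logs/2024-09-25/events.json", Body=b'{"ok": true}')

# List: enumerate what arrived under a prefix.
resp = s3.list_objects_v2(Bucket="ingest", Prefix="logs/2024-09-25/")
for obj in resp.get("Contents", []):
    print(obj["Key"], obj["Size"])

# Read: stream an object back for preprocessing or training.
body = s3.get_object(Bucket="ingest", Key="logs/2024-09-25/events.json")["Body"].read()
print(body)
```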


Rethinking Data Center Infrastructure Automation with Nokia – Operations Made Easy

Event: Networking Field Day Exclusive with Nokia

Appearance: Rethinking Data Center Infrastructure Automation

Company: Nokia

Video Links:

Personnel: Bruce Wallis

Bruce Wallis, Senior PLM for the Event Driven Automation (EDA) solution, shifts gears to explain the operational capabilities of the data center fabric, focusing on the state of the network and the way EDA abstractions reduce the complexity of day-to-day tasks for data center operations teams.